Current version: 2.2
Referrer Karma is a rather simple script that prevents malicious bots from accessing your pages, flooding your logs and possibly draining your server’s bandwidth. All it does is check that an incoming bot has a valid referrer URL (i.e. that the page it claims to come from exists and links to your site). If RK thinks the incoming visitor is a malicious bot, it displays a 403 error page (which log analyzer tools will not count as a visit) and uses an HTML redirect to the original URL to avoid blocking legitimate visitors (see below for details).
RK keeps a detailed log of referrer URLs that have been blocked, approved or skipped, as well as a short-life black/whitelist of previously examined referrers. It is designed to take the least amount of time possible when examining an incoming referrer. The interface lets you manually whitelist or blacklist a particular URL, as well as add important domains to a static whitelist file.
Recent versions also interface with WordPress’ anti-comment-spam plugin Spam Karma to block spambots before they even reach your comment page (note that you do not need to be running either SK or WP in order to use RK: this feature is entirely optional).
You must of course be running PHP.
You also need to have URL fopen or CURL enabled on your install of PHP (most servers have at least one of the two, but you might want to check with your host).
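If you are not sure what your host provides, a quick check along these lines can tell you. This is only a sketch and not part of RK: the function name `rk_sketch_fetch_methods` is made up for illustration, and it relies solely on PHP's standard `ini_get()` and `function_exists()`.

```php
<?php
// Hypothetical helper (not part of RK): reports which remote-fetch methods
// this PHP install offers.
function rk_sketch_fetch_methods(): array
{
    return [
        // allow_url_fopen lets fopen()/file_get_contents() open http:// URLs.
        'url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // curl_init() only exists when the cURL extension is loaded.
        'curl'      => function_exists('curl_init'),
    ];
}

$methods = rk_sketch_fetch_methods();
echo 'URL fopen: ', $methods['url_fopen'] ? 'yes' : 'no', "\n";
echo 'cURL: ', $methods['curl'] ? 'yes' : 'no', "\n";
echo ($methods['url_fopen'] || $methods['curl'])
    ? "At least one fetch method is available.\n"
    : "Neither is available: check with your host.\n";
```

Drop it in a temporary file on your server, load it once in a browser, then delete it.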
Please read the details and warnings below carefully. Only install if you are comfortable with a certain level of risk (nothing I consider a show-stopper, personally). Overall, I just cannot afford to do personal support for everybody, since I should not even be spending a second developing this at this point in my life. Which doesn’t mean I won’t listen to suggestions or bug reports (particularly bug reports with a code fix: these are much appreciated). But as a rule, consider yourself on your own with this plugin.
Install is really easy… provided you follow these instructions carefully. There are no traps and nothing that anybody with basic knowledge of the web cannot do.
- Drop all the files (referrer-karma.php, rk_settings_sample.php, whitewords.txt and whitelist.txt) anywhere on your web server. If you are a WP user, I would recommend copying them into wp-content
- Edit rk_settings_sample.php and fill in the necessary values (there are comments to guide you inside the file). Basically, the only required values are MySQL connection variables. It is recommended to set a password value too. Ensure that $can_configure is set to true (it sets Referrer Karma into “setup” mode). Save and rename the file to rk_settings.php
- Browse to referrer-karma.php?ref-karma-setup=true (after prefixing the correct path to the file on your server, of course) and make sure all the checks are successful.
- If the setup is successful: edit the rk_settings.php file again and switch off setup mode by setting $can_configure to ‘false’. This step is essential and any attempt to use RK will fail if the variable is not set to false.
- Open the main PHP file of the page you want to protect: this can be any file that spammers are likely to hit often (e.g. your blog’s ‘index.php’ file) and copy-paste the lines you were given on the setup screen (“include” etc). To be effective, the call to RK should be inserted at the very top of the very first file ever used by the pages you are protecting (a common header file or the top of a template), before any HTML or PHP.
- Sit back
- Enjoy the show
Displaying RK stats
A few very quick things
- You can manually add whitelist domains to the file whitelist.txt (just add a new line with your friend’s domain)
- You can do the same with keywords (used on the whole referrer string) in whitewords.txt
- Whitelist and whitewords supersede blacklist, so even if an entry is blacklisted, whitelisting it will make it go through.
- Don’t hesitate to reset the list any time: it isn’t very costly to build, more like a “cache”.
- List entries automatically expire (if not used) after 10 days (you can set the number of days in the settings file).
- Use the password feature (set a password in the config file and append &pwd=[your password] to the setup URL) in order to access the setup screen when you are out of setup mode (i.e. when you have changed the value of $can_configure back to false).
- At the bottom of the setup screen, you will see links to: 1) Reset the tables (whitelist, blacklist and logs) 2) See logs 3) See current entries in the white/black list 4) See logs without 403s (lets you see at a glance recent additions to the black and white lists, without the hundreds of spambot attempts that follow).
- If you enable Spam Karma compatibility in the settings file, RK will automatically use SK’s IP blacklist to block spammers at the door. This is a no-risk block though, as visitors will still be given a “click through” link to unban themselves and browse your site.
- Recent versions of RK also allow you to set a regex to be used on the referrer’s page content to check for validity (for example, you could set it to approve any referrer that contains your page’s name in order to lower the risk of false positives).
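To give an idea of what such a content regex does, here is a small sketch. It is not RK's own code: the function name `rk_sketch_page_matches`, the site name `example.com/blog` and the sample pages are all placeholders, and the name of the actual setting is whatever the comments in your settings file say.

```php
<?php
// Hypothetical helper, not RK's own code: true when the fetched referrer
// page matches the validity regex.
function rk_sketch_page_matches(string $pageSource, string $pattern): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $pageSource) === 1;
}

// Build a case-insensitive pattern from a literal site name (placeholder).
// preg_quote() escapes the dot so it only matches a real dot.
$pattern = '~' . preg_quote('example.com/blog', '~') . '~i';

$linkingPage = '<a href="http://Example.com/blog/">a blog I read</a>';
$spamPage    = '<a href="http://pills.invalid/">cheap pills</a>';

var_dump(rk_sketch_page_matches($linkingPage, $pattern)); // bool(true)
var_dump(rk_sketch_page_matches($spamPage, $pattern));    // bool(false)
```

A pattern that includes part of your path, as here, is stricter than a bare domain check, so test it against a few pages that really do link to you before relying on it.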
How RK works
Here is, in a nutshell and with some simplification, how Referrer Karma decides whether to allow a referred inbound link or not. The order of the steps matters (i.e. as soon as a test is conclusive, the script doesn’t go any further):
- If there’s no referrer or if it’s from the same domain: OK
- If the referrer’s domain is matched by an element of whitelist.txt: OK
- If the full referrer is matched by an element of whitewords.txt: OK
- If the referrer’s IP matches an ip_ban value (means this IP has been known to access many times through a bad referrer) in the blacklist table: 403
- If the referrer’s domain matches a white entry in the table: OK
- If the referrer’s domain matches a black entry in the table: 403 plus ban IP after a certain number of attempts
- If the referrer’s domain is not in the table, then RK parses the referrer’s source page and:
  - If the source contains the target domain (yours): OK and added as white entry in the table
  - If it does not: 403 and added as black entry in the table.
- If the referrer’s domain is not reachable or does not appear to be a proper URL: OK but the function returns false (basically, you can decide to be extra paranoid and refuse the connection when check_referrer() returns false).
OK means that the page is displayed absolutely normally (the user will never know he’s been screened).
403 means the user receives a “403: Access Forbidden” error, with a notice informing him that he has been detected as potential Referrer Spam. The user is not barred altogether from viewing the page (only from this referrer): he is provided with a special link on the error page that will redirect him to the page he was originally coming for.
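The decision order above can be sketched roughly as follows. This is a simplified illustration, not RK's actual code: every name here (`rk_sketch_decide`, the `$ctx` array and its keys, the stub fetcher) is hypothetical, the whitelist is assumed to match a domain exactly or as a parent of a subdomain, and the "OK but returns false" case is represented by the string 'OK-unverified'. IP banning after repeated attempts is left out.

```php
<?php
// Simplified sketch of the decision order; all names are hypothetical.
// $ctx keys: own_domain, whitelist (domains), whitewords (substrings),
// banned_ips, table (domain => 'white'|'black'), fetch (callable returning
// the page source as a string, or null when the page is unreachable).
function rk_sketch_decide(string $referrer, string $ip, array &$ctx): string
{
    $domain = strtolower((string) parse_url($referrer, PHP_URL_HOST));

    // No referrer, or a referrer from the same domain.
    if ($referrer === '' || $domain === $ctx['own_domain']) {
        return 'OK';
    }
    // Static whitelist.txt entries (assumed: exact or subdomain match).
    foreach ($ctx['whitelist'] as $entry) {
        if ($domain === $entry || substr($domain, -strlen(".$entry")) === ".$entry") {
            return 'OK';
        }
    }
    // whitewords.txt keywords, tested against the whole referrer string.
    foreach ($ctx['whitewords'] as $word) {
        if (stripos($referrer, $word) !== false) {
            return 'OK';
        }
    }
    // IP already banned.
    if (in_array($ip, $ctx['banned_ips'], true)) {
        return '403';
    }
    // Domain already examined: reuse the stored verdict.
    if (isset($ctx['table'][$domain])) {
        return $ctx['table'][$domain] === 'white' ? 'OK' : '403';
    }
    // Unknown domain: fetch the referring page and look for our own domain.
    $source = $domain === '' ? null : ($ctx['fetch'])($referrer);
    if ($source === null) {
        return 'OK-unverified'; // unreachable or not a proper URL
    }
    $verdict = stripos($source, $ctx['own_domain']) !== false ? 'white' : 'black';
    $ctx['table'][$domain] = $verdict;
    return $verdict === 'white' ? 'OK' : '403';
}

$ctx = [
    'own_domain' => 'example.com',
    'whitelist'  => ['google.com'],
    'whitewords' => ['webmail'],
    'banned_ips' => ['203.0.113.9'],
    'table'      => [],
    // Stub fetcher standing in for the real HTTP request.
    'fetch'      => function (string $url): ?string {
        $pages = [
            'http://friend.test/links' => '<a href="http://example.com/">hi</a>',
            'http://spam.test/'        => '<a href="http://pills.test/">buy</a>',
        ];
        return $pages[$url] ?? null;
    },
];

echo rk_sketch_decide('http://friend.test/links', '198.51.100.1', $ctx), "\n"; // OK
echo rk_sketch_decide('http://spam.test/', '198.51.100.2', $ctx), "\n";        // 403
echo rk_sketch_decide('http://down.test/', '198.51.100.3', $ctx), "\n";        // OK-unverified
```

Because the whitelist and whitewords checks run first, a whitelisted referrer gets through even from a banned IP or a blacklisted domain, and only the last step ever costs a network request.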
There can and will be a few false positives. Possibly a few bad whitelistings (e.g. a spammer getting whitelisted through a trick of some kind, although this is unlikely). And more likely a few bad blacklistings, if for some reason the referrer page is not publicly accessible (e.g. a webmail server or such). The latter is why there is a default list of whitelisted domains that contains most search engines and other domains likely to appear in a referrer without containing your URL.
I am in the process of building a semi-exhaustive list of webmail domains to be whitelisted by default, as there is no other way to tell these apart from a bad referrer. Feel free to send me your own suggestions.
Optional: Extended features
There are basically two extended filtering features you can use with RK:
1. SK integration
The following settings only apply if you have Spam Karma installed and running. If you don’t, there’s no need to worry about it: RK will still do its job. But having RK take advantage of SK’s blacklist (and vice versa) helps minimize CPU and bandwidth costs on your blog.
Open your rk_settings.php file and replace the following two lines:
$use_SK_blacklist = false;
$secret_blacklist_string = "rumplestiltskin"; // CHANGE that value if you use the SK blacklist: pick any word you
with:
$use_SK_blacklist = true;
$secret_blacklist_string = "[some random word you pick]";
The first variable tells RK to use SK’s list (make sure it’s installed!). The second one is a sort of “fingerprint” (you don’t need to remember it, just fill in anything you want) that is used to allow banned IPs to auto-unban themselves.
2. .htaccess Blacklisting
This one should probably be only used if you have some experience tweaking your .htaccess file.
Basically, if you forward or mod_rewrite a URL toward:
referrer-karma.php?rk_redirect_to=[some URL]&rk_ban_this_ip=1, the client IP will automatically be blacklisted in SK and will receive a 403.
If you provide a redirection URL for the rk_redirect_to param (for example, the original URL), RK will offer to lift the ban and redirect the user, upon simply clicking a link. If you leave that parameter empty (nothing after ‘=’), the user will only get a rather dry “get lost” message… so make sure you only do that for absolutely unmistakable spam.
For example, I have added one simple rule to my .htaccess file:
RewriteRule ^cgi-bin/MT/.* http://unknowngenius.com/blog/wp-content/referrer-karma.php?rk_redirect_to=&rk_ban_this_ip=1 [NC,L]
This rule ensures that any spambot randomly trying to locate a Movable Type script (e.g. mt-comments.cgi) will end up in my permanent banlist: not only will they receive a 403, but the next time they try accessing any other areas of my blog, they’ll still receive a 403.
Feel free to use this rule (after replacing with your own paths, of course). It ought to work equally well, adapted for any other URLs that are not supposed to be queried by a legitimate user. I believe it would even be possible to put together a more complex set of mod_rewrite rules to redirect spambots that try to access your wp-comment.php file directly without a proper referrer (or without having queried anything else on your server). But be careful if you get into these waters: you could easily break your comments.
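If you forward visitors from a PHP script rather than from .htaccess, the same URL can be assembled in code, which takes care of escaping the redirection target. The sketch below only uses the two parameters described above; the helper name `rk_sketch_ban_url`, the script path and the example target are placeholders.

```php
<?php
// Hypothetical helper: builds the ban URL described above.
// $rkScriptUrl is wherever referrer-karma.php lives on your server.
// An empty $redirectTo produces the "no way back" variant.
function rk_sketch_ban_url(string $rkScriptUrl, string $redirectTo = ''): string
{
    // http_build_query() URL-encodes the values, so a target URL with its
    // own "?" and "=" survives intact inside the query string.
    $query = http_build_query(
        ['rk_redirect_to' => $redirectTo, 'rk_ban_this_ip' => 1],
        '',
        '&'
    );
    return $rkScriptUrl . '?' . $query;
}

// With a way back for the visitor:
echo rk_sketch_ban_url('/wp-content/referrer-karma.php', 'http://example.com/blog/?p=12'), "\n";
// Without one (only for unmistakable spam):
echo rk_sketch_ban_url('/wp-content/referrer-karma.php'), "\n";
```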
2.1, 2.2: Fixed bugs (SK2 stuff).
2.0: Added SK2 inter-operability. Can now check for a custom regex (instead of only domain name) in referring page’s content.
1.7: Removed RK IP blacklisting (still using SK’s IP banlist, however), as it was both redundant and source of some annoying recursion bug… Shouldn’t matter too much. Complete facelifting for logs and lists, courtesy of Jeff Minard, who might be involved in RK’s future development…
1.6: fixed bugs introduced by 1.5.
1.5: Changed treatment of unreachable URLs. See comment #36 for details.
If you are running anything older, you must upgrade! Nearly every previous version contained major bugs that have since been fixed.
Doc in progress… contact me if you have any questions