Current version: 2.2

Overview

Referrer Karma is a fairly simple script that prevents malicious bots from accessing your pages, flooding your logs and possibly draining your server’s bandwidth. All it does is check that an incoming visitor has a valid referrer URL (i.e. that the page it claims to come from exists and links to your site). If RK thinks the incoming visitor is a malicious bot, it displays a 403 error page (which will not be counted as a visit by log analyzer tools). To avoid blocking legitimate visitors, that page carries an HTML link back to the originally requested URL (see below for details).

RK keeps a detailed log of referrer URLs that have been blocked, approved or skipped, as well as a short-life black/whitelist of previously examined referrers. It is designed to take the least amount of time possible when examining an incoming referrer. The interface lets you manually whitelist or blacklist a particular URL, as well as add important domains to a static whitelist file.

Recent versions also interface with WordPress’ anti-comment-spam plugin Spam Karma to block spambots before they even reach your comment page (note that you do not need to be running either SK or WP in order to use RK: this feature is entirely optional).

Requirements

You must of course be running PHP.
You also need to have URL fopen or cURL enabled in your PHP install (most servers have at least one of the two, but you might want to check with your host).
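If you are not sure what your host provides, a quick check along these lines should tell you. This is only a sketch: rk_fetch_capabilities() is a made-up helper name and not part of Referrer Karma; it relies on nothing but the standard allow_url_fopen setting and the presence of curl_init().

```php
<?php
// Report which remote-fetch mechanisms this PHP install offers.
// rk_fetch_capabilities() is a hypothetical helper, not part of Referrer Karma.
function rk_fetch_capabilities(): array
{
    return [
        // allow_url_fopen lets fopen()/file_get_contents() open http:// URLs.
        'url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // curl_init() only exists when the cURL extension is loaded.
        'curl'      => function_exists('curl_init'),
    ];
}

$caps = rk_fetch_capabilities();
echo 'URL fopen: ', $caps['url_fopen'] ? 'enabled' : 'disabled', "\n";
echo 'cURL:      ', $caps['curl'] ? 'enabled' : 'disabled', "\n";
echo ($caps['url_fopen'] || $caps['curl'])
    ? "At least one fetch method is available.\n"
    : "Neither is available; ask your host.\n";
```

Drop it in a temporary file on your server, load it once, then delete it.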

Download

http://wp-plugins.net/downloads/ref-karma.zip

Disclaimer

Please read the details and warnings below carefully. Only install if you are comfortable with a certain level of risk (nothing I consider a show-stopper, personally). Overall, I just cannot afford to do personal support for everybody, since I should not even be spending a second developing this at this point in my life. That doesn’t mean I won’t listen to suggestions or bug reports (particularly bug reports with a code fix: these are much appreciated). But as a rule, consider yourself on your own with this plugin.

Instructions

Install is really easy… provided you follow these instructions carefully. There are no traps and nothing that anybody with basic knowledge of the web cannot do.

  1. Drop all the files (referrer-karma.php, rk_settings_sample.php, whitewords.txt and whitelist.txt) anywhere on your web server. If you are a WP user, I would recommend copying them into wp-content.
  2. Edit rk_settings_sample.php and fill in the necessary values (there are comments to guide you inside the file). Basically, the only required values are the MySQL connection variables. It is recommended to set a password value too. Ensure that $can_configure is set to true (it puts Referrer Karma into “setup” mode). Save and rename the file to rk_settings.php.
  3. Browse to referrer-karma.php?ref-karma-setup=true (after prefixing the correct path to the file on your server, of course) and make sure all the checks are successful.
  4. If the setup is successful: edit the rk_settings.php file again and switch off setup mode by setting $can_configure to ‘false’. This step is essential and any attempt to use RK will fail if the variable is not set to false.
  5. Open the main PHP file of the page you want to protect: this can be any file that spammers are likely to hit often (e.g. your blog’s ‘index.php’ file). Copy-paste into it the lines you were given on the setup screen (“include” etc). To be effective, the call to RK should be inserted at the very top of the very first file used by the pages you are protecting (a common header file or the top of a template), before any HTML or PHP.
  6. Sit back
  7. Enjoy the show

Displaying RK stats

Matt put together this very cool little plugin for WordPress that will automatically insert RK stats in your footer (the same way SK does). You still need to install RK separately.

A few very quick things

  • You can manually add whitelist domains to the file whitelist.txt (just add a new line with your friend’s domain)
  • You can do the same with keywords (used on the whole referrer string) in whitewords.txt
  • Whitelist and whitewords supersede blacklist, so even if an entry is blacklisted, whitelisting it will make it go through.
  • Don’t hesitate to reset the list any time: it isn’t very costly to build, more like a “cache”.
  • List entries automatically expire (if not used) after 10 days (you can set the number of days in the settings file).
  • Use the password feature (set a password in the config file and append &pwd=[your password] to the setup URL) in order to access the setup screen when you are out of setup mode (i.e. when you have changed the value of $can_configure back to false).
  • At the bottom of the setup screen, you will see links to: 1) Reset the tables (whitelist, blacklist and logs) 2) See logs 3) See current entries in the white/black list 4) See logs without 403s (lets you see at a glance recent additions to the black and white lists, without the hundreds of spambot attempts that follow).
  • If you enable Spam Karma compatibility in the settings file, RK will automatically use SK’s IP blacklist to block spammers at the door. This is a no-risk block though, as visitors will still be given a “click through” link to unban themselves and browse your site.
  • Recent versions of RK also allow you to set a regex to be used on the referrer’s page content to check for validity (for example, you could set it to approve any referrer that contains your page’s name in order to lower the risk of false positives).
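For the password feature mentioned above, here is a small sketch of how the setup URL can be put together without worrying about special characters in the password. The function name rk_setup_url() and the example.com address are placeholders of mine; only the two query parameters (ref-karma-setup and pwd) come from RK itself.

```php
<?php
// Build the setup-screen URL for an RK install.
// rk_setup_url() and the example address are placeholders for illustration.
function rk_setup_url(string $scriptUrl, string $password = ''): string
{
    $params = ['ref-karma-setup' => 'true'];
    if ($password !== '') {
        // Only needed once $can_configure is back to false.
        $params['pwd'] = $password;
    }
    // http_build_query() URL-encodes the values; '&' is forced as separator.
    return $scriptUrl . '?' . http_build_query($params, '', '&');
}

echo rk_setup_url('http://example.com/wp-content/referrer-karma.php', 'my secret'), "\n";
```

A password made only of letters and digits needs no encoding at all, so you can also type the URL by hand in that case.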

How RK works

Here is, in a nutshell and with some simplification, how Referrer Karma decides whether to allow a referred inbound link or not. The order of the steps is important (i.e. if a test is conclusive, the script doesn’t go any further):

  • If there’s no referrer or if it’s from the same domain: OK
  • If the referrer’s domain is matched by an element of whitelist.txt: OK
  • If the full referrer is matched by an element of whitewords.txt: OK
  • If the visitor’s IP matches an ip_ban value in the blacklist table (meaning this IP is known to have come in many times through a bad referrer): 403
  • If the referrer’s domain matches a white entry in the table: OK
  • If the referrer’s domain matches a black entry in the table: 403 plus ban IP after a certain number of attempts
  • If the referrer’s domain is not in the table, then RK parses the referrer’s source page and:
    • If the source contains the target domain (yours): OK and added as white entry in the table
    • If it does not: 403 and added as black entry in the table.
  • If the referrer’s domain is not reachable or does not appear to be a proper URL: OK but the function returns false (basically, you can decide to be extra paranoid and refuse the connection when check_referrer() returns false).
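To make the order of the tests easier to follow, here is the same list as a stripped-down sketch. It is not RK’s actual code: rk_decide(), rk_contains_any(), the $ctx array and the return strings are all names invented for the illustration, the lists are passed in as plain arrays instead of being read from the files and the MySQL table, and the side effects (adding entries to the table, banning an IP after several attempts) are left out.

```php
<?php
// True when any non-empty needle occurs in the haystack (case-insensitive).
function rk_contains_any(string $haystack, array $needles): bool
{
    foreach ($needles as $needle) {
        if ($needle !== '' && stripos($haystack, $needle) !== false) {
            return true;
        }
    }
    return false;
}

// Simplified sketch of the decision order; every name here is hypothetical.
// Returns 'ok', '403', or 'ok_unverified' (page shown, but check "failed").
function rk_decide(string $referrer, string $clientIp, array $ctx): string
{
    $myDomain = strtolower($ctx['my_domain']);
    $host = strtolower((string) parse_url($referrer, PHP_URL_HOST));

    // No referrer, or a referrer on our own domain (exact match only here).
    if ($referrer === '' || $host === $myDomain) {
        return 'ok';
    }
    // Domain matched by whitelist.txt, or full referrer by whitewords.txt.
    if (rk_contains_any($host, $ctx['whitelist'])
        || rk_contains_any($referrer, $ctx['whitewords'])) {
        return 'ok';
    }
    // Visitor IP already banned.
    if (in_array($clientIp, $ctx['banned_ips'], true)) {
        return '403';
    }
    // Domain already has a white or black entry in the table.
    if (isset($ctx['table'][$host])) {
        return $ctx['table'][$host] === 'white' ? 'ok' : '403';
    }
    // Unknown domain: fetch the referring page (null means unreachable).
    $source = $host === '' ? null : $ctx['fetch']($referrer);
    if ($source === null) {
        return 'ok_unverified';
    }
    // Approved only if the referring page mentions our domain.
    return stripos($source, $myDomain) !== false ? 'ok' : '403';
}

$ctx = [
    'my_domain'  => 'example.com',
    'whitelist'  => ['google.com'],
    'whitewords' => ['webmail'],
    'banned_ips' => ['203.0.113.9'],
    'table'      => ['spam.example.net' => 'black'],
    // Stand-in for the real page fetch (URL fopen or cURL).
    'fetch'      => function (string $url) {
        return '<a href="http://example.com/">a link to us</a>';
    },
];
echo rk_decide('http://blog.example.org/post', '198.51.100.7', $ctx), "\n";
```

Passing the fetch step in as a function keeps the sketch runnable without any network access; in real life that is where the time is spent, which is why the cheap tests come first.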

OK means that the page is displayed absolutely normally (the user will never know he’s been screened).

403 means the user receives a “403: Access Forbidden” error, with a notice informing him that he has been detected as potential Referrer Spam. The user is not barred altogether from viewing the page (only from this referrer): he is provided with a special link on the error page that will redirect him to the page he was originally coming for.
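As a rough idea of what such a 403 page involves, here is a sketch. The function names and the wording of the notice are mine, not RK’s; the only points taken from the description above are the 403 status and the link back to the page the visitor wanted.

```php
<?php
// Build the body of a 403 notice with a click-through link.
// rk_forbidden_body() and its wording are hypothetical; RK's real page differs.
function rk_forbidden_body(string $originalUrl): string
{
    // Escape the URL so a hostile query string cannot inject markup.
    $safeUrl = htmlspecialchars($originalUrl, ENT_QUOTES, 'UTF-8');
    return "<h1>403: Access Forbidden</h1>\n"
         . "<p>Your referrer was detected as potential referrer spam.</p>\n"
         . "<p><a href=\"{$safeUrl}\">Continue to the page you requested</a></p>\n";
}

// Send the 403 status (if headers can still be sent), then the notice.
function rk_send_forbidden(string $originalUrl): void
{
    if (!headers_sent()) {
        http_response_code(403);
    }
    echo rk_forbidden_body($originalUrl);
}

echo rk_forbidden_body('/blog/?p=12&preview="x"');
```

The status has to go out before any output, which is one more reason the call to RK belongs at the very top of the page.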

There can and will be a few false positives. Possibly a few bad whitelistings (e.g. a spammer getting whitelisted through a trick of some kind, although this is unlikely). And, more likely, a few bad blacklistings: when for some reason the referrer page is not publicly accessible (e.g. a webmail server or such). The latter is why there is a default list of whitelisted domains that contains most search engines and other domains likely to appear in a referrer without containing your URL.

I am in the process of building a semi-exhaustive list of webmail domains to be whitelisted by default, as there is no other way to tell these apart from a bad referrer. Feel free to send me your own suggestions.

Optional: Extended features

There are basically two extended filtering features you can use with RK:

1. SK integration

The following settings only apply if you have Spam Karma installed and running. If you don’t, there’s no need to worry about it: RK will still do its job. But having RK take advantage of SK’s blacklist (and vice versa) helps minimize CPU and bandwidth costs on your blog.

Open your rk_settings.php file and replace the following two lines:
$use_SK_blacklist = false;
$secret_blacklist_string = "rumplestiltskin"; // CHANGE that value if you use the SK blacklist: pick any word you

with:
$use_SK_blacklist = true;
$secret_blacklist_string = "[some random word you pick]";

The first variable tells RK to use SK’s list (make sure it’s installed!). The second one is a sort of “fingerprint” (you don’t need to remember it, just fill in anything you want) that is used to allow banned IPs to auto-unban themselves.

2. .htaccess Blacklisting

This one should probably only be used if you have some experience tweaking your .htaccess file.
Basically, if you forward or mod_rewrite a URL toward: referrer-karma.php?rk_redirect_to=[some URL]&rk_ban_this_ip=1, the client IP will automatically be blacklisted in SK and will receive a 403.
If you provide a redirection URL for the rk_redirect_to param (for example, the original URL), RK will offer to lift the ban and redirect the user, upon simply clicking a link. If you leave that parameter empty (nothing after ‘=’), the user will only get a rather dry “get lost” message… so make sure you only do that for absolutely unmistakable spam.

For example, I have added one simple rule to my .htaccess file:

RewriteRule ^cgi-bin/MT/.* http://unknowngenius.com/blog/wp-content/referrer-karma.php?rk_redirect_to=&rk_ban_this_ip=1 [NC,L]

This rule ensures that any spambot randomly trying to locate a Movable Type script (e.g. mt-comments.cgi) will end up in my permanent banlist: not only will they receive a 403, but the next time they try accessing any other area of my blog, they’ll still receive a 403.
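To spell out what the two parameters in that URL mean, here is a small sketch of the three possible outcomes. It only mirrors the description above: rk_ban_action() and its return values are names made up for the illustration, and the actual handling inside referrer-karma.php may well be organised differently.

```php
<?php
// Interpret the two query parameters described above.
// rk_ban_action() and its return values are hypothetical names.
function rk_ban_action(array $query): string
{
    if (($query['rk_ban_this_ip'] ?? '') !== '1') {
        return 'none';                 // no ban requested
    }
    $redirect = $query['rk_redirect_to'] ?? '';
    return $redirect !== ''
        ? 'ban_with_unban_link'        // 403 plus a link that lifts the ban
        : 'ban_without_link';          // 403 with only the terse message
}

// Same query string as in the RewriteRule above (empty rk_redirect_to).
parse_str('rk_redirect_to=&rk_ban_this_ip=1', $query);
echo rk_ban_action($query), "\n";
```

In other words, the rule above is the harsh variant: no redirect URL, so no way for the client to unban itself.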

Feel free to use this rule (after replacing with your own paths, of course). It ought to work equally well, adapted for any other URLs that are not supposed to be queried by a legitimate user. I believe it would even be possible to put together a more complex set of mod_rewrite rules to redirect spambots that try to access your wp-comment.php file directly without a proper referrer (or without having queried anything else on your server). But be careful if you get into these waters: you could easily break your comments.

Change log

2.1, 2.2: Fixed bugs (SK2 stuff).

2.0: Added SK2 inter-operability. Can now check for a custom regex (instead of only domain name) in referring page’s content.

1.7: Removed RK IP blacklisting (still using SK’s IP banlist, however), as it was both redundant and the source of an annoying recursion bug… Shouldn’t matter too much. Complete facelift for logs and lists, courtesy of Jeff Minard, who might be involved in RK’s future development…

1.6: fixed bugs introduced by 1.5.

1.5: Changed treatment of unreachable URLs. See comment #36 for details.

If you are running anything older, you must upgrade! Nearly every previous version contained major bugs that have since been fixed.

Doc in progress… contact me if you have any questions

227 Responses to “Referrer Karma”

Whoops, the narrow column made that really hard to read. The URI is:

/path/to/referrer-karma.php?ref-karma-setup=true&pwd=your_secret_password

Rick Beckman says:

In response to comment #140, I think I have found a working solution for getting Referrer Karma to play nice with phpBB. Check it out if you are using the phpBB add-on for viewing referrers, or enclose the Referrer Karma code in something like this:

// Only call RK when a referrer is set and it is not another page of this board.
// stripos() replaces the original eregi(), which no longer exists in PHP 7+.
if (!empty($_SERVER['HTTP_REFERER']) && stripos($_SERVER['HTTP_REFERER'], $_SERVER['HTTP_HOST'] . $board_config['script_path']) === false)
{
    // REFERRER KARMA HERE
}

That will prevent phpBB from calling referrer karma if a referrer is not set or if the referrer happens to be another page on the same domain, effectively allowing the login redirects to work.

Rick Beckman says:

Okay, I keep getting “message_die() was called multiple times” errors from phpBB when certain pages refer hits to the board (I don’t know why only some pages do this and not others).

I have temporarily disabled Ref. Karma on my phpBB board until someone with more skill than I takes a look at this. :)

Help! Referrer Karma is no longer blocking all spam.

I’ve had Referrer Karma installed for some time (since Oct 20th, from my comment above). It was working great initially, blocking just about everything.

Since a few weeks ago it’s become much less effective. It’s still working (today, for example, it’s already blocked many hundreds of referrers) but it’s also letting a good amount of stuff leak thru – about 600 since the beginning of the month. Here are three that consistently get in even tho they’re spam:

2) 170 http://buy-phentermine-online.freewebtools.com.removethis/
3) 163 http://h1.ripway.com/buy-phentermine–online.removethis/
4) 160 http://buy-phentermine–online.servik.com.removethis/

Get rid of the .removethis for the actual site name – I don’t want any chance of these bastards getting pagerank sent their way. The second number is the number of times this url made it through.

I’ve reset the database, same issue shows up. Are there Referrer Karma logs I can look at? Any ideas what might be going on?

Also, how do you feel about indicating the Referrer Karma version somewhere obvious on the install, say at the bottom of the admin and log web pages, as well as part of the package name? I wanted to know if I have the latest version, but the zip file name doesn’t have a version number on it, and the referrer-karma.php page says “Referrer Karma 2.4b”, which is different from the Current Version listed above…

To follow up on my own comment, here’s more strangeness: looking at the Referrer Karma log, I see the sites listed above as blocked, and I even see some blocked attempts, but somehow other attempts from the same site are getting thru.

And to follow up on my comment again, never mind, I was misreading the log. These are actually logged as 403s, I just have to set up my stats software to ignore 403s.

So to summarize, nothing wrong with Referrer Karma, my mistake.

Teli says:

I just had to stop by and say thank you! Referrer Karma is a webmaster’s dream, especially if it’s a popular website that attracts a lot of referrer spammers.

So far, in less than a month, it has already blocked nearly 20,000 spam attempts and I haven’t noticed any adverse effects on the speed/functionality of my website :) .

Truly a work of art.

Zach Harkey says:

Sorry, code was filtered. Will use square brackets instead of angled:

On line 26 of referer-karma.php, is [http] supposed to be [html]?

Rich Tatum says:

Is RK not effective when used in conjunction with wp-cache? It seems that after clearing my cache, RK catches some new referrer spam, but once a page is cached I’m guessing RK doesn’t get called when the cached file is delivered. Is this true?

Because I’m seeing a bunch of previously blocked referrers getting through again, and this seems to be the only way they are logically bypassing RK–through my cache.

I may have to simply live with modifying htaccess files if this is the case. :: sigh :: Please let me know.

Regards,

Rich
BlogRodent

dr Dave says:

Tatum:

RK will only be effective if correctly called. I have no idea how exactly wp-cache works, but if, as one may suspect, it presents a static HTML file to visitors instead of the dynamic PHP page containing the call to RK, then obviously RK will never be triggered.
You should ensure the call to RK comes first before anything on whichever page is your main index page.

Cheers

Judith says:

Hi,
I’m unable to complete step three:
Browse to referrer-karma.php?ref-karma-setup=true (after prefixing the correct path to the file on your server, of course). And make sure all the checks are successful.

I completed steps one and two, but if you go to
http://www.atennisblog.com/wp-content/referrer-karma.php
you will see that nothing comes up on this page.

Any help?
Thanks!!!

Judith says:

About the previous message–I figured that out. But when I go to:
http://www.atennisblog.com/wp-content/referrer-karma.php?ref-karma-setup=true
I get this error:
Warning: mysql_connect(): Access denied for user ‘guchuj05_wrdp1′@’localhost’ (using password: YES) in /home/guchuj05/public_html/wp-content/referrer-karma.php on line 635
check_referrer() error.

I checked and my username is correct, as well as the database. Is there something wrong on line 635?

Leandro says:

Referrer Karma is working perfectly on my site, but I don’t know how to see the statistics. Browsing referrer-karma.php only gives me a white page… am I doing something wrong?

Sarah says:

I’m really confused. You claim to be against referral spam but that’s precisely how I found this: through a link from http://www.dummies-guide-to-dmoz.org/wordpress/2006/05/14/tulip-chain/ which then points here.

Dodgy as anything.

Quentin says:

Keep getting the following error:

Please edit referrer-karma.php and change $can_configure to ‘false’

Everything else on the page is blank, and I only see this line.

BTW – I have changed the config file to false, so I’m not sure why I am getting this error.

Barb says:

Hi,

I saw this and immediately an orchestra of angels singing hallelujah flooded my head. This could be the answer to my prayers. But… I’m kind of an idiot.

I am not a programmer. I inherited responsibility of a website when the webmaster suddenly passed away. Two days later we were flooded with comment spam. I’ve been trying to find a solution ever since. The code is created from scratch (not MT or WP). It’s mostly PHP.

So, I installed RK and configured it correctly (it said). Switched the settings value to false. And it seems the only user I have blacklisted is me. I have the PHP code given to me at the top of the index.php file as well as comments.php. Somewhere, somehow I’ve done something wrong (see the part about me being an idiot). I have this lovely log in my SQL database, but I suspect I have the code in the wrong place.

Any clues? Does the blacklist feed from the htaccess file or vice versa?

jens says:

Hello,
I am using phpBB forums, which I want to protect with your great tool. I am not quite sure where, or in which file, I have to put the “require once …” code. Which file would be the best to insert it in, and where? The index.php??

Thx in Advance

Kenn Christ says:

Does RK provide a way to “alias” (for lack of a better term) domains?

I have a number of pages and images on my sites that were formerly on other sites. Old links are handled via 301 redirects on the original sites. The problem is that hits from pages that still contain the old URLs will be caught by RK due to the current domain not being found on the referring page. I’d like to be able to tell RK that referring pages that don’t contain links to the current domain, but do have a link to these other domains, are ok and should be allowed. Basically “aliasing” other domains to the current one.

I’ve added all domains I control to my whitelist, but most of these problem links are on sites that I don’t control.

Thanks, and keep up the good work.

Alan Kellogg says:

Followed instructions, didn’t work.

WordPress 2.0.3

Blue says:

Well, I installed it and it successfully created a whitelist entry. However, I know for a fact that I have had at least 20 spambots visit me in the last week or so because they failed my registration process, but none of them have been added to the blacklist. Can I add an IP address to the blacklist manually? If so, how? Thanks.

shello says:

Is it safe to use the same database that wordpress uses?

[...] Strictly speaking it is not a WordPress plugin, but I have just installed on this blog “Referrer Karma”, written by Dave, who has released a great many useful WordPress plugins. [...]

[...] before such incident happens. Nobody likes a downtime. Here’s what I did after much research. I installed Referrer Karma This is not a plugin, so you need to manually install it via coding. Edit couple of file, install [...]

[...] Referrer Karma to my anti-spam arsenal as well as the stats plugin so everyone can see what’s getting [...]