Current version: 2.2

Overview

Referrer Karma is a rather simple script that prevents malicious bots from accessing your pages, flooding your logs and possibly draining your server’s bandwidth. All it does is check that an incoming visitor has a valid referrer URL (i.e. that the page it claims to come from does exist and does contain a link to your site). If RK thinks the incoming visitor is a malicious bot, it displays a 403 error page (which log analyzer tools will not count as a visit) with an HTML redirect to the original URL, so that legitimate visitors are not locked out (see below for details).

RK keeps a detailed log of referrer URLs that have been blocked, approved or skipped, as well as a short-lived black/whitelist of previously examined referrers. It is designed to spend as little time as possible examining an incoming referrer. The interface lets you manually whitelist or blacklist a particular URL, as well as add important domains to a static whitelist file.

Recent versions also interface with WordPress’ anti-comment-spam plugin Spam Karma to block spambots before they even reach your comment page (note that you do not need to be running either SK or WP in order to use RK: this feature is entirely optional).

Requirements

You must of course be running PHP.
You also need URL fopen (PHP’s allow_url_fopen setting) or cURL enabled on your PHP install (most servers have at least one of the two, but you may want to check with your host).
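If you are not sure which of the two your host provides, a quick check like the one below can tell you. It is only a sketch: the function name rk_fetch_backends() is made up and is not part of RK. It reads PHP’s allow_url_fopen setting and tests whether the cURL extension’s curl_init() function exists.

```php
<?php
// Hypothetical helper, not part of RK: reports which remote-fetch
// mechanisms are available on this PHP install.
function rk_fetch_backends(): array
{
    return [
        // allow_url_fopen lets fopen()/file_get_contents() open http:// URLs.
        'url_fopen' => filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN),
        // The cURL extension is loaded if curl_init() is defined.
        'curl'      => function_exists('curl_init'),
    ];
}

$backends = rk_fetch_backends();
if (!in_array(true, $backends, true)) {
    echo "Neither URL fopen nor cURL is available: ask your host to enable one.\n";
} else {
    echo "OK, available: " . implode(', ', array_keys(array_filter($backends))) . "\n";
}
```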

Download

http://wp-plugins.net/downloads/ref-karma.zip

Disclaimer

Please read the details and warnings below carefully. Only install if you are comfortable with a certain level of risk (nothing I personally consider a show-stopper). Overall, I simply cannot afford to provide personal support for everybody, since I should not even be spending a second developing this at this point in my life. That doesn’t mean I won’t listen to suggestions or bug reports (particularly bug reports with a code fix: these are much appreciated). But as a rule, consider yourself on your own with this plugin.

Instructions

Installation is really easy… provided you follow these instructions carefully. There are no traps, and nothing that anybody with basic knowledge of the web cannot do.

  1. Drop all the files (referrer-karma.php, rk_settings_sample.php, whitewords.txt and whitelist.txt) anywhere on your web server. If you are a WP user, I would recommend copying them into wp-content.
  2. Edit rk_settings_sample.php and fill in the necessary values (comments inside the file will guide you). The only required values are the MySQL connection variables, though setting a password is also recommended. Make sure that $can_configure is set to true (this puts Referrer Karma into “setup” mode). Save the file and rename it to rk_settings.php.
  3. Browse to referrer-karma.php?ref-karma-setup=true (prefixed with the correct path to the file on your server, of course) and make sure all the checks are successful.
  4. If setup is successful, edit rk_settings.php again and switch off setup mode by setting $can_configure to ‘false’. This step is essential: any attempt to use RK will fail if the variable is not set to false.
  5. Open the main PHP file of the page you want to protect: this can be any file that spammers are likely to hit often (e.g. your blog’s ‘index.php’ file). Copy and paste the lines you were given on the setup screen (“include” etc.). To be effective, the call to RK should be inserted at the very top of the very first file loaded by the pages you are protecting (a common header file or the top of a template), before any HTML or PHP.
  6. Sit back
  7. Enjoy the show
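For reference, here is roughly what the top of a protected file might look like after step 5. This is a fragment under assumptions, not the exact lines RK generates: the path (RK copied into wp-content next to your index.php) is hypothetical, and you should paste the lines your own setup screen gives you. The only name taken from RK itself is check_referrer().

```php
<?php
// Very first lines of the protected file (e.g. your blog's index.php),
// before any HTML output. ASSUMED path: use the one from your setup screen.
include_once(dirname(__FILE__) . '/wp-content/referrer-karma.php');
check_referrer();

// Optional stricter variant (see "How RK works"): refuse visitors whose
// referrer could not be checked at all, assuming check_referrer() returns
// false only in that case.
// if (check_referrer() === false) { http_response_code(403); exit; }

// ... the rest of the page follows unchanged ...
```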

Displaying RK stats

Matt put together this very cool little WordPress plugin that automatically inserts RK stats in your footer (the same way SK does). You still need to install RK separately.

A few very quick things

  • You can manually add whitelist domains to the file whitelist.txt (just add a new line with your friend’s domain)
  • You can do the same with keywords (used on the whole referrer string) in whitewords.txt
  • Whitelist and whitewords supersede blacklist, so even if an entry is blacklisted, whitelisting it will make it go through.
  • Don’t hesitate to reset the list any time: it isn’t very costly to build, more like a “cache”.
  • List entries automatically expire (if not used) after 10 days (you can set the number of days in the settings file).
  • Use the password feature (set a password in the config file and append &pwd=[your password] to the setup URL) in order to access the setup screen when you are out of setup mode (i.e. when you have changed the value of $can_configure back to false).
  • At the bottom of the setup screen, you will see links to: 1) Reset the tables (whitelist, blacklist and logs) 2) See logs 3) See current entries in the white/black list 4) See logs without 403’s (this lets you see recent additions to the black and white lists at a glance, without the hundred spambot attempts that follow each one).
  • If you enable Spam Karma compatibility in the settings file, RK will automatically use SK’s IP blacklist to block spammers at the door. This is a no-risk block though, as visitors will still be given a “click through” link to unban themselves and browse your site.
  • Recent versions of RK also allow you to set a regex to be used on the referrer’s page content to check for validity (for example, you could set it to approve any referrer that contains your page’s name in order to lower the risk of false positives).
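To make the whitelist rules above a bit more concrete, here is a sketch of how the two text files could be read and applied. The function names are hypothetical, and two details are assumptions rather than a description of RK’s code: a whitelist.txt entry is taken to match that domain and its subdomains, and whitewords are matched case-insensitively anywhere in the referrer string.

```php
<?php
// Sketch only: RK's actual matching rules may differ.

// Parse a list file: one entry per line, surrounding spaces and blank lines ignored.
function rk_parse_list(string $text): array
{
    $entries = array_map('trim', explode("\n", $text));
    return array_values(array_filter($entries, fn($e) => $e !== ''));
}

// ASSUMPTION: a whitelisted domain matches the referrer host itself
// or any subdomain of it (www.google.com matches google.com).
function rk_domain_whitelisted(string $referrer, array $domains): bool
{
    $host = strtolower((string) parse_url($referrer, PHP_URL_HOST));
    if ($host === '') {
        return false; // not a proper URL
    }
    foreach ($domains as $d) {
        $d = strtolower($d);
        if ($host === $d || substr($host, -strlen($d) - 1) === '.' . $d) {
            return true;
        }
    }
    return false;
}

// ASSUMPTION: a whiteword matches anywhere in the full referrer, ignoring case.
function rk_whiteword_match(string $referrer, array $words): bool
{
    foreach ($words as $w) {
        if (stripos($referrer, $w) !== false) {
            return true;
        }
    }
    return false;
}

$whitelist  = rk_parse_list("google.com\nfriend-blog.net\n");
$whitewords = rk_parse_list("/wp-admin/\n");
```

Because both checks run before any table lookup, an entry that is already blacklisted still goes through once it appears in either file.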

How RK works

Here is, in a nutshell and with some simplification, how Referrer Karma decides whether to allow an inbound referred visit. The order of the steps matters (i.e. as soon as a test is conclusive, the script stops):

  • If there’s no referrer or if it’s from the same domain: OK
  • If the referrer’s domain is matched by an element of whitelist.txt: OK
  • If the full referrer is matched by an element of whitewords.txt: OK
  • If the referrer’s IP matches an ip_ban value in the blacklist table (meaning this IP has repeatedly arrived through a bad referrer): 403
  • If the referrer’s domain matches a white entry in the table: OK
  • If the referrer’s domain matches a black entry in the table: 403 plus ban IP after a certain number of attempts
  • If the referrer’s domain is not in the table, then RK parses the referrer’s source page and:
    • If the source contains the target domain (yours): OK and added as white entry in the table
    • If it does not: 403 and added as black entry in the table.
  • If the referrer’s domain is not reachable or does not appear to be a proper URL: OK but the function returns false (basically, you can decide to be extra paranoid and refuse the connection when check_referrer() returns false).
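The same steps, written out as a heavily simplified PHP function. This illustrates the order described above and is not RK’s source: rk_decide() and its parameters are hypothetical, the black/white table lives in an array rather than MySQL, whitelist entries are compared with the exact referrer host, the attempt counting that leads to an IP ban is left out, and the fetch of the referring page is passed in as a callable.

```php
<?php
// Simplified sketch of the decision order; NOT RK's actual code.

const RK_OK = 'OK';                  // page displayed normally
const RK_FORBIDDEN = '403';          // 403 page with click-through link
const RK_UNVERIFIED = 'OK (false)';  // displayed, but check_referrer() would return false

/**
 * @param array<string,string> $table     referrer host => 'white' | 'black' (updated in place)
 * @param array<string,bool>   $bannedIps IPs banned for repeated bad referrers
 * @param callable $fetch  function (string $url): ?string, returning null when unreachable
 */
function rk_decide(string $referrer, string $myHost, array $whitelist, array $whitewords,
                   string $ip, array $bannedIps, array &$table, callable $fetch): string
{
    $refHost = strtolower((string) parse_url($referrer, PHP_URL_HOST));

    // No referrer, or a referrer on our own domain.
    if ($referrer === '' || $refHost === strtolower($myHost)) {
        return RK_OK;
    }
    // Static whitelist (domains), then whitewords (anywhere in the referrer).
    if ($refHost !== '' && in_array($refHost, $whitelist, true)) {
        return RK_OK;
    }
    foreach ($whitewords as $word) {
        if (stripos($referrer, $word) !== false) {
            return RK_OK;
        }
    }
    // IP already banned.
    if (!empty($bannedIps[$ip])) {
        return RK_FORBIDDEN;
    }
    // Cached verdict from an earlier check.
    $cached = $table[$refHost] ?? null;
    if ($cached === 'white') {
        return RK_OK;
    }
    if ($cached === 'black') {
        return RK_FORBIDDEN;
    }
    // Not a proper URL, or the referring page cannot be fetched.
    $content = $refHost === '' ? null : $fetch($referrer);
    if ($content === null) {
        return RK_UNVERIFIED;
    }
    // Does the referring page mention our domain?
    if (stripos($content, $myHost) !== false) {
        $table[$refHost] = 'white';
        return RK_OK;
    }
    $table[$refHost] = 'black';
    return RK_FORBIDDEN;
}
```

Passing a fetcher that returns canned HTML makes it easy to see how each branch behaves without touching the network.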

OK means that the page is displayed absolutely normally (the user will never know he’s been screened).

403 means the user receives a “403: Access Forbidden” error, with a notice informing him that he has been detected as potential Referrer Spam. The user is not barred altogether from viewing the page (only from this referrer): he is provided with a special link on the error page that will redirect him to the page he was originally coming for.
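A sketch of such a 403 page, assuming a plain meta refresh plus a visible link (RK’s own page and refresh delay may differ, and rk_forbidden_page() is a hypothetical name). The reason the way back works: the follow-up request carries either no referrer or the 403 page itself as referrer, which lives on your own domain, and both cases pass the very first check.

```php
<?php
// Sketch, not RK's actual page: a 403 body explaining why the visitor was
// stopped, with a link back to the page they asked for.
// The refresh delay is an ASSUMED value.
function rk_forbidden_page(string $targetUrl, int $refreshSeconds = 10): string
{
    // Escape the URL so it is safe inside HTML attributes.
    $url = htmlspecialchars($targetUrl, ENT_QUOTES, 'UTF-8');
    return "<!DOCTYPE html>\n<html><head>\n"
        . "<meta http-equiv=\"refresh\" content=\"{$refreshSeconds};url={$url}\">\n"
        . "<title>403 Forbidden</title></head><body>\n"
        . "<h1>403: Access Forbidden</h1>\n"
        . "<p>Your referrer was flagged as possible referrer spam.</p>\n"
        . "<p><a href=\"{$url}\">Continue to the page you requested</a></p>\n"
        . "</body></html>\n";
}

// In a real request the status line must be sent before any output:
// http_response_code(403);
// echo rk_forbidden_page('http://example.com/blog/');
```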

There can and will be a few false positives. Possibly a few bad whitelistings (e.g. a spammer getting whitelisted through some kind of trick, although this is unlikely). And, more likely, a few bad blacklistings: if for some reason the referring page is not publicly accessible (e.g. a webmail server or such). The latter is why there is a default list of whitelisted domains that contains most search engines and other domains likely to appear as a referrer without containing your URL.

I am in the process of building a semi-exhaustive list of webmail domains to be whitelisted by default, as there is no other way to tell these apart from a bad referrer. Feel free to send me your own suggestions.

Optional: Extended features

There are basically two extended filtering features you can use with RK:

1. SK integration

The following settings only apply if you have Spam Karma installed and running. If you don’t, there’s no need to worry about them: RK will still do its job. But having RK take advantage of SK’s blacklist (and vice versa) helps minimize CPU and bandwidth costs on your blog.

Open your rk_settings.php file and replace the following two lines:
$use_SK_blacklist = false;
$secret_blacklist_string = "rumplestiltskin"; // CHANGE that value if you use the SK blacklist: pick any word you

by:
$use_SK_blacklist = true;
$secret_blacklist_string = "[some random word you pick]";

The first variable tells RK to use SK’s list (make sure SK is installed!). The second one is a sort of “fingerprint” (you don’t need to remember it, just fill in anything you want) that is used to let banned IPs unban themselves automatically.
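The page does not say exactly how the fingerprint is used, so what follows only illustrates the general technique (an HMAC keyed with the secret), under that assumption: the 403 page can hand out an unban token tied to the visitor’s IP, and a bot cannot forge a valid token for a different IP without knowing the secret. Both function names are hypothetical.

```php
<?php
// Hypothetical illustration; RK's real token format is not documented here.
// A secret-keyed HMAC of the visitor's IP serves as an "unban me" token.
function rk_unban_token(string $ip, string $secret): string
{
    return hash_hmac('sha256', $ip, $secret);
}

function rk_unban_token_valid(string $ip, string $token, string $secret): bool
{
    // hash_equals() compares the two strings in constant time.
    return hash_equals(rk_unban_token($ip, $secret), $token);
}

// $secret would be the value of $secret_blacklist_string.
$secret = 'some random word you pick';
$token  = rk_unban_token('203.0.113.7', $secret);
```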

2. .htaccess Blacklisting

This one should probably be only used if you have some experience tweaking your .htaccess file.
Basically, if you forward or mod_rewrite a URL toward: referrer-karma.php?rk_redirect_to=[some URL]&rk_ban_this_ip=1, the client IP will automatically be blacklisted in SK and will receive a 403.
If you provide a redirection URL for the rk_redirect_to param (for example, the original URL), RK will offer to lift the ban and redirect the user, upon simply clicking a link. If you leave that parameter empty (nothing after ‘=’), the user will only get a rather dry “get lost” message… so make sure you only do that for absolutely unmistakable spam.
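The behavior of these two parameters, as described above, can be sketched as a small PHP function. Only rk_redirect_to and rk_ban_this_ip come from RK; rk_handle_ban_request() and the shape of its return value are hypothetical, and $query stands in for $_GET.

```php
<?php
// Sketch of the described behavior, not RK's code.
// Returns null when the request is not a ban request.
function rk_handle_ban_request(array $query): ?array
{
    if (($query['rk_ban_this_ip'] ?? '') !== '1') {
        return null; // not a ban request: nothing to do here
    }
    $target = is_string($query['rk_redirect_to'] ?? null)
        ? trim($query['rk_redirect_to'])
        : '';
    return [
        'status'     => 403,   // always answer with a 403
        'ban_ip'     => true,  // and blacklist the client IP
        // Empty rk_redirect_to: no way back, just the "get lost" message.
        'unban_link' => $target === '' ? null : $target,
    ];
}

// The .htaccess rule below produces exactly this query:
$action = rk_handle_ban_request(['rk_redirect_to' => '', 'rk_ban_this_ip' => '1']);
```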

For example, I have added one simple rule to my .htaccess file:

RewriteRule ^cgi-bin/MT/.* http://unknowngenius.com/blog/wp-content/referrer-karma.php?rk_redirect_to=&rk_ban_this_ip=1 [NC,L]

This rule ensures that any spambot randomly trying to locate a Movable Type script (e.g. mt-comments.cgi) will end up in my permanent banlist: not only will they receive a 403, but the next time they try accessing any other areas of my blog, they’ll still receive a 403.

Feel free to use this rule (after replacing with your own paths, of course). It ought to work equally well, adapted for any other URLs that are not supposed to be queried by a legitimate user. I believe it would even be possible to put together a more complex set of mod_rewrite rules to redirect spambots that try to access your wp-comment.php file directly without a proper referrer (or without having queried anything else on your server). But be careful if you get into these waters: you could easily break your comments.

Change log

2.1, 2.2: Fixed bugs (SK2 stuff).

2.0: Added SK2 inter-operability. Can now check for a custom regex (instead of only domain name) in referring page’s content.

1.7: Removed RK IP blacklisting (still using SK’s IP banlist, however), as it was both redundant and source of some annoying recursion bug… Shouldn’t matter too much. Complete facelifting for logs and lists, courtesy of Jeff Minard, who might be involved in RK’s future development…

1.6: fixed bugs introduced by 1.5.

1.5: Changed treatment of unreachable URLs. See comment #36 for details.

If you are running anything older, you must upgrade! Nearly every previous version contained major bugs that have since been fixed.

Doc in progress… contact me if you have any questions.

227 Responses to “Referrer Karma”

MacManX says:

Rori, what do your RK logs say? Is RK blocking every referrer?

Eric says:

PaulaO, this is not Referral Karma, this is a separate WP plugin that you are using. I just go directly to the URL like you do. Read the other plugin’s info on fixing it; it should tell you.

Rori, nothing in the logs doesn’t mean it’s not working. Spammers sometimes take a break (shock!) and I get no new records either. Unless you can show that spammers are getting through, I wouldn’t assume it isn’t working.

Rori says:

Thank you Dr Dave! I am sure it is plugins like yours that make them take a break!

PaulaO says:

dr Dave, I don’t understand. What other plugin? It is Referer Karma that has the URL issue. screenshot is at http://paulaoffutt.com/blog/wp-images/rk_clip.jpg
Clicking on the Referer Karma tab sends me here:
http://paulaoffutt.com/blog/wp-admin/wp-content/referrer-karma.php?ref-karma-setup=true&pwd=****. The ‘problem’ is the /wp-admin/ shouldn’t be there.

I also have Spam Karma running and use the Gila theme. I have no other spam/referer plugin or program running.

Eric says:

PaulaO, I am not Dr Dave, I am Eric. And that is a separate plugin for WordPress that is NOT Referral Karma. It allows you to put Referral Karma stats on your page. It provides a link to set up your Referral Karma, but that link does not work properly. I have it installed myself. Referral Karma is not a WordPress plugin. You installed Referral Karma Stats by this dude: http://mattread.com/archives/2005/03/rk-stats/

As I said in my previous comment, you need to read the instructions on the page I just linked above in order to fix or understand why that link is not working.

PaulaO says:

Oh, okay, now I get it. But why does it try to send me to the RK setup page? Never mind, don’t answer that. I have the feeling it is over my blond head. Obviously quite blond since I didn’t get your name right.

Thanks for the answer!

Mindy says:

Hi again, I think something might be wrong with mine. I have been getting tons of spam hits listed, which is great, but I also seem to get too many (I think) false positives. I noticed a big string of them in a row from a legit site, so I decided to test it myself by clicking through from a site that wasn’t whitelisted yet, but should be. So I went to a forum I use and clicked my homepage link. I got the click-through page, but it kept refreshing about every second; I didn’t even have time to read it, and I finally just clicked the link to make it stop. When I checked my logs immediately after, RK was blacklisting the referrer and the IP, with a log entry for each refresh. How do I fix this? How can people whitelist themselves if they can’t even read the page?

Mindy says:

And also, when I try again from the same link, which now shows as blacklisted, it still does the same thing, and if I just let it keep refreshing, it eventually redirects to the site. Shouldn’t it block access to the site if the referrer is now on the blacklist, or am I misunderstanding?

Bryan says:

Hey Mindy, like MacManX indicated above, it’s probably best if you whitelist all the sites you know are good that link to you. Every time I need to whitelist something in the logs, I also add it to the whitelist file so it doesn’t get blacklisted again by mistake.

I think the way it works is that it’ll give a 403, but redirect to your site anyway. By doing this, it doesn’t touch your referrer logs. Someone will correct me if I’m wrong. ;)

Spam protection

Since I was getting a lot of comment spam lately, I had been running drDave’s excellent Spam Karma plug-in. A couple of days ago I upgraded it to the upcoming version 2 (still under development), which does an even better job…

Lordrich says:

Simply blocking the referrer spam isn’t actually helping much. How about, for every blacklisted request, returning a 301 that redirects them back to the site they are spamming?
And how about a perl script which will read the blacklisted referers and remove relevant lines from an Apache log?

Eric says:

The problem comes in for REAL incoming links that the script has trouble identifying. For example, you leave a comment on someone’s blog and they read it in their WordPress dashboard. RK cannot connect to this URL because it would need to log in to the WP admin, which it obviously cannot do on random sites. The user would then be sent back to their WP admin panel for no reason. I also have some oddball search engines coming to my site, and they get blocked. There are many ways a real person can arrive at your site and accidentally get blocked. That’s why the refresh to the site, which removes the referrer, happens.

Maybe we could add a “known” list that redirects them, but most of these bots do not follow redirects. They use special agents whose sole purpose is to request a URL with a certain referrer, and that’s it. Following a 301 is not a requirement for them, and maybe not even useful. Of course, I could be wrong (I do not use these programs), but following a 301 would be kinda dumb for them to do.

As for removing the entries from the log file, my stats program does not count 403 hits as a real hit from a user and so this doesn’t bother me. But you could easily use a perl script and a regex to accomplish this. Heck, even cat and grep on the command line should do it.
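For those who would rather stay in PHP than reach for Perl or grep, here is a sketch of the log-cleaning idea: drop every line of an Apache log whose referrer host is on a list of blacklisted domains. Assumptions: the log uses the Combined Log Format, the domain list has been exported from RK’s table by hand, and rk_filter_log() is a hypothetical name. The simple regex does not handle escaped quotes inside the request line.

```php
<?php
// Sketch: filter Combined Log Format lines by referrer host.
function rk_filter_log(array $lines, array $blacklistedDomains): array
{
    $keep = [];
    foreach ($lines as $line) {
        // ... "request" status bytes "referrer" "user-agent"
        if (preg_match('/"[^"]*" \d{3} \S+ "([^"]*)"/', $line, $m)) {
            $host = strtolower((string) parse_url($m[1], PHP_URL_HOST));
            if ($host !== '' && in_array($host, $blacklistedDomains, true)) {
                continue; // drop hits from blacklisted referrers
            }
        }
        $keep[] = $line;
    }
    return $keep;
}

// Usage: rewrite a copy of the log without the blacklisted hits.
// $lines = file('access.log', FILE_IGNORE_NEW_LINES);
// file_put_contents('access.clean.log', implode("\n", rk_filter_log($lines, ['spam.biz'])) . "\n");
```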

MacManX says:

Eric, that’s what whitelist.txt is for. Just add /wp-admin/ to it.

Eric says:

It was an example. I have already added it to my whitelist. But you can’t think of every possible situation that something like this can happen. Also, you will never know unless you check your logs very often. You can pretty easily alter what RK does and enact this feature on your own.

TwisterMc says:

How do I block sites that Referrer Karma doesn’t? So far a site has referred traffic to my site and it’s all spam, however RK didn’t notice. It says “Can’t reach referrer URL: Ignoring.” Can I make a blacklist.txt? Or ban IPs? Or maybe I just need to wait a few days and see if it works itself out?

Eric says:

Go to the Show White/Blacklist page and click on “Switch to blacklist.” Also, what Referral Karma does when it cannot request the referring page can be customized in its settings. I believe you have it set to not ban them. Change it to ban and the problem should go away.

Karma Dashboard Plugin

Last week, I installed Referer Karma because I was sick of stupid referer spam. One thing I wasn’t the biggest fan of was the lack of real integration into WordPress. I understand why; it was meant to be usable by more than us WordPress-ers. …

TwisterMc says:

Eric…. they don’t show in the white/blacklist because they were ignored. I then go into my database and physically add them. However, I’ll look into your comment on ban/no ban. Thanks :)

TwisterMc says:

I have $ban_unreachable_urls = true; and they are not getting banned. :( Help please. I just installed RK the other day so I’ve got the newest release.

TwisterMc says:

fixed. don’t use it with referrer bouncer.

IO ERROR says:

Well, I was on vacation…

For the past couple of weeks I’ve been spending much time away from news sources, computers, and the usual suspects, and actually trying to get out of the house once in a while and see actual human beings in person. So I haven’t written to…

shep says:

this design is beautiful dave. keep up the good work.

Is there any way to get RK to check the referrer page for strings other than my host before banning the referrer? My site’s address is http://www.randall100.f2s.com, but http://www.underblog.co.uk also points there, and pages with an underblog.co.uk link are banned at the moment. Also, I’m finding that random blogs on a webring I’m on get banned too. I’m assuming that it’s something to do with this code:
if (strpos($content, $this_server) !== FALSE)
{
mysql_query("INSERT INTO `ref_karma` SET `key` = 'white', `value` = '" . mysql_escape_string($ref_server_short) . "', `last_mod` = NOW()") or error_msg("Cannot insert whitelist entry.", false, true);
mysql_query("INSERT INTO `ref_karma_logs` SET `ip` = '". mysql_escape_string($_SERVER['REMOTE_ADDR']) . "', `ref` = '". mysql_escape_string($_SERVER['HTTP_REFERER']) . "', `msg` = 'Added as Whitelist entry (id: ". mysql_insert_id() . ", URL contained: $this_server).', `msg_type` = 'white', `ts` = NOW()");

return true;
}

But I’m a bit rubbish at this sort of thing, and whenever I try to tweak it I end up causing errors or making my site unreachable. Can anyone help?

Eric says:

In order to get it to check for multiple domains it would have to look something like this:

if ((strpos($content, $this_server) !== FALSE) || (strpos($content, "other domain") !== FALSE))

Note: I am not a PHP Programmer. The above is a guess.
It shouldn’t be too hard to add this ability in the future though.

As for the webring, I am assuming it is because they don’t actually have a link to your page, just the webring code. Getting around that would be extremely hard, I believe. Maybe dr Dave can add a “whitelist” for words on the page: you could enter some text that should appear on the site (like a portion of the webring code) when your domain name doesn’t.
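A working version of the multi-domain test above, generalized to a list. This is a sketch only (rk_page_mentions_any() is a hypothetical name); it also covers the webring case, since a list entry can be a fragment of text expected on the referring page rather than a domain. Note that it uses case-insensitive stripos(), whereas the RK code quoted above uses strpos().

```php
<?php
// Sketch, not RK's code: true if the referring page contains any of the
// given strings (domains, or other text that should count as a link to you).
function rk_page_mentions_any(string $content, array $needles): bool
{
    foreach ($needles as $needle) {
        // Skip empty entries, which would otherwise match every page.
        if ($needle !== '' && stripos($content, $needle) !== false) {
            return true;
        }
    }
    return false;
}

$acceptable = ['randall100.f2s.com', 'underblog.co.uk'];
// In place of the single strpos() test:
// if (rk_page_mentions_any($content, $acceptable)) { ... whitelist ... }
```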

David George says:

Hi

I am having a problem with a certain legitimate URL which hangs my ref-karma. I will take a look at the code myself but using Curl on your site I also get an error:-

$ curl -e "http://www.skipressworld.com/us/en/daily_news/2004/07/pistehorscom_releases_200304_backcountry_fatality_report.html?cat=Adventure" http://unknowngenius.com/blog/wordpress/ref-karma/
curl: (52) Empty reply from server

any ideas?

Rodney Beck says:

It seems your Referrer Karma isn’t compatible with Spam Karma v2.0 beta.

I was getting errors about not being able to find the .blacklist table (because it’s .sk2_blacklist in v2.0). I changed those but it seems the table layouts have changed a bit v2.0 of Spam Karma.

Just thought you’d like to know.

David George says:

Ok, I think I’ve understood the issue in comment 125. The problem lies with ref-karma’s handling of IFrames and scripts. It has to parse these itself, as they are put together on the client side and may contain the referring URL.

The URL I gave sends ref-karma into a loop; it is basically a “spider trap”, processing megabytes of data. I think the recursion limit should probably be lower. Interested in comments.

dr Dave says:

Rodney: Yes, sorry if I didn’t make that clear enough. At the moment RK’s “SK features” are only compatible with SK1. Note that it will work perfectly fine without, if you do not enable the extra feature. It will also work fine for all previous users of SK, even after they uninstall SK.
I will work on a new version with SK2 compatibility, as soon as I have a sec, next month.

David: Indeed, that sounds like a possible issue. There was a major bug in the code there. Are you running the latest version of RK?
If so, I’ll look into fixing the code, it is currently checking for infinite loops, but I guess I could lower the limit (it’s at 8 levels right now).

David George says:

Hi Dr Dave,

I’m using the version I updated yesterday. I’ve made a few changes which you can find in this file:

http://www.abcseo.com/papers/referrer-karma.zip

I copied the get_content function and changed it to check_content. This exits true the second it finds a referrer document with the site url otherwise it exits false. Perhaps more controversially I also reduced the referrer document size to check. This seems to work for me but let me know if I missed something. I’m currently testing on one of my sites.

I really appreciate your efforts in the fight against spam.
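A sketch of the three ideas discussed here: stop as soon as the link is found, limit recursion into frames and scripts, and cap how much of each document is examined. This is not RK’s or David’s code: rk_check_content() is hypothetical, the fetcher is injected as a callable so it can be tested offline, the default limits are arbitrary, and relative src attributes are not resolved.

```php
<?php
// Sketch: depth-limited, size-capped, early-exit search for $needle in a page
// and in the frames/scripts it pulls in. $fetch returns HTML or null.
function rk_check_content(
    string $url,
    string $needle,
    callable $fetch,
    int $maxDepth = 3,
    int $maxBytes = 65536,
    array &$seen = []
): bool {
    if ($maxDepth < 0 || isset($seen[$url])) {
        return false; // too deep, or already visited (loop guard)
    }
    $seen[$url] = true;

    $html = $fetch($url);
    if ($html === null) {
        return false;
    }
    $html = substr($html, 0, $maxBytes); // only inspect the first $maxBytes
    if (stripos($html, $needle) !== false) {
        return true; // found: stop immediately
    }
    // Follow frames and external scripts, which may carry the link instead.
    preg_match_all('/<(?:iframe|frame|script)\b[^>]*\bsrc=["\']([^"\']+)["\']/i', $html, $m);
    foreach ($m[1] as $src) {
        if (rk_check_content($src, $needle, $fetch, $maxDepth - 1, $maxBytes, $seen)) {
            return true;
        }
    }
    return false;
}
```

The $seen set alone already stops a page that frames itself; the depth limit additionally bounds chains of distinct URLs, such as a generated spider trap.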

Dave: Got a minor issue in using one install to cover multiple WP installs on the same domain: Using the include_once() at the top of the file makes WP think that the RK database is the WP database, which isn’t true in this case. Any idea how to get RK to release its db connection so that WP can then subsequently connect to a different db?

dekay.org says:

WordPress Plugin Request: referrer to trackback

I am not sure if this is possible at all – but would it not be nice to have a (spam safe) way of converting incoming referrer-URLs to trackbacks/pingbacks? So you know when a blog post has been linked to on del.icio.us or so.

Make it talk to SpamKarma…

Steve says:

I’ve just set up RK, and while I don’t expect immediate logs, the “show white/black list” link doesn’t work; I get a blank page with just the grey RK header. Ideas?

I’m using WP and placed the files in http://www.site.com/wordpress/wp-content/ as recommended.

Steve says:

Ah, never mind, I didn’t realize the white list was dynamic... it’s kicking in now. This is SWEET. :)

Steve says:

Could you please let us know on an update so SK2 will work with RK? I would really like this compatibility, but using SK1 doesn’t seem to be the smart thing to do.

Tash says:

Ok, I have installed SK2, works very well. Great plugin. Now I have just completed installing the RK plugin but I get the following error

Can’t select from Spam Karma blacklist
SQL Error: Table ‘xxxx_wordpress.blacklist’ doesn’t exist

what went wrong?

dr Dave says:

Tash: your error has to do with the comment just above yours.

Everyone else who asked: an SK2-compatible (and possibly slightly improved) version of RK will be out shortly (say, a week).

Cheers

Steve says:

Thanks for the info Dave, I look forward to the RK2 update.

RK has been running for nearly a week, and is doing a pretty good job..but what I notice is, when a new domain refers, lots of entries will make it to my logs before RK “kicks in”. For example, my logs show 194 hits today from “brisbeck.com”, and RK’s w/b list shows 164 attempts. Is this normal?

markdixon.ca says:

I suggest having a confirm on the ‘reset tables’ button. I clicked it by mistake today.

There seems to be a problem with SK and the phpBB forum script. When trying to log in to phpBB’s admin panel, the redirection doesn’t seem to work right. phpBB requires two authentication steps to get into the Admin panel: first, the user needs to log in to his account, and if he/she is an admin, a link to the “administration panel” becomes available. With SK enabled, that link doesn’t work. :/

please advise,

Gordon says:

Apologies for that empty trackback above (pre-posting about Referrer Karma as I’m about to go on holiday).

I do have a question, and it might be that I don’t quite understand how this works.

I’m still seeing referrer spam in my stats (Extreme tracking). Now for the referrer spam to show up there my index.php will have had to be loaded. That suggests to me that Referrer Karma is missing those referrer spam and allowing them through to my site. Correct? If it was working they’d get a 403 and never get through to my site to load my stats script (and generate the refer).

Without the referrer in question showing up in the referrer karma logs how can I ensure it’s blacklisted?? Or does this mean that that particular spammer has figured a way around your code???

[...] Dr Dave’s Referrer Karma is about the most useful anti-referrer spam tool I have seen so far. No more fiddling with .htaccess, no more redirecting spammers to their own pages. This simple php script keeps blacklists and whitelists, bans spammers when they hit your page too often, and also checks the referring URL for some indication of your site’s URL, or even whether it exists. For me it has reduced the referrer spam I am getting to almost nothing at all, down from up to hundreds per day. [...]

[...] After a commenter alerted me to an infinite loop created by Referrer Karma, I have now disabled it. As 90% of the bad referring links lead to pages that don’t exist anyway, and my spam comments seem to be handled fine (in fact I’ve had hardly any on this domain since its launch), it just isn’t necessary to have it running at this time. [...]

[...] I am now whooping and dancing around with glee every time I see another bad referrer hit the Referrer Karma log; it’s really made my day. Now we just have to sit and wait for the next line of blog spammer attacks, most probably trackback spam and RSS spam (which is already starting to hit Technorati results). [...]

Is it just me, or are there referrers who’ve figured out how to game the “false” return—essentially, creating a “can’t reach referrer URL: ignoring” condition—and getting their attempts through that way? I’m seeing this from webalias.com amongst others.

I guess I could always flip the bit and deny people who get through with a false response…

Referrer Karma seems to be working

Referrer Karma is a little “plugin” for blogs of many flavors. The design is simple, usage requires including a file at the top of your main index page or any other page designed to take comments and adding the check_referrer() function to…

[...] Sorry. Bloody vikings. At any rate, I have two completely invaluable tools to thank for this happy state of affairs: Spam Karma 2.0 and Referrer Karma, both created by Dr. Dave. If you use WordPress, you absolutely MUST install these two utilities at your earliest convenience (this means you, Tommy). When I first installed it, I used to get near-daily e-mails telling me what spammers had attempted to post on my site. It caught them, and saved them for me to peruse at my leisure, so I could decide whether to keep them or dump them. [...]

[...] And, if you’re running any PHP website (including WordPress), run – don’t walk – and install Spam Karma’s cousin: Referrer Karma. It works on any PHP-based website, and can share the blacklist with Spam Karma 2 for a nice integrated spamroach killing machine. [...]

MadMan says:

Upon installation, I get this message:


Please edit the ‘referrer-karma.php’ file on your server and change the value of $can_configure to ‘false’ before you use the check_referrer() function.

This is not the file to edit. You have to edit the rk_settings.php file to change the value.