Frustrated Users and New Developments

This entry was originally gonna be a comment posted on Dave’s Chalkboard in response to this post. But then I realized it had grown to the size of a novella and that most of its content is probably relevant to other people too. So here goes another entry about Spam Karma.
Sorry, I know this is getting tedious, and I’m tired of talking about it too… I promise this is the last time you hear about it until I finally get off my ass and release SK 2.0.

First of all, believe me, I am the first to be sorry that some people are being consistently singled out by SK. It is far from a perfect tool, especially in light of recent changes brought by WP 1.5. Trackbacks are one example: the WP code was changed so that all trackbacks now run through the comment filters. Not a great decision imho, since it breaks backward compatibility with lots of filters that were only ever intended to work on comments…

In your case, I suspect your IP might have ended up on some major Realtime Blackhole List (RBL) servers such as Spamhaus. These lists are stuffed with false positives and are not under my control; I actually advise people to turn that filter off unless spam really keeps coming through.
Another very likely culprit is the use of a proxy server that masks (or changes) your IP. This is a definite comment killer, since it’s the signature move of spambots trying to spoof IPs. Not using this criterion would make spam-filtering nigh impossible… I’ve been looking into ways to detect friendly proxies and force them to use the same IP and not cache the page, without success so far…
Setting SK on “lenient”, as somebody pointed out, is probably a good idea too…
If you want to contact me directly by email and try posting a test comment on my blog, I’ll be able to tell you exactly why it’s not working (in fact, any SK user whose blog you comment on could tell you, since they’ll receive your comments in their SK digest, along with detailed headers).
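
For anyone wondering what an RBL check looks like in general, here is a minimal sketch in Python. It is not SK's actual PHP code: the function names, the injectable `resolve` parameter and the default zone are assumptions made for illustration. The only thing it relies on is the usual DNSBL convention of reversing the IPv4 octets, appending the list's zone, and treating an answer in 127.0.0.0/8 as "listed".

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str = "zen.spamhaus.org") -> str:
    """Build the hostname a DNSBL client looks up for an IPv4 address.

    The default zone is only an illustrative choice, not SK's setting.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str = "zen.spamhaus.org",
              resolve=socket.gethostbyname) -> bool:
    """Return True if the list answers with a 127.x.x.x "listed" code.

    `resolve` is injectable (hypothetical parameter) so the logic can be
    exercised without touching the network.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No DNS record: the IP is not on this list.
        return False
    # Assumption: answers in 127.255.255.x are treated as error codes,
    # not listings, so they are ignored here.
    return answer.startswith("127.") and not answer.startswith("127.255.255.")
```

Calling `is_listed("192.0.2.10")` performs a real DNS query, and an IP can be listed on one RBL and not on another, which is why tracking down a false positive means checking each list separately.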

Anyway, trust me, I’m the first one unhappy with SK’s imperfections, especially given its high rate of adoption these days: there’s nothing more frustrating than seeing bugs you don’t have time to fix being downloaded by hundreds of people…

I WILL resume development, and I believe I can bring SK much closer to a 0% false positive rate, which was the initial goal (yeah, we kinda drifted somewhere along the escalating arms race, when spam became so annoying to deal with that I really had to crank the filters up).

In response to Adam, I am actually both studying and trying to make a living full time, which, along with attempts at preserving some kind of social life, leaves little extra time for side projects…
However, SK is absolutely open-source (MIT license) and anybody is free to take the ball and run with it (with proper credits etc., of course). As for a more coordinated effort: I have occasionally been getting help and snippets from people, but the size of SK (it totals around 3000 lines of PHP right now) makes it a non-trivial coding project that requires some level of involvement. Plus, its code has mutated into something rather horrible over successive versions…
So I haven’t really found anybody willing to put in that much effort yet.
That being said, and even though I have officially put the brakes on support and development for now, I usually make an effort to integrate any snippets, diffs or bug fixes sent to me…

Regarding Referrer Karma: I probably shouldn’t open my mouth again, but I will, and I’ll say that the potential for user frustration and false positive banning is much, much lower than with SK.
First of all, RK does not ban users: worst case scenario, they are simply asked to click on a redirection link (which rids their HTTP query of its potentially spammish referrer). And this will only happen when both of the following are true:

  1. Their referrer is not on the default whitelist and hasn’t been whitelisted by a previous successful attempt.
  2. The referrer URL is reachable, but the page doesn’t contain a link to your site.
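
The two conditions above can be sketched roughly as follows. This is an illustration in Python, not RK's actual code: `needs_click_through`, `fetch_page` and the set-based whitelist are hypothetical names, and letting empty or unreachable referrers through is my reading of the rules rather than documented behaviour.

```python
from typing import Callable, Optional, Set
from urllib.parse import urlparse


def needs_click_through(referrer: str,
                        site_host: str,
                        whitelist: Set[str],
                        fetch_page: Callable[[str], Optional[str]]) -> bool:
    """Return True if the visitor should be shown the redirection link.

    `fetch_page` (hypothetical helper) returns the referring page's HTML,
    or None when the page cannot be reached.
    """
    if not referrer:
        # Assumption: visits without a referrer are never challenged.
        return False

    ref_host = urlparse(referrer).hostname or ""
    if ref_host in whitelist:
        # Condition 1 fails: the referrer is already trusted.
        return False

    page = fetch_page(referrer)
    if page is None:
        # Condition 2 requires a reachable page, so no challenge here.
        return False

    if site_host in page:
        # The referring page really mentions the site: trust it from now on.
        whitelist.add(ref_host)
        return False

    # Reachable, not whitelisted, and no mention of the site.
    return True
```

Note that even when this returns `True`, nobody is banned: the visitor only gets the "click here" page, which is the whole point of the design.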

At the moment, the only major source of false positives is webmail servers, since it’s impossible for RK to check these. This is why there is an extensive whitelist, to which I am slowly adding all major webmail services and search engines. In the meantime, once again, the worst that can happen is that people have to click on a link to see your site (and you can easily whitelist the referrer in your settings once you spot it). I might work on better auto-whitelisting in the future. For now that’s all there is.

I have also added some (fully optional) integration features to make it use SK’s IP blacklist.

Basically, the concept is to stop lying there and taking it while thinking of England…
Even when spam comments do not make it to the blog, their relentless attempts eat up heaps of bandwidth and CPU (especially, I suspect, with SK’s heavy filtering process in the middle). RK’s new version simply blocks them at the door, before any serious computing starts… and there again, it does so intelligently, since users can easily unblock themselves and see your site by merely clicking a link. Check out Referrer Karma’s page for the latest details on this feature.

4 comments

  1. I am very relieved to hear that you are willing to help figure out what’s up with SK blocking my comments. I was beginning to worry that I wasn’t going to be able to comment on any site that had SK running. BloggingPro wasn’t very accommodating when I emailed them after I got the infamous message.

    As to my IP being on a blacklist somewhere: from what I saw when looking on Google for RBLs, it would be a very time-consuming task to try to track down which RBL has my IP, if any of them do. Spamhaus didn’t. There is an email spammer using one of my domain names as the from address for their evil task. I’m not really happy about it, but there isn’t a whole lot I can do. At least they are not using the webhost’s server to send the spam.

    As to the referral spam problem: you don’t have to preach about bandwidth waste there. My old domain was getting hit so bad, I was seeing up to 10 GB of bandwidth a month on referral spam alone. Since I switched domains, I’m not seeing that much. I have been using http://www.planetOzh.com’s No Refer Spam. It’s a very small PHP file that basically redirects the HTTP request back to the site attempting to leave the referral spam. Quite ingenious really. The referral spammer actually winds up refer spamming their own site instead. 🙂

    I would love to see that idea merged with a database and a list so that I could simply click on the referrals that I consider spam and not have to edit the PHP file each time I want to add entries.

    I wish I knew enough about PHP to help with SK. It would be great if there was a way to drop to a secondary way of verifying if a comment is valid instead of just dropping the comment outright. :shrug:

    Good luck with the projects and I hope you find some help with it.

  2. Dave M.,

    Well, as you can see, your comment made it fine through SK on this blog, so I’m glad to tell you that it isn’t some fundamental flaw or blacklisting (I have all default filters activated on my install, so as to be able to monitor it for myself). Doesn’t explain why you’d be kicked off everybody else’s blog though. But to know that, the only solution is to get one of the admins to look at their SK logs and check what reason is given.

    SK does give you a secondary way to prove that your comment is not spam, but only if your “karma” score isn’t too low to begin with. The only way to fall below this threshold is usually to show up as a spoofed IP (which can happen if you are behind some sort of proxy server that caches pages and sends a different IP between the time you view the entry and the time you post your comment) or to trigger *many* different filters at once… Once again: only one way to find out…

    Regarding ref spam, planetOzh’s script is indeed a simple and effective solution, but much different from RK. I recommend you read RK’s doc to see exactly what I mean, but the basic difference is that RK is mostly automated and doesn’t rely on a (very annoying to maintain and not very effective) list of keywords. Furthermore, while kicking spammers back to their own page might sound like fun, you need to remember that there’s always a potential for legitimate users to get that too, and we come to the exact same issue you are (rightfully) complaining about with SK 😉
    RK never kicks the user out. At most, it presents a “click here to view the page” redirection message…

  3. Knowing that SK has a log is good news. The next time I get blocked, I can e-mail the site in hopes of finding out why. I’m surprised that BloggingPro either didn’t know about this log, or felt that it wasn’t worth their time to find out why for me. Ah well…

    As to SK giving me outs: there were a couple of times it mentioned the ability to enter a CAPTCHA or get an email to verify I was real. However, the site didn’t present me with the CAPTCHA or a way of telling it that I would like to use the email verification. I don’t use proxies; I understand what they are, but know of none to use. I really can’t imagine why I would be so high on SK’s bad side as to not be given an option to prove myself a valid commenter.

    As for RK, I’ll have to check that out. I would prefer an automated system. The idea that RK will let people through if they “click” a link is a good one. It is a pain to keep adding new sites to planetOzh’s system as they arrive. The thought of blocking legit surfers is worrisome, but getting hundreds of referral spam hits was overriding that fear.

    I’ll give RK a try tonight. Thanks for the heads up on how it works!
