Introducing Spam Karma

UPDATED: 12/09/2004 15:46 JST From now on, please check the central Spam Karma page to get the latest updates and news on this plugin.

Yet another techy update for my fellow bloggers using WordPress.

Now that it has reached version 1.4 and most (all?) major bugs have been ironed out, I feel it’s time to introduce the latest member of the ever-expanding WordPress plugin family:

Spam Karma is a mean critter that truly enjoys killing spam.

In fact it is so mean that we had to keep it in a special military-grade containment unit on this server.

Genetically engineered in the dark recesses of our Secret Spam Research Labs and trained through months of reflex conditioning and shock therapy, this thing, once unleashed on your comments, will only let go of its death grip after the last spam has been shredded to pieces.

We haven’t fed it for a week now, and it can smell spam from miles away in its sleep.

But while it is a fierce and merciless spam killer, this plugin is also a perfect companion for your kids’ and friends’ comments. Only the unmistakable foul stench of spam will trigger its ire… while questionable, yet potentially legit, comments will always be given a chance to clear themselves before being irremediably disposed of.

If you are using WP Plugin Mgr, installing is as easy as a click on the “Check Updates” button and a click on “One-Click Install”… Yep, that’s all.
For those still stuck in the last century, a manual install archive is available here. Please, please, RTFM: it’s short, sweet and contains essential details.

Once installed, make sure you check the Options screen at least once (in wp-admin, click on Options >> Spam Karma).

I strongly recommend you check for updates (if you are using WPPM it will do it automatically for you) at least once a week so as to make sure you benefit from the latest bug fixes I might make.

Spam Karma v. 1.4 is now compatible with WordPress 1.2. However, due to the lack of certain functions in the WP 1.2 Plugin API, some of the features are missing (Option Page integration etc.). It is fully enabled for use with any fairly recent release of WP alpha 1.3.

Cool, but how does it work?

Layman’s Explanation

Spam Karma works by running every new comment through a battery of filters and checks, each of which increases or decreases the comment’s ‘Karma’ value. Depending on the final score, the comment is either:

  • Approved
  • Discarded silently as spam (no email is sent to you, unless you specifically require it, but a digest is sent to you every X spams deleted).
  • Placed in Moderation mode, with the possibility for the commenter to auto-moderate his own comment by proving he’s not a spammer (by filling in a Captcha or checking a confirmation email).
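
For the curious, here is roughly what that three-way decision looks like. This is only an illustrative Python sketch, not the plugin’s actual code: the threshold values and the names (Verdict, APPROVE_AT, DISCARD_BELOW, classify) are made up for the example.

```python
from enum import Enum


class Verdict(Enum):
    APPROVED = "approved"
    MODERATED = "moderated"
    DISCARDED = "discarded"


# Hypothetical thresholds: the real values used by the plugin are not
# stated in this post.
APPROVE_AT = 0.0
DISCARD_BELOW = -10.0


def classify(karma: float) -> Verdict:
    """Map a final Karma score to one of the three outcomes."""
    if karma >= APPROVE_AT:
        return Verdict.APPROVED
    if karma < DISCARD_BELOW:
        return Verdict.DISCARDED
    # Anything in between is ambiguous and goes to moderation, where the
    # commenter gets a chance to clear the comment.
    return Verdict.MODERATED
```

With these made-up thresholds, a comment scoring 2 is approved, one scoring -4 is held for moderation, and one scoring -25 is silently discarded.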

This whole process ensures (in order of priority):

  • No deleted false positive (bad bad bad).
  • Extremely few moderated false positives (annoying): uses Captcha and email auto-moderation to keep these at a minimum.
  • No published spam.
  • Very little spam held in moderation (must be destroyed directly: really annoying to have to moderate it).

Furthermore, Spam Karma works in an intelligent way to automatically update its filtering database and grow stronger with each spam it catches…

In short: blocks spam with no unnecessary annoyance, for you or your visitors. The way it should be.

The Detailed Explanation

For our more tech-oriented friends, here are a few more insights into the rather complex process used by Spam Karma to decide what’s spam and what’s not. Each of the following filters is given a weight that depends on many factors, ranging from user-controlled values (e.g.: after how many days is a post “old”?) to the credibility that can be given to a test (e.g.: a missing header is less important than a blacklisted IP).

Mostly, Spam Karma looks at the following things:

  • Whether the poster is logged in to the current blog, and what his user level is (e.g. automatically approve Admin posts).
  • Presence of HTML entities (e.g. &#123;, &#666; etc.).
  • Presence of an HTTP_VIA header.
  • Proper use of the posting form (hash value must be present).
  • Time taken to fill in the comment (e.g.: if it’s less than a few seconds, most likely spam).
  • Posting granularity: first-time posters posting many comments at once vs. old-timers (with comments previously approved by the admin).
  • Previous diagnostic from WP’s built-in comment check (set on the ‘Discussion’ panel).
  • IP and regex match for URLs contained inside the comment (small weight only for non-URL text matching a URL regex).
  • Realtime Blacklist (RBL) Server check for IP and URLs.
  • Comment’s age (e.g. penalize comments on very old posts).
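
To give an idea of how a handful of these checks could add up to a Karma score, here is a small Python sketch. It is an illustration under loud assumptions, not the plugin’s code: the Comment fields, the penalty values, the weights and the defaults (5 seconds, 30 days) are all invented for the example, and only four of the checks above are shown.

```python
import re
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    seconds_to_fill: float      # time between loading the form and posting
    post_age_days: int          # age of the post being commented on
    has_via_header: bool = False


# Numeric HTML entities such as "&#123;".
ENTITY_RE = re.compile(r"&#\d+;")


def entity_filter(c: Comment) -> float:
    # One point off per numeric entity, capped at five (hypothetical cap).
    return -1.0 * min(len(ENTITY_RE.findall(c.text)), 5)


def via_filter(c: Comment) -> float:
    return -1.0 if c.has_via_header else 0.0


def speed_filter(c: Comment, min_seconds: float = 5.0) -> float:
    return -3.0 if c.seconds_to_fill < min_seconds else 0.0


def age_filter(c: Comment, old_after_days: int = 30) -> float:
    # "old_after_days" stands in for the user-controlled setting.
    return -2.0 if c.post_age_days > old_after_days else 0.0


# (filter, weight) pairs: a less credible test gets a smaller weight.
FILTERS = [
    (entity_filter, 1.0),
    (via_filter, 0.5),
    (speed_filter, 1.0),
    (age_filter, 1.0),
]


def karma(c: Comment) -> float:
    """Sum the weighted result of every filter."""
    return sum(weight * f(c) for f, weight in FILTERS)
```

A comment typed in 40 seconds on a two-day-old post sails through with a Karma of 0, while an entity-laden comment posted in one second through a proxy on a year-old post sinks well below it.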

In addition to these filters, Spam Karma uses different treatments and backup checks to ensure it becomes better at stopping further spam and that it never mistakenly deletes a legit comment:

  • Ambiguous comments (that can neither be deleted nor approved) are given a second check: the commenter is asked to solve a Captcha or use the email auto-moderation (an email containing a hash to unlock the comment is sent to the commenter’s email address). If confirmed, the comment’s Karma is bumped up and the comment is either published or held for further review. If not confirmed within a certain period, its Karma is lowered and it is either deleted or kept in moderation (if it was sufficiently high to begin with).
  • When a comment is struck down as spam, its IP and URL(s) are harvested and submitted to the Admin for inclusion in the blacklist. In the meantime, they are used as “auto-added” values, with a lesser weight than permanent blacklist entries.
  • When destroying a spam comment, it checks for recently posted comments that match similar values and retroactively moderates them (e.g.: a spammer could manage to slip X spams onto a blog, but upon reaching a certain suspicious threshold, all the comments would get retroactively moderated, then deleted).
  • Spam Karma uses a central DB to retrieve IP and URL updates. By default, it will query the DB automatically every 2 days (can be disabled). The central DB can be configured. Each install of Spam Karma can work as a sort of P2P relay in the update process (both fetching updates and publishing its own updated list for others to grab).
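
As an illustration of the email auto-moderation step, here is one way an unlock hash could be generated and checked, sketched in Python. Everything here is an assumption made for the example: this post does not say how the plugin computes its hash, so the HMAC-SHA256 scheme, the SECRET_KEY, the function names and the Karma bonus are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-blog secret; without it nobody can forge a valid hash.
SECRET_KEY = b"change-me"


def unlock_token(comment_id: int, email: str) -> str:
    """Build the hash that would be mailed to the commenter."""
    message = f"{comment_id}:{email.lower()}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def confirm(comment_id: int, email: str, token: str,
            karma: float, bonus: float = 5.0) -> float:
    """Return the comment's Karma, bumped up if the hash checks out."""
    expected = unlock_token(comment_id, email)
    # compare_digest avoids leaking information through timing.
    if hmac.compare_digest(expected, token):
        return karma + bonus
    return karma
```

A commenter who clicks the link carrying the right hash sees the Karma of the held comment go up by the bonus; a wrong or guessed hash leaves it untouched.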

Thanks and Acknowledgement
Many, many people have contributed, knowingly or not, to this plugin, with their ideas, code, help, testing, advice and support… I ended up rewriting most of the code I took from these plugins, but it nonetheless gave me a solid base to start with quickly. Thanks guys.

If you encounter any error or misclassification of comments (false positive, undetected spam), please contact me and preferably include the whole comment content, such as it appears in the admin screen (with the Spam Karma debug values).

Any comments or suggestions are always welcome…
