The State of Spam [Karma]

This is my first blog update on Spam Karma, WordPress development and spam in many months, and a crucial one at that. Being notoriously verbose to the point of irrelevance, yet with lots to say today, I have tried to provide a telegraphic sum-up below; feel free to skip it and go straight to the parts you care about (hint for the busy ones: the plot thickens mostly around parts 6 and 7).

1. How well is SK2 stopping spam currently?

Pretty damn well, thank you.

2. What’s wrong in the peaceful Kingdom of SpamKarmia then?

A new breed of Evil has been summoned and is threatening to breach in.

3. How evil?

Very Evil… and powerful.

4. Won’t anybody show up and save the day?

Doubtful…

5. Is there really nothing you can do?

Of course there is.

6. Then why aren’t you busy doing it, you lazy bastard?

Here is why: …

7. You wouldn’t leave us to die here, would you?

Watch me.


And now for the details:

1. How well is SK2 stopping spam currently?

If you’ve been using SK2 for a while, you know it works pretty damn well. Over the past year, on the various blogs I manage (some of which receive a steady stream of both legit and spam comments, trackbacks and pingbacks), over 99% of spam was caught, with under 0.1% false positives (pretty much zero, actually).

The only spam comments that made it through were usually posted manually: a human would browse to the site, maybe even read the post, and leave a topical comment that looked nearly like ham, save for a blatantly “commercial” site linked in the URL field. These were nearly impossible to stop, as SK2 relies 90% on detecting spambots and only moderately on blacklisting (which helps keep its false-positive rate extremely low).

These “manual” spams, though, were never much of an issue, as the essence of spam is automation, without which it loses all its appeal: assuming it takes an admin a few seconds to moderate a spam by hand, and given the number of bloggers vs. spammers, anything under hundreds of spams per second is just not worth a spammer’s time.

One important thing to understand is that SK2 learns and improves: flagging the spams it lets through helps stop the next ones. It is fairly normal for a fresh install to let a few spams through at the beginning, but flagging them, and thus allowing SK2 to build its blacklists and pattern lists, should immediately improve the catch rate dramatically.
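The learn-from-flagging loop described above can be illustrated with a toy sketch. This is not SK2’s code (SK2 is a PHP WordPress plugin with its own blacklist and pattern-list logic); it is a minimal Python illustration, assuming a comment is a dict with hypothetical `url` and `ip` keys and that each flagged spam simply adds weight to its linked domain and poster IP:

```python
from collections import Counter
from urllib.parse import urlparse


class LearningBlacklist:
    """Toy learning blacklist (hypothetical, not SK2's implementation).

    Each spam the admin flags adds weight to its linked domain and poster IP;
    later comments are scored against the accumulated weights.
    """

    def __init__(self, threshold: int = 2):
        self.counts = Counter()     # item -> number of flagged spams containing it
        self.threshold = threshold  # score at or above which a comment is treated as spam

    @staticmethod
    def _items(comment: dict) -> list:
        # Features we learn from: the domain in the URL field and the poster's IP.
        items = []
        domain = urlparse(comment.get("url", "")).hostname
        if domain:
            items.append("domain:" + domain)
        if comment.get("ip"):
            items.append("ip:" + comment["ip"])
        return items

    def flag_as_spam(self, comment: dict) -> None:
        # Called when the admin flags a spam that slipped through.
        self.counts.update(self._items(comment))

    def score(self, comment: dict) -> int:
        return sum(self.counts[item] for item in self._items(comment))

    def is_spam(self, comment: dict) -> bool:
        return self.score(comment) >= self.threshold


if __name__ == "__main__":
    bl = LearningBlacklist(threshold=2)
    missed = {"url": "http://cheap-pills.example/buy", "ip": "203.0.113.7"}
    print(bl.is_spam(missed))  # False: nothing learned yet
    bl.flag_as_spam(missed)
    print(bl.is_spam({"url": "http://cheap-pills.example/again", "ip": "203.0.113.7"}))  # True
```

The threshold of two matching items is an arbitrary choice for the sketch: a flagged domain alone won’t hold a comment, which trades a little catch rate for fewer false positives.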

2. Then why have I seen so much spam going through lately?

Unfortunately, as some of you might have noticed, SK2’s performance, as seen from the outside, seems to have dropped suddenly over the past few days. While the bulk of the spam is still stopped at the door, a meaningful percentage now manages to fly right through SK2’s basic filters. And given the numbers involved, even 1% of all spam attempts is a lot to deal with. Here again, SK2’s blacklists learn, and conscientiously flagging each uncaught spam should help keep things under control, but this is still a major drop from SK2’s usual performance.

The reason for this sudden burst is a new breed of spam, or more likely, of spambots. It is now confirmed that some spammers have gotten hold of much more efficient spamming tools, ones that bypass some of SK2’s strongest filters without trouble.

Also of note: trackbacks and pingbacks are entirely unaffected by this issue (although a small, unrelated bug was fixed in the later SK2.1 releases, so you may want to upgrade again from the site; more on this later).

3. How does this new spambot generation work?

This is a very difficult question, since it involves lots of guessing and detective work. Pretty much like in a war, we do not have access to the enemy’s weapons designs. A very uneven war, actually, since the enemy does have access to ours.

There are ways, though, to gather information about what spambots do, and to try to guess how they do it.

[long and uselessly detailed technical droning: you probably want to skip that if you aren’t an anti-spam plugin developer yourself:]

First of all, these spams do not present most of the idiotic traits of their lesser colleagues: they do not try cramming hundreds of URLs or inserting hundreds of easily spotted junk keywords into the comment content. Instead, they use only the dedicated name and homepage fields to sneak in spam URLs and keywords. The comment content is often perfectly innocuous, sometimes even topical (copied from another comment or from a trackbacking post). All in all, these spams could easily be missed by a human moderator who didn’t look carefully at the contact name and URL.

When dissected in the HTTP server logs, the spam looks strikingly human-generated: all the files are requested (pictures, CSS, favicon and JavaScript included), a valid referrer URL is sometimes provided, links are followed (e.g. from the front page to a specific post), and the user-agent is, of course, valid and claims to be a regular browser. Timestamps generated by a single spamming IP even seem to point to a typically human, erratic way of browsing. Most importantly, the spam bypasses SK2’s JavaScript filter, which indicates an ability to parse JavaScript.

However, looking closer at timestamps and a host of other small details, I am fairly certain these aren’t posted by a human, but are indeed a new breed of spambot. There are many ways I can think of to make such a spambot with JavaScript-parsing ability and other “mimicking” skills… In fact, I’m just surprised it hadn’t been done before. But this new development is also worrying, as it seems to indicate that spammers have finally gotten hold of real coders to do the job: whereas previous spambots could have been the work of any random script kiddie with half a brain and a vague knowledge of scripting, these seem a bit more thought out in their design and implementation. This is particularly worrying as I do not currently know of any anti-spam system that I, or a similarly skilled coder (that is: not that incredibly skilled), couldn’t eventually break through.

So far, the overall dumbness of spambot programmers gave anti-spam plugins a very easy edge. Things will change if real coders start taking an interest in this no-doubt very lucrative market and start churning out efficient spambot programs for the spam monkeys. And do not doubt for a second that there are, or will be, such black-hat developers in this market (the same way there are in other domains of internet spam)… Even if Mark Pilgrim was slightly off the mark in his apocalyptic sum-up of the situation, he was certainly right on one point: there is huge money involved, certainly enough to pay for the hourly services of a decent professional coder… perhaps even [cue ominous strings on the soundtrack] a coder already involved in the blogging community.

No, not me (unless I’ve been sleepcoding again).
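The post doesn’t describe how SK2’s JavaScript filter works internally. As a generic illustration only (hypothetical names such as `js_token` and `SECRET`, not SK2’s actual mechanism), here is a minimal Python sketch of the server side of a typical JavaScript challenge: the page embeds a signed token in scrambled form, a small inline script unscrambles it into a hidden form field, and the server checks that field on submission. It also makes the point above concrete: a bot that can parse and run JavaScript simply executes the snippet and passes.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET = b"per-install-secret"  # hypothetical; would be random and stored per blog
MAX_AGE = 3600                  # assumed token lifetime in seconds


def _mac(post_id: int, ts: int) -> str:
    msg = f"{post_id}:{ts}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def make_token(post_id: int, now: Optional[float] = None) -> str:
    # Signed, timestamped token tied to one post's comment form.
    ts = int(time.time() if now is None else now)
    return f"{ts}:{_mac(post_id, ts)}"


def js_snippet(post_id: int, now: Optional[float] = None) -> str:
    # The token is shipped reversed; only a client that runs this script
    # writes the correct value into the hidden "js_token" field.
    scrambled = make_token(post_id, now)[::-1]
    return ("<script>document.getElementById('js_token').value = "
            f"'{scrambled}'.split('').reverse().join('');</script>")


def verify(post_id: int, submitted: str, now: Optional[float] = None) -> bool:
    try:
        ts_text, mac = submitted.split(":", 1)
        ts = int(ts_text)
    except ValueError:
        return False  # empty or malformed field: the script never ran
    current = time.time() if now is None else now
    if not 0 <= current - ts <= MAX_AGE:
        return False  # stale or future-dated token
    return hmac.compare_digest(mac, _mac(post_id, ts))
```

A spambot without a JavaScript engine submits an empty field and fails `verify`; one that renders the page like a browser runs the snippet and passes. The timestamp check only limits token replay; it does nothing against a bot that fetches a fresh form every time.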

4. Will any other anti-spam tool fare better than SK2 with this particular spam (or spam in general)?

First off, SK2 is hardly out of the game: even as it is, with a few tweaks, it can easily be brought back to a satisfying, if not perfect, level of protection. Not to mention a possible harder, faster and better successor to SK2 (more on that later).

As for the rest.

You’ll have to believe me when I say I truly wish there were a better selection of anti-spam tools. Far from seeing it as some sort of “competition” (to what? A product I am neither selling nor making any revenue from?), I consider diversity in spam-fighting tools the most efficient way to fight spam. Just as biodiversity is the best guarantee against viruses and germs, presenting spammers with a wide array of defense tools means they can less easily focus their attention on one in particular and try to break it.

What we really do not need, however, is yet another blissfully ignorant moron releasing some stupid 5-line, 3-year-old kiddie trick that will not fool a single spammer and will waste hours of users’ time. Unfortunately, there are a lot of these. So let me go through a quick roundup of what worked, what works and what never worked. I’ll skip the details for today, so you’ll have to take my word when I say that:

  • Captchas: work. Despite the ultra-theoretical “captcha breaking” urban legend, spammers aren’t about to break a captcha on your blog. The big downside of captchas is that they are extremely user-unfriendly and intrusive and, most of all, hurt accessibility (how are blind users supposed to get through?).
  • Pretty much all other plugins won’t work. Blacklists, “spam words”, stupid script-renaming tricks and all: all pretty useless taken one by one. Some used to work years ago, but all have since been broken by spammers. Some are even dangerous because of the number of false positives they yield. Just save your time and skip them. JavaScript payloads likely won’t work either (I’d love to hear from anybody currently using that type of plugin, but I’m pretty sure of this one).
  • Bad Behavior will not stop these specific spammers, for the simple reason that BB is not designed to filter spam. It is only meant to stop the 70% of bots that do stupid things. Unfortunately, bots are getting smarter, and the ones you wanna worry about are at the very top of the remaining 30%, far out of reach of BB.
  • Akismet works, with roughly the same results as SK2: possibly a slightly higher catch rate, but also a higher false-positive rate (which is a big no-no, in my opinion, but that’s up to you). Other concerns generally thrown around include privacy, reliability and terms of use (it is free, but you are entirely dependent on a third-party server). My personal issue is that I am doubtful of the long-term resilience of a monolithic DB such as Akismet’s when confronted with both denial-of-service attempts and data poisoning. There is some breathing room until spammers turn their unbridled attention to these weaknesses, but the fact that Akismet is now bundled with WP will only accelerate things.

As you can tell, there is precious little out there: only a few plugins that all fare roughly on a par with SK2, each with its pros and cons. Most important of all, there is currently nothing I wouldn’t feel confident breaking through, were I to go into the spamming business tomorrow…

Just wire the amount to my Swiss account.

I kid.

5. Is there really nothing you can do?

Of course there is.

I have a very fertile imagination, and still have a couple of tricks to throw in the way of the spamming monkeys, ranging from small bits of tweaking all the way to major, insane and quite possibly breakthrough concepts. Very few in the middle, actually. The problem, of course, is that the potentially more efficient tools also tend to be the more time-consuming and hazardous ones.

Let me try to sum up the whole state of Spamdom as I see it, with a tedious numerical analogy:

Say spam protection goes from 1 to 100, where 1 is “sitting duck” and 100 is “so protected that Houdini himself wouldn’t get a spam through”. Now let’s say most anti-spam plugins tend to hit somewhere in the 1-10 range, with a few, such as Akismet or SK2, hitting something like a 20 (perhaps rising a bit as time went on and improvements came).

Meanwhile, spamming techniques have also been adapting and improving, and it’s fair to say they are now approaching a 20, and steadily rising. Essentially, spammers are lazy (or pragmatic, depending on how you see it): their target is to be just above the anti-spam barrier, not much higher.

Now, among the anti-spam tricks I have in reserve, I’d say a few small ones should, without too much effort, bump SK2 a few points up (with compounded effect, something like a 25), which is nice, but certainly won’t buy more than a few weeks or months.

Since they are also by far the easiest ones to implement, I am already working on them.

There are two other separate projects I’ve been toying with, testing and prototyping: the first involves a somewhat novel approach to Naive Bayes filtering (definitely not on comment content), which would be a definite +10 on our SpamScale; the second, considerably more complex and difficult to explain in detail, could be crudely summed up as a P2P blacklisting system.

That last idea I have been thinking through for a looong time now. I have some confidence that it may hold the key to the End of Blog Spam as We Know It… A definite +50 on our scale…
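The Naive Bayes project mentioned above isn’t described any further, and its “somewhat novel approach” is certainly not what follows. Purely to illustrate the general idea of running Bayes on something other than comment text, here is a minimal Python sketch of an ordinary multinomial Naive Bayes classifier with Laplace smoothing, fed with hypothetical tokens derived from request metadata (referrer, whether CSS was fetched, posting speed), the kind of traits discussed in part 3:

```python
import math
from collections import Counter


class MetadataNaiveBayes:
    """Plain multinomial Naive Bayes over metadata tokens (illustrative only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}
        self.vocab = set()

    def train(self, tokens, label):
        self.doc_counts[label] += 1
        self.token_counts[label].update(tokens)
        self.vocab.update(tokens)

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label), both Laplace-smoothed.
        total_docs = sum(self.doc_counts.values())
        score = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.LABELS)))
        vocab_size = len(self.vocab.union(tokens))
        denom = sum(self.token_counts[label].values()) + vocab_size
        for token in tokens:
            score += math.log((self.token_counts[label][token] + 1) / denom)
        return score

    def spam_probability(self, tokens):
        s = self._log_score(tokens, "spam")
        h = self._log_score(tokens, "ham")
        # P(spam | tokens) = 1 / (1 + e^(h - s)), computed without overflow.
        if s >= h:
            return 1.0 / (1.0 + math.exp(h - s))
        return math.exp(s - h) / (1.0 + math.exp(s - h))


if __name__ == "__main__":
    nb = MetadataNaiveBayes()
    for _ in range(2):
        nb.train(["referrer:none", "css:not_fetched", "post_speed:fast"], "spam")
        nb.train(["referrer:internal", "css:fetched", "post_speed:slow"], "ham")
    print(nb.spam_probability(["referrer:none", "css:not_fetched", "post_speed:slow"]))
```

With this toy training set the example prints roughly 0.75. The token names are made up for the sketch; which metadata actually separates these new bots from humans is precisely the open question described above.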

Of course, these last two are also the ones that will take a serious time investment before I can even figure out whether I can do something with them… Which takes us to the one and only question you all care about:

6. Why aren’t you busy working on the next anti-spam solution before this spam thing gets out of control?

Well, because, as I said above, it is a lot of work. Work that would come on top of the already heavy SK2-related workload I deal with daily. Don’t get me wrong, as I’ve stated previously: I love developing, I love developing SK2 and most of the time I love hearing from you (even if sometimes I get irrepressible urges to ram online manuals down some throats). But being a fully human, carbon-based entity with limited photosynthetic abilities, I happen to need food near-daily…

Also, due to recent life changes, I am now a tad busier (being a full-time student) and much poorer (being a full-time student) than before. Hence the regrettable need to prioritize work that either feeds me or keeps my university peers and professors content.

Can you tell where this is going? No? OK:

To cut a long story short, I am launching a Fund Drive.

The idea is simple: if you use SK2, if you like it, if you’d like to see more of it in the future, if you’d like this future to be sooner than never, if you’d like to help fund the crack habit of a starving student who also happens to dedicate way too much of his free time to eradicating spam, if you think this is worth a few cents, hell, even a few dollars, if you can afford to spend this money without robbing your kid or your cat of their next birthday present… Consider donating:

$2.00
$5.00
$10.00
$20.00
$30.00
$50.00
$666.00

There are currently a few thousand of you actively using SK2 (yep, crazy huh?)… I figure if we weed out the cheapos and those who honestly can’t afford it, plus those who consider their small use of SK2 not worth a monetary contribution (hey, I don’t pay for all my shareware… I’m in no position to cast the first stone), that might still leave a few dozen of you? If each one contributes a few bucks, that should be enough for me to justify spending a few weeks working on SK3 rather than flipping burgers to pay for booze (and occasionally food and rent).

Non-monetary donations of any sort are gladly accepted: food specialties from where you live (especially if it’s distilled and drinkable, but the solid kind is cool too), postcards and anything else that won’t cause a police raid on my place at 6 in the morning… Note that due to recent health regulations, I can no longer accept your first-born child in payment for services, but thanks for offering.

If, like me, you are a starving student who cannot afford to divert any of your drug money to pay for my costly addictions, then consider donating some time. There will be a need for it, mostly in doc writing (FAQs, user guide, maybe even a support forum at some point, since spending two hours a day on email is becoming a bit tedious). Just put your name in and my people will get in touch with your people when the time comes.

If making a donation, please provide a nickname (if you don’t want your full name to be used) and your blog’s address, as I will probably make a donation page to list all those (if any) who donated.

7. Would you seriously stop developing SK if you don’t get money?

Of course not.

But it is unfortunately true that I will have to lower my involvement with anti-spam dev in favour of more, err, survival-oriented activities. Obviously, I’d much rather be paid for something I love doing (like squashing spam and spammers) than any random job… But it isn’t much of a choice.

I guess I should set some sort of imaginary milestones in terms of funding and how far/fast it would take me down the SK3 development trail, but I’d rather not look like a complete moron when only a fraction of it has trickled in by the end of the month… So I’ll just give you my word that I’ll do my best with what I get, and probably with what I don’t get either…

No matter what happens, I will be releasing SK2.2 (with minor tweaks and bug-fixing) at some point… Hopefully within a week… The two bigger components will honestly depend on how much interest they raise and the time I can afford to spend on them (we are talking at least month-long projects)…

Oh, and let me remind you that donations are not, I repeat: not, mandatory in any way whatsoever.
This is not a change in licensing: SK2 is and will remain free for all non-commercial use and redistribution (note that you can still use SK2 on a commercial blog, the only restriction is on packaging and distributing or otherwise selling SK2 for profit: in which case I ask that you contact me for permission first).

I also wanted to take this occasion to thank very sincerely all those who have already donated money, time or simply kind words through email: you have made my day on many occasions, and helped make it all worthwhile so far.

Thanks a lot and do not hesitate to spread the word!