The State of Spam [Karma]

January 30th, 2006 | Filed under WordPress
 

First blog update on Spam Karma, WordPress development and spam in many months, and a crucial one at that. Being notoriously verbose to the point of irrelevance, yet with lots to say today, I have tried to provide a telegraphic sum-up below; feel free to skip it and go straight to the parts you care about (hint for the busy ones: the plot thickens mostly around parts 6 and 7).

1. How well is SK2 stopping spam currently?

Pretty damn well, thank you.

2. What’s wrong in the peaceful Kingdom of SpamKarmia then?

A new breed of Evil has been summoned and is threatening to breach in.

3. How evil?

Very Evil… and powerful.

4. Won’t anybody show up and save the day?

Doubtful…

5. Is there really nothing you can do?

Of course there is.

6. Then why aren’t you busy doing it, you lazy bastard?

Here is why: …

7. You wouldn’t leave us to die here, would you?

Watch me.


And now for the details:

1. How well is SK2 stopping spam currently?

If you’ve been using SK2 for a while, you know it’s working pretty damn well. Over the past year, on the different blogs I manage (some of which receive a steady stream of both legit and spam comments, TBs and PBs), over 99% of spam was caught, with a false-positive rate under 0.1% (pretty much zero, actually).

The only spam comments that made it through were usually posted manually: that is, a human would browse to the site, maybe even read the post, and leave a topical comment that looked nearly like ham, save for a blatantly “commercial” site linked in the URL field. These were nearly impossible to stop, as SK2 relies 90% on detecting spambots and only moderately on blacklisting (which helps keep its false-positive rate extremely low).

These “manual” spams, though, never were much of an issue, as the essence of spam is automation, without which it loses all its appeal: assuming it takes an admin a few seconds to manually moderate a spam, and given the number of bloggers vs. spammers, anything under hundreds of spams per second is just not worth a spammer’s time.

Also, one important thing to understand is that SK2 learns and improves: flagging the spams it lets through helps stop the next ones. It is fairly normal for a fresh install to let a few spams through at the beginning, but flagging them, and thus allowing SK2 to build its blacklists and pattern lists, should immediately improve the catch rate dramatically.
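For the plugin developers among you, the flag-and-learn loop described above can be sketched roughly as follows. This is not SK2's actual code (its internals are not shown in this post): the class name, method names and the choice of tracking IPs and link domains are all assumptions made for illustration.

```python
from urllib.parse import urlparse


class FeedbackBlacklist:
    """Toy blacklist that grows each time a moderator flags a missed spam.

    Hypothetical illustration only; not how SK2 stores or matches its lists.
    """

    def __init__(self):
        self.bad_ips = set()
        self.bad_domains = set()

    def flag_as_spam(self, ip, url):
        # Remember where the missed spam came from and what it linked to.
        self.bad_ips.add(ip)
        domain = urlparse(url).netloc.lower()
        if domain:
            self.bad_domains.add(domain)

    def looks_like_spam(self, ip, url):
        # A later comment is suspicious if it reuses a flagged IP or domain.
        domain = urlparse(url).netloc.lower()
        return ip in self.bad_ips or (domain != "" and domain in self.bad_domains)
```

The point of the sketch is only that one manual flag changes the outcome for the next comment that shares an IP or a linked domain with it.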

2. Then why have I seen so much spam going through lately?

Unfortunately, as some of you might have noticed, SK2’s performance, as seen from the outside, seems to have dropped suddenly over the past few days. While the bulk of the spam still remains at the door, a meaningful percentage now manages to fly right through SK2’s basic filters. And given the numbers involved, even 1% of all spam attempts is a lot to deal with. There again: SK2’s blacklists learn, and conscientiously flagging each uncaught spam should help keep things under control, but this is still a major quality drop from SK2’s usual performance.

The reason for this sudden burst is a new breed of spam, or more likely, of spambots. It is confirmed now that some spammers have gotten hold of much more efficient spamming tools, ones that bypass some of SK2’s strongest filters without trouble.

Also of note is the fact that Trackbacks and Pingbacks are absolutely unaffected by this issue (although a small unrelated bug was fixed in the later SK2.1 releases and you may want to upgrade again from the site: more on this later).

3. How does this new spambot generation work?

This is a very difficult question, since it involves lots of guessing and detective work. Pretty much like in a war, we do not have access to the enemy’s weapons designs. A very uneven war, actually, since the enemy does have access to ours.

There are ways, though, to gather information about what spambots do, and try guessing how they do it.

[long and uselessly detailed technical droning: you probably want to skip that if you aren't an anti-spam plugin developer yourself:]

First of all, these spams do not present most of the idiotic traits of their lower colleagues: they do not try cramming hundreds of URLs or inserting hundreds of easily spotted junk keywords in the comment content. Instead, they use only the dedicated name and homepage fields to sneak in spam URLs and keywords. The comment content is often perfectly innocuous, sometimes even topical (by copying parts of another comment or a trackbacking post). All in all, these spams could easily be missed by a human moderator who didn’t look carefully at the contact name and URL.
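To make that trait concrete, here is a minimal sketch of a check that deliberately ignores the comment body and looks only at the name and homepage fields. The keyword list and the function name are hypothetical; this is not a filter taken from SK2, and a real one would need a learned or configurable list rather than four hard-coded words.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = {"poker", "casino", "pills", "mortgage"}


def field_spam_hits(author_name, author_url):
    """Count distinct spam keywords found in the name and homepage fields.

    The comment body is not examined at all, since the spam described
    above keeps its payload out of it.
    """
    parsed = urlparse(author_url)
    text = " ".join([author_name, parsed.netloc, parsed.path]).lower()
    tokens = set(re.findall(r"[a-z]+", text))
    return len(tokens & SPAM_KEYWORDS)
```

A comment signed "Texas Holdem Poker" and linking to a poker-and-casino domain would score on both fields, however innocuous its body text.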

When dissected in the HTTP server logs, the spam looks strikingly human-generated: all the files are requested (pictures, CSS, favicon and JavaScript included), a valid referrer URL is sometimes provided, links are followed (e.g. from the front page to a specific post), and the user-agent, of course, is valid and claims to be a regular browser. Timestamps generated by a single spamming IP even seem to point to a typically human, erratic way of browsing. Most importantly, the spam bypasses SK2’s JavaScript filter, which indicates an ability to parse JavaScript.

However, looking closer at timestamps and a host of other small details, I am fairly certain these aren’t posted by a human, but are indeed a new breed of spambots. There are many ways I can think of to make such a spambot with JavaScript-parsing ability and other “mimicking” skills… In fact, I’m just surprised it hadn’t been done before. But this new development is also worrying, as it seems to indicate that spammers have finally gotten hold of real coders to do the job: whereas previous spambots could have been the work of any random script kiddie with half a brain and a vague knowledge of scripting, these seem a bit more thought out in their design and their implementation. This is particularly worrying as I do not currently know of any anti-spam system that I, or a similarly skilled coder (that is: not that incredibly skilled), couldn’t eventually force a way through.

So far, the overall dumbness of spambot programmers gave anti-spam plugins a very easy edge. Things will change if real coders start taking an interest in this no-doubt very lucrative market and start churning out efficient spambot programs for the spam monkeys. And do not doubt for a second that there are, or will be, such black hat developers in this market (the same way there are in other domains of internet spam)… Even if Mark Pilgrim was slightly off the mark in his apocalyptic sum-up of the situation, he was certainly right on one point: there is huge money involved, certainly enough to pay for the hourly services of a decent professional coder… perhaps even [cue ominous strings on the soundtrack] a coder already involved in the blogging community.

No, not me (unless I’ve been sleepcoding again).

4. Will any other anti-spam tool fare better than SK2 with this particular spam (or spam in general)?

First off, SK2 is hardly out of the game: even as it is, and with a few tweaks, it can easily be brought back to a satisfying, if not perfect, level of protection. Not to mention a possible harder, faster and better successor to SK2 (more on that later).

As for the rest.

You’ll have to believe me when I say I truly wished for a better offer in anti-spam tools. Far from seeing it as some sort of “competition” (to what? a product I am neither selling nor making any revenue off?), I consider diversity in spam-fighting tools the most efficient way to fight spam. The same way bio-diversity is your guarantee against viruses and germs, presenting a wide array of defense tools to spammers means they can less easily focus their attention on one in particular and try to break it.

What we really do not need, however, is yet another blissfully ignorant moron releasing some stupid 5-line, 3-year outdated, kiddie trick that will not fool a single spammer and will waste hours of users’ time. Unfortunately there are a lot of these. So let me go through a quick roundup of what worked, works, and never worked. I’ll skip the details for today, so you’ll have to take my word when I say that:

  • Captchas: work. Despite the ultra-theoretical “captcha breaking” urban legend, spammers aren’t about to break a captcha on your blog. The big downside of captchas is that they are extremely user-unfriendly, intrusive and, most of all, hurt accessibility (how do blind users cope?).
  • Pretty much any other plugin won’t work. Blacklists, “spam words”, stupid script-renaming tricks and the like: all pretty useless taken one by one. Some used to work years ago; all have been successfully broken by spammers. Some are even dangerous because of the number of false positives they yield. Just save your time and skip them. JavaScript payloads also likely won’t work (I’d love to hear from anybody currently using that type of plugin, but I’m pretty sure of this one).
  • Bad Behavior will not stop these specific spammers. For the simple reason that BB is not designed to filter spam. It is only meant to stop the 70% stupid bots that do stupid things. Unfortunately bots are getting smarter, and the ones you wanna worry about are in the top percent of these 30%, thus far out of reach of BB.
  • Akismet works. Roughly with the same result rates as SK2. Possibly a slightly higher catch rate, but also a higher false-positive rate (which is a big no-no, in my opinion, but that’s up to you). Other concerns generally thrown around include privacy, reliability and terms of use (it is free, but you are entirely dependent on a third-party server). My personal issue is that I am doubtful of the long-term resilience of a monolithic DB such as Akismet’s when confronted with both Denial of Service attempts and data poisoning. There is some breathing room until spammers turn their unbridled attention to these weaknesses, but the fact that Akismet is now bundled with WP will only accelerate things.

As you can tell, there is precious little out there: only a few plugins that all fare roughly on a par with SK2, each with its pros and cons. Most important of all, there is currently nothing I wouldn’t feel confident breaking through, were I to start in the business of spamming tomorrow…

Just wire the amount to my swiss account.

I kid.

6. Is there really nothing you can do?

Of course there is.

I have a very fertile imagination, and still a couple of tricks to throw in the way of the spamming monkeys, ranging from small bits of tweaking all the way to major, insane and quite possibly breakthrough ideas. Very few in the middle, actually. The problem, of course, is that the more potentially efficient tools also tend to be the more time-consuming, hazardous ones.

Let me try to sum up the whole state of Spamdom such as I see it, with a tedious numerical analogy:

Say spam-protection goes from 1 to 100, where 1 is “sitting duck”, and 100 is “so protected that Houdini himself wouldn’t get a spam through”. Now let’s say most anti-spam plugins tend to hit somewhere in the 1-10 range, with a few, such as Akismet or SK2, hitting something like a 20 (perhaps also rising a bit as time and improvements went).
Simultaneously spamming techniques have also been adapting and improving, and it’s fair to say they are now approaching a 20, and steadily rising. Essentially, spammers are lazy (or pragmatic, depends on how you see it) and their target is to be just above the anti-spam barrier, not much higher.
Now, among the anti-spam tricks left in reserve, I’d say I’ve got a few small ones that should, without too much effort, bump SK2 up a few points (with compounded effect, something like a 25), which is nice, but certainly won’t buy more than a few weeks or months.

Since they are also by far the easiest ones to implement, I am already working on them.

There are two other separate projects I’ve been toying with, testing and prototyping: the first involves a somewhat novel approach to Naive Bayes filtering (definitely not on comment content), which would be a definite +10 on our SpamScale; the other, considerably more complex and difficult to explain in detail, could be crudely summed up as a P2P blacklisting system.
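For readers unfamiliar with the term, here is a minimal sketch of what Naive Bayes filtering on something other than comment content could look like. The post does not say which signals the prototype uses, so the feature tokens below ("no_referrer", "url_in_name" and so on) are pure assumptions, as are the class and method names; treat this as a textbook classifier with add-one smoothing, not as a preview of the real thing.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Naive Bayes over sets of categorical feature tokens (add-one smoothing).

    Feature tokens are hypothetical request/metadata traits, not comment text.
    """

    LABELS = ("spam", "ham")

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = {label: Counter() for label in self.LABELS}
        self.vocabulary = set()

    def train(self, features, label):
        self.class_counts[label] += 1
        for feature in features:
            self.feature_counts[label][feature] += 1
            self.vocabulary.add(feature)

    def log_score(self, features, label):
        # Smoothed class prior, so an untrained class never yields log(0).
        total = sum(self.class_counts.values())
        score = math.log((self.class_counts[label] + 1) / (total + len(self.LABELS)))
        # One extra slot in the denominator stands in for unseen features.
        denom = sum(self.feature_counts[label].values()) + len(self.vocabulary) + 1
        for feature in features:
            score += math.log((self.feature_counts[label][feature] + 1) / denom)
        return score

    def classify(self, features):
        return max(self.LABELS, key=lambda label: self.log_score(features, label))
```

Trained on a handful of flagged and approved comments, such a classifier weighs each trait by how often it showed up in spam versus ham, which is why it keeps improving as more comments get flagged.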

That last idea I have been thinking through for a looong time now. I have some confidence that it may hold the key to the End of Blog Spam as We Know It… A definite +50 on our scale…

Of course, these last two are also the ones that will take a serious time investment before I can even figure out whether I can do something with them… Which takes us to the one and only question you all care about:

7. Why aren’t you busy working on the next anti-spam solution before this spam thing becomes out of control?

Well, because, as I said above, it is a lot of work. Work that would come on top of the already heavy SK2-related workload I deal with daily. Don’t get me wrong, as I’ve stated previously: I love developing, I love developing SK2 and most of the time I love hearing from you (even if sometimes I get irrepressible urges to ram online manuals down some throats). But being a fully human carbon-based entity with little photosynthetic ability, I happen to need food near-daily…

Also, due to recent life changes, I am now a tad busier (being a full-time student) and much poorer (being a full-time student) than before. Hence my regrettable need to prioritize work that either feeds me or keeps my university peers and professors content.

Can you tell where this is going? No? OK:

To make it short, I am launching a Fund Drive

The idea is simple: if you use SK2, if you like it, if you’d like to see more of it in the future, if you’d like this future to be sooner than never, if you’d like to help fund the crack habit of a starving student who also happens to dedicate way too much of his free time to eradicating spam, if you think this is worth a few cents, hell even a few dollars, if you can afford to spend this money without robbing your kid or your cat of their next birthday present… Consider donating:

$2.00

$5.00

$10.00

$20.00

$30.00

$50.00

$666.00

There are currently a few thousand of you actively using SK2 (yep, crazy huh?)… I figure if we weed out the cheapos and those who honestly can’t afford it, plus those who consider their small use of SK2 not worth a monetary contribution (hey, I don’t pay for all my shareware… I’m in no position to cast the first stone), that might still leave a few dozen of you? If each one contributes a few bucks, that should be enough for me to justify spending a few weeks working on SK3 rather than flipping burgers to pay for booze (and occasionally food and rent).

Non-monetary donations of any sort are all gladly accepted: food specialties from where you live (especially if it’s distilled and drinkable, but the solid kind is cool too), postcards and anything else that won’t cause a police raid on my place at 6 in the morning… Note that due to recent health regulations, I can no longer accept your first-born child in payment for services, but thanks for offering.

If, like me, you are a starving student who cannot afford to divert any of your drug money to pay for my costly addictions, then consider donating some time. There will be need for it: mostly in doc writing (FAQs, user guide, maybe even a support forum at some point, since the whole two hours of emailing a day is becoming a bit tedious). Just put your name in and my people will get in touch with your people when the time comes.

If making a donation, please provide a nickname (if you don’t want your full name to be used) and your blog’s address, as I will probably make a donation page to list all those (if any) who donated.

8. Would you seriously stop developing SK if you don’t get money?

Of course not.

But it is unfortunately true that I will have to lower my involvement with anti-spam dev in favour of more, err, survival-oriented activities. Obviously, I’d much rather be paid for something I love doing (like squashing spam and spammers) than any random job… But it isn’t much of a choice.

I guess I should set some sort of imaginary milestones in terms of funding and how far/fast it would take me on the SK3 development trail, but I’d rather not look like a complete moron when only a fraction of it has trickled in by the end of the month… So I’ll just give you my word that I’ll do my best with what I get, and probably with what I don’t get either…

No matter what happens, I will be releasing SK2.2 (with minor tweaks and bug-fixing) at some point… Hopefully within a week… The two bigger components will honestly depend on how much interest they raise and the time I can afford to spend on them (we are talking at least month-long projects)…

Oh, and let me remind you that donations are not, I repeat: not, mandatory in any way whatsoever.
This is not a change in licensing: SK2 is and will remain free for all non-commercial use and redistribution (note that you can still use SK2 on a commercial blog, the only restriction is on packaging and distributing or otherwise selling SK2 for profit: in which case I ask that you contact me for permission first).

I also wanted to take the occasion to thank very sincerely all those who have already donated money, time or simply kind words through email: you have made my day on many occasions, and helped make it worth it so far.

Thanks a lot and do not hesitate to spread the word!

70 Responses to “The State of Spam [Karma]”

Good luck, Dr. Dave.

- I already sent my donation last week, as I recall. :)
- Hi to Admiral Justin, too. :)

At least you are honest! I take it the Bombay Sapphire is fueling your brain cells … so nothing wrong with that in my book! Thinking of how much time and grief SK saved me (well, and the time and grief it cost me back in Fiji) a donation is in order.

SK2 has saved my blog, so it was definitely worth a donation. You guys that develop this software are brilliant. Keep up the good work.

[...] If you do, go to dr Dave’s blog, read all about the “State of Spam” and then click the donate button. As you can see in the footer of this blog, SK2 has killed 5276 spams. If it took me 1 second to kill one of those spams, then dr Dave’s code has saved me 87 minutes of deleting those spams (that doesn’t sound much – but do you really want to spend an hour and a half doing it ?). Adding into that the fact that readers here have not had to see the absolute crap that the spammers churn out then I think a donation is very much in order. Blogging may well be free but we sometimes need to say Thank You to those that help make this experience a more pleasant one. And today is that day. Go click the Donate button, hand over some cash and say Thank You. ¤ Read (1) [...]

[...] Since our universally beloved DrDave currently has little time and money to keep developing Spam Karma 2, Akismet will from now on protect the CW Notizblog from comment and trackback filth. [...]

[...] Dave, the author of Spam Karma 2 (SK2), has written an interesting essay on his take on comment spam entitled The State of Spam [Karma]. It’s a good read if you’re into this sort of thing. SK2 was one of the first really advanced scripts put together to combat spam, and you can see on our development page someone has actually created code that combines Akismet with SK2, which sounds pretty groovy to us. [...]

Student/Unemployed person here too, wishing she could donate money but bills make things a little tight! SK2 has helped lift my site workload (bless you) and I know a limited amount of php and nothing about spam (apart from it annoys the crap out of me) but if you need someone to help with the documentation (making it “idiot proof”/”newbie friendly” with proof reading) then I can help with that!

I wish I could use my paypal account :( . Keep up the good work dude .. I love your masterpiece.

I think you could happily have a $30 and $40 button there.. I’d have gone for either. The 666 was maybe a touch over. But yes, SK has definitely saved me at least that much time/money. More to the point, it’s saved me that aggravation.

Good luck with the exams, btw. Bear in mind that the results often don’t mean anything about your future career. Mine didn’t. (Bad exams, good career.)

Thanks for all the details, and I’m a little disheartened to hear that spammers are evaluating JS. Like you, I’m surprised it took this long, and it’s a day I’ve sort of feared, but it was bound to happen eventually.

I was wondering what you think about non-image CAPTCHAs, like WP-Gatekeeper. Yes, I wrote it, but I’m interested in your honest assessment. I’m an Akismet boy right now, and ironically don’t even use WP-Gatekeeper any more, but others (like The Blog Herald) have recently used it with great success. I’m just wondering if the success is temporary, or if that approach has legs.

Donation gladly given once again Dr. Dave!

keep up the good work, and make sure you spend it on the quality liquor, and not the cheap stuff (quality over quantity, I say ;) )

Looking forward to the next generation of Spam Karma!

[...] Damn comment spam is on the up again from this IP address 195.225.177.80 Thank goodness for Dr Dave’s Sk2 we don’t have to worry too much. Go read The state of spam(karma) that Dr Dave has written. If like me you use Sk2 then give the guy a donation, after all he is our saviour. Thanks for Sk2. [...]

Thanks for this cool piece of software ;-)

[...] Dr. Dave, author of the excellent Spam Karma plugin for WordPress, has posted The State of Spam [Karma] in response to a new breed of spambots. (These sneaky %#@!ers hit this site on Friday, so I installed the 2.2 beta. They seem to have stopped trying over the weekend.) Anyway, Dr. Dave is holding a donation drive to help cover future versions of Spam Karma. I think it’s worth at least a few bucks. [...]

[...] An interesting article over at dr Dave’s on the resurgence of spam on blogs, seen through the evolution of his Spam Karma plugin. [...]

[...] Via Spam Karma’s author, Dr. Dave, I found this posting to be a worrying read.   [link] [...]

[...] » Dr Dave gives his State of Spam Karma address. (#) [...]

[...] Here is the link: Michael Hampton’s Blog Here is the link: Akismet: Spam Karma State -John Havlik [...]

I love SK better than Akismet dr dave ;-) . Unfortunately, I, too, am a student so money is pretty tight for me. So in the mean time, I can only help you with a dirt and mortar help such as writing FAQ or attending forums a couple of hours every day (or week?). Just let me know if you need any help.

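[...] The State of Spam [Karma] 01/30 18:01, 2006 In short: 1. There is a new kind of spambot, and SK2 cannot fend it off. 2. Being a full-time student, he has little time to put out a powerful SK3. 3. … So, kind sirs, please spare some change: 2 bucks is not too little, 10 bucks is not too much, and if you can donate 666, even better. [...]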

Well, not a student, and have used it in my blog (which I only post to once in a while) and I have to say that SK2 has definitely helped. Less crud I have to deal with, and with an honest guy like Dr. Dave (he did say that it funds his taste for Bombay Sapphire) I’m all for donating.

Like your software and your blog…hell, I don’t even know you but I like your style! What kind of distilled beverages can be shipped to Japan? I might send you something too! ;)

Jose

[...] and you can find Spam Karma >>HERE< [...]

I spotted the first weird comments a few days ago. It was the name of the author that made me suspicious and I began to check the link in the authors name. Since then I’ve had 5 or 6 of them, some more stubborn than others, but no big flood of them so far.
Thanks for working on this and I hope the studies are going well too :D

C.

[...] Since I switched it on a while ago Spam Karma has prevented at least 1500 spams from seeing the light of day on this blog alone. There’s a rare slip-up, but Spam Karma has made my life enormously more simple. Now Dr Dave, the creator, has written in The State of Spam [Karma] about how it’s a tough battle, with a sudden increase in spam battering at the gates: The reason for this sudden burst, is a new breed of spam, or more likely, of spambots. It is confirmed now that some spammers have gotten hold of much more efficient spamming tools. Ones that bypass some of SK2’s strongest filters without trouble. [...]

By JavaScript payloads are you talking about what WP Hashcash does, or does it do something above and beyond SK2 in that area?

I use WP Hashcash (only) and have only had manually-typed spam getting through still. I haven’t seen any evidence of a bot breaking through this yet, though I know it’s inevitable. (WPH blocks about 200-300 spam a day on my sites, including a couple of PR7s, so they’re prime targets.)

At any rate, maybe WPH has some ideas you could implement in SK.

Regardless, I’m sending you a donation: SK2 may not be my current plug-in of choice, but I’m willing to fund any anti-comment-spam effort… Best of luck.

Dave: I’m with you. I do my blog purely for love too.

I sent in some dough but didn’t read far enough down in your post to see you asked for blog url & e mails along w. the donation. By all means, make me public & use the info I’ve entered for this comment.

BTW, when I try to lv. this comment using FF 1.5 I can’t see the Submit Comment button. It appears to be covered by the “This entry was posted…” msg. So I had to use IE to post this.

Just donated. Thanks for SK!

[...] “Dr Dave”, the author of the excellent Spam Karma WordPress plugin, has a long but (IMO) fascinating post about how the “war” against comment spam goes. I direct you to The State of Spam [Karma]. [...]

[...] Fixed: I’ve uploaded and installed the most recent release of Spam Karma. All is well now. Happy commenting. (-: [...]

[...] I may have to switch off the comment/trackback function at short notice; the spammer meteorite impacts are getting closer and closer, so to speak. I probably need to update to SpamKarma 2.x, as Dave has already described. Related articles: [...]

DrDave!
I’d like to help with docs, support and general crime fighting aspects of SK3.

Thanks for SK. Just made donation. Cheers!

[...] I also upgraded to WordPress 2.0.1, and also read this lengthy post from the author of the anti-spam plugin I use, Spam Karma. [...]

Hey,

I just sent a donation as well.

I know how you feel buddy as sadly I’m in the same boat.

Luckily this week I could afford a small donation.

I really do appreciate all your hard work.

Take care and sleep tight. ;)

Will

[...] Dr Dave has written a great overview of the current state of Spam Karma, as well as a number of new challenges that he is facing on a few fronts. The net of which is he’s hoping to get a few donations to allow him to continue to combine time spent developing Spam Karma with the time spent keeping a roof over his head, books on the table, and fuel in the belly. [...]

[...] Many of you will know that after a month or two of running this ‘ere blog, I started to receive comment spam, as I’m sure most other bloggers do. Deciding that something had to be done, other than manually removing it myself, I looked around for solutions and eventually settled on Spam Karma. This is a plugin for WordPress blogs and so far I have found it to be very effective. Just scroll down to the bottom of the page to see how many spam comments it has eaten. And a fair few get put into moderation as well, which I don’t believe increment the counter, but that is techie stuff and beyond me, so don’t quote me on that! Either way, SK2 has been working wonderfully (no real comments eaten by SK yet), and it is good to see that its creator – Dr Dave – is keeping himself abreast of the new and varied tactics employed by spammers. On his blog, Dr Dave has discussed where he thinks spammers are going and the tactics they are using, and also a little about where he intends to take SK in order to keep ahead of the game. It is a light and brief read, even for a novice like me, and it is always nice to know that something one relies upon is going to be continued. [...]

[...] As many of you are no doubt aware, there’s a new class of automated spambots out there which Bad Behavior and other spam tools don’t yet handle. Spammers have indeed adapted their techniques to get past tools such as Bad Behavior, Spam Karma, and Akismet, and are actually succeeding. I first caught wind of this new generation a few months ago, and began working on Bad Behavior 2, my attempt to deal with the new generation of malicious spambots. [...]

Just recently made a $ contribution and am glad I did. You’re better at, and about, clueing in this noncoding user than just about any coder I’ve come across. Bless you. May whatever gene is responsible get loose and spread throughout the coding-human species.

Seriously, I may be able to kick in a little more after a while. In the meantime, know that your efforts and communications are really appreciated. I’ve got a few of the damn things slipping through, but nothing like it was before plugging SK2 in.

Spam Karma & The Changing Face of Comment Spam

I use Spam Karma 2 to stop the comment and trackback spam from reaching both this blog and the IDC blog. It works wonders and is a beautiful piece of software. It’s completely stopped spam to my site and false positives are very, very rare. Well,…

Fighting Comment Spam

After I installed Spam Karma 2 a few months ago in the fight against comment spam, I could sleep soundly again. The plugin worked extremely reliably, detected spam practically every time or moved the comment at the slightest…

Hi, I made my small donation but I’d like also to suggest how to contribute not only to you but to all the anti-spammers community around. You wrote that the enemy has access to our weapons.

Anyway, I think that there are software companies around developing spamming tools and selling them to spammers.

It would be nice to have a list of all spam developers companies and check how their software behaves.

Even better it would be nice to fight these companies in different manners, you know what could happen to these companies if there would be a nice list online ;-)

(Do you know what I could do to the texas holdem poker guy if I get my hands on him? ;-) )

I’ll check around but if you, or any reader of this blog has this information, please release it in order to better know the enemy weapons.

Dr Dave, do you mind creating a page or an article (updated) with such a list?

Best regards and keep developing.

Paolo

Thanks Dr.Dave! I can spare a few bucks for the time and trouble you have saved me

[...] I use Spam Karma 2 to help my fight against spam on this Wordpress blog. But lately I’ve noticed a lot of spam getting through. Well, the creator of Spam Karma 2 posted his State of Spam [Karma]: Unfortunately, as some of you might have noticed, SK2’s performances as seen from the outside, seem to have dropped suddenly over the past few days. While the bulk of the spam still remains at the door, a meaningful percentage now manages to fly right through SK2’s basic filters. And given the numbers involved, even 1% of all spam attempts is a lot to deal with…The reason for this sudden burst, is a new breed of spam, or more likely, of spambots. It is confirmed now that some spammers have gotten hold of much more efficient spamming tools. Ones that bypass some of SK2’s strongest filters without trouble. [...]

Woa… that’s a lot of comments.

Sorry for being remiss the whole past week, I was busy bathing in gold and flipping through $100 bills taking care of student life.

So, first of all, a general thank you to all of you. I just posted a few details, for those interested.

Secondly, to all those who contacted me about helping with docs (and left an address): I’ll be sending you an email soon to discuss the improvements that need to be done. Don’t hesitate to bug me (preferably through email) if you don’t receive anything by the end of the week: it means I probably lost your email somewhere…

Now, as for the specifics:

nacken

Indeed, Bombay Sapphire is fueling my brain, it’s also helping to protect me from malaria by guaranteeing my daily quinine intake. It also shuts the voices in my head long enough to let me focus on code.

Charles (and a few others)

Actually, I hadn’t realized that people would be scrupulously following whatever amounts I offered by default: I figured anybody could just go and use Paypal’s free form directly, or combine amounts as many times as needed. Hence the limited number of options I had there (corrected since).
As for anybody donating $666, or even $40 for that matter: I think I’d be embarrassed and, to tell the truth, slightly suspicious of what exactly I am relinquishing for such an amount.

And about the exams: no big worries, even if these were busy times indeed, my being more of a “going back to uni” situation, means I am both fairly relaxed and not overly concerned about the rest of my professional life being impacted by this. But thanks for reminding me :)

Eric

Yea, JS evaluation is fairly basic to do. I can think of many ways (ranging from Greasemonkey to an MFC app using MSIE components) for a spambot to behave exactly like a browser. And expect to see even more of the anti-bot filters becoming increasingly irrelevant.

Regarding WP-Gatekeeper (and similar solutions). I must admit I am not a big fan.

In a nutshell:
- I don’t particularly like the fact it’s tied to language comprehension (sure you can localize it, but…). I would also be worried that quickly increasing difficulty of these “easy” riddles would become an obstacle to some of the less fluent commenters. And if you think any commenter fluent enough to read your blog would be fluent enough to answer that type of question, ask any English-speaking Japanese what color a green apple is: you’ll see what I mean.
Of course, one could ensure this level always remains low enough (by providing a canned dataset and few ways to change it) but then:
- Any riddle-building algorithm based on a limited set of data can be reverse-engineered all the same (say you have thousands of installs with a dozen fruit names and the color associated to them: how long do you think it’d take a bot to work through that).
- More evolved algos, using bigger datasets (and then bringing us back to problem 1) would still be breakable with very basic AI. In fact, I’d be personally much more confident in my ability to break such “Turing riddles” than even the simplest Captchas out there.

So at the end of the day, I can only retain accessibility (minus usability) as an asset over other solutions and Captcha in particular. Knowing that the accessibility card for Captcha is something of a false problem (there are hundreds of ways to work around the issues it creates for sight-impaired users), I would say that the result is not worth the effort in the short term. At any rate, I don’t mean to belittle your work here: bringing in a usability-aware alternative to Captcha is a tough problem, and one worth studying even if I’m doubtful of the chances…
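
To make the reverse-engineering point above concrete, here is a minimal sketch of how little a bot would need once the canned dataset is known. Everything in it is an assumption for illustration: the fruit/color table, the question wording and the `answer_riddle` helper are hypothetical and not taken from WP-Gatekeeper or any other plugin.

```python
import re

# Hypothetical canned dataset, assumed to ship identically with every install.
KNOWN_COLORS = {
    "apple": "green",
    "banana": "yellow",
    "cherry": "red",
    "orange": "orange",
}


def answer_riddle(question):
    """Answer a 'What color is a/an X?' riddle, or return None if unknown."""
    # Accept both spellings (color/colour) and both articles (a/an).
    match = re.search(r"what colou?r is an? (\w+)", question.lower())
    if match is None:
        return None
    return KNOWN_COLORS.get(match.group(1))
```

A dozen entries and one regular expression are enough here, which is why a bigger dataset only postpones the problem rather than solving it.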

Jose

Not sure what kind of beverages can safely be shipped to Japan (their customs are notoriously tight), but actually I am no longer residing in beautiful Tokyo at the moment (something to do with recent life changes).

Michael

By JavaScript payloads I am indeed referring to what WP Hashcash (and a few others, including SK2) do. And I am quite positive now that they’ve been broken by some spambots. I would love to hear more data from users of other JS-based plugins, but in the end, I am confident that those that haven’t been broken will be, as soon as spambot makers turn their attention to them. Also note that, as I wrote in the entry above, these new spams look very much like manual spam. I will look into WPH again, but as far as I know, its stopping power revolves entirely around this JS payload.

Paolo

“Unfortunately”, there are no such things (for comment spam, at least) as companies providing the spammers with services or tools (the ones who claim to do so are usually complete rip-offs and the least of our worries). All these tools are developed and sold in shady virtual back-alleys, and little can be known of what is done there. Of course, boycotting products and sites that use their services is a minimum, but I doubt they really care.

Actually, one thing I forgot to talk about, but that’s been on my roadmap for quite a while now, would be an automated counter-google-bomb that would ensure all the keywords the spammers try to push automatically get “hijacked” back to non-spammer sites. For example, all installs of SK2 could have a dedicated page linking “viagra” and “texas holdem” to their respective Wikipedia pages. Currently, the only thing stopping me from doing this is that I would need at least a dozen neutral (non-commercial and consensual) reference sites in order to efficiently bump spam results off the first page of Google.

Anybody got suggestions besides the obvious (Wikipedia and Everything2)?
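
For the curious, the dedicated page could be as simple as the sketch below. It is only an illustration of the idea, not SK2 code: the `NEUTRAL_LINKS` table, its two URLs and the `build_link_page` helper are assumptions made up for the example.

```python
from html import escape

# Hypothetical keyword -> neutral reference page mapping (illustrative only).
NEUTRAL_LINKS = {
    "viagra": "https://en.wikipedia.org/wiki/Sildenafil",
    "texas holdem": "https://en.wikipedia.org/wiki/Texas_hold_%27em",
}


def build_link_page(links):
    """Render an HTML list where each spam keyword links to a neutral page."""
    items = [
        '<li><a href="{}">{}</a></li>'.format(escape(url, quote=True), escape(keyword))
        for keyword, url in sorted(links.items())
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

Each install would publish the resulting list on a page of its own, so that the anchor text spammers are chasing points at the reference sites instead.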

Hey,

From one student to another – Spam Karma is one of the best things about WordPress. In my opinion, without Spam Karma, I wouldn’t even use WordPress. So thanks a million for making it, keeping it up to date – and keeping it so incredibly painless!

I don’t have much money left over, but I’m certainly willing to help write FAQs. Let me know!

Dr. Dave- you’re the man- seriously. Spam Karma is the best kinda Karma there is- and my website would have been a comment-free site a year ago, had it not been for you. My donation will come on Friday, once my paycheck (student stipend- I empathize) clears. The $666.00 is intriguing…I wish I had enough to send you the Number of the Beast. Keep up the great dev work.

Peaceout

I made a donation. You created a superb plugin that works, and I look forward to the future of Spam Karma.

Dr. Dave,
Looks like the update to 2.2 might have temporarily taken care of the problem. I noted the difference between before and after updating with a particularly bothersome url getting through. After the update, it got thrown into Hell. I know this may not hold for long, but for now, that irritating bother has been stopped.

[...] In a recent post the creator of SK2 (Dr. Dave) raised some questions on the long term effectiveness of SK2 with some of the new aggressive advancements being made by spammers. In his lengthy article The State of Spam (Karma), Dr. Dave says that SK2 will need some upgrades to stay effective. But that’s coming from a programmer.:) As business owners marketing with blogs, we haven’t noticed any decrease in Spam Karma’s effectiveness as of yet. [...]

“Most importantly, the spam bypasses SK2’s Javascript filter, which indicates an ability to parse javascript.”

Hi Dr. Dave,
Regarding your comments about spambots mimicking users, I think it’s a fair assumption that most spambots are actually remote-controlled browser clients. I made this kind of app for a web-test tool, and it really made things simpler for simulation purposes because JS, CSS+IMG loading, headers, etc. are all the real deal.

Aine
Yea, I reckon SK2.2 should temporarily give us back the advantage over spambots. Let’s see how long that will last (I’d say at least a month or two), and at any rate, I’m not planning to sleep on it.
BTW, be careful if you downloaded SK2.2 right after the announcement: it contained a very stupid and nasty bug in its filters. Make sure you are running SK2.2 final rev 2 (can be seen on SK2’s main page).

Fini

Indeed, I mentioned that in the comments above, but basically I am not at all surprised and can think of many ways spambots could be driving browsers… Unfortunately that means all filtering systems relying on “imperfections” of spambots (the way they would miss small details in headers and such) are gonna become irrelevant eventually. Since it was never something I relied on that much, this is not the end of it for now… But we’ll need to work on other areas to catch spambots…

Reema Says:

Hello,
Nowadays the comment spam issue is really on the rise, and we need to take serious steps against it. I am doing a project on filtering comment spam, for which I need some comment spam samples, but I am unable to get any. Could you please help me by sending some to my email address?
It would be of great help to me.

[...] I’m starting to see some of the new “smart spam” mentioned in this Spam Karma update. Marvelous. It’s slipping right through, although the last one actually went into moderation so maybe SK2 is learning. [...]

[...] I kind of wish everyone could use Spam Karma, by Doctor Dave. That’s my anti-spam solution, and it’s pretty good. Is it 100% perfect? no. But it does get almost all of the spam, with very few false positives. There are a few people who sometimes get caught in the anti-spam net, but it’s a pretty small number. If you have a WordPress blog, I highly recommend the Spam Karma 2 plugin. If you don’t have a wordpress blog, and are still using blogger or typepad, my only question is “why?” [...]

Anonymous Says:

SK2 sounds very nice. I am contemplating using wordpress here soon to try it out. One tool you might want to look at, if you haven’t already heard about it, is the Pivot Blacklist (http://www.i-marco.nl/wiki/pivot-blacklist) which the Pivot weblog tool (www.pivotlog.net) uses. It uses OSA, HashCash, and a “SillyQuestion” type of quiz. So far, this is the most effective anti-spam tool i have ever seen, and that is with just 1 of the options enabled. I can look at my log and see every spam comment blocked, including the “friendly” spam comments. An example of what I see:

March 23, 2006, 9:20 pm 60.56.229.13 blocked hashcash violation: (Your site is amaizing. Can I share some
March 23, 2006, 9:35 pm 200.242.249.70 blocked hashcash violation: (It looks like you really had a nice time
March 23, 2006, 10:39 pm 68.87.76.148 blocked hashcash violation: (I like your website alot…its lots of f

And it just keeps on going. Check it out though. Maybe there is something from it that you can add, or vice versa. Marco, the author, also makes WP plugins.

Let me jump in, being the author of Pivot Blacklist. HashCash is originally a WP plugin created by Elliott Back. While it still works rather well I believe it’s a dead end. As Dave indicates, bots are getting too smart. If a bot runs the javascript, nothing’s gonna stop it.

What’s still quite effective though is the ‘spam quiz’ idea. A bot will have a seriously hard time answering trivial questions, especially if each blog uses a different one. It defeats all spam except for manual spam of course. If your blog happens to be in a non-English language the protection is even better because manual spammers will need to understand the language the question was written in ;)

I must admit though, I haven’t seen much of the really smart bots yet. My own blog doesn’t have any protection enabled at all at the moment. I use a custom AJAX comment submission scheme. There’s no form action to be scraped. Instead the form submission is hidden inside the javascript. A bot which executes the javascript would be able to beat this but… so far this has never happened on my weblog. When it does I’ll need to throw in the spam quiz thing.

If you combine the spamquiz thing with a cookie to remember the answer I think it’s as unobtrusive as possible. Unlike captchas, blind people can use it and it can’t be beaten with any script (except for hardcore AI maybe). It’s a ridiculously lo-fi solution compared to SK2, Akismet, Hashcash or AJAX stuff but it works INCREDIBLY well.
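
A rough sketch of the quiz-plus-cookie combination, to show how little code it takes. This is not how Pivot Blacklist actually implements it: the secret, the answer and both helper functions are hypothetical, and signing the cookie with an HMAC is just one assumed way to keep visitors from forging it.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"   # hypothetical per-blog secret, never sent to the client
QUIZ_ANSWER = "green"       # hypothetical answer picked by the blog owner


def _signature():
    """Cookie value proving that the quiz was answered correctly once."""
    return hmac.new(SECRET_KEY, QUIZ_ANSWER.encode(), hashlib.sha256).hexdigest()


def make_cookie(answer):
    """Return the cookie value for a correct answer, or None for a wrong one."""
    if answer.strip().lower() != QUIZ_ANSWER:
        return None
    return _signature()


def cookie_is_valid(cookie):
    """Check a returning visitor's cookie without asking the question again."""
    if cookie is None:
        return False
    # Constant-time comparison, on bytes so that any string is accepted as input.
    return hmac.compare_digest(cookie.encode(), _signature().encode())
```

Note that a cookie like this is the same for every visitor, so a bot that obtains it once can replay it; changing the question or the secret from time to time would be the obvious countermeasure.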

[...] And then I encountered some more reading about spam, this time a post by Dr Dave, the creator of the Spam Karma anti-spam plugin for WordPress, entitled The State of Spam [Karma]. Dr Dave has some interesting commentary on the overall picture re blog spam, especially a tech-ish analysis of spam bots and what they do. [...]

[...] As the gentle reader knows [...]

Great work! It’s an amazing plugin and I wish you luck and staying power for the future!

Vince Says:

“Captchas: work. Despite the ultra-theoretical “captcha breaking” scheme urban legend, spammers aren’t about to break a captcha on your blog. The big downside of Captchas, is that they are extremely user-unfriendly, intrusive and most of all: hurt accessibility (how do blind users do?).”

How do blind users do what? Some Captcha implementations also offer to have the characters spoken through a generated wave file (click on the picture to hear the character sequence).
A little algorithm can add random patterns of noise to the audio, so that the spoken characters remain audible enough to be recognised by the human ear while fooling the Fourier-based algorithms that would analyse the audio to extract them.

Blocking IPs is another thing. I know there are a lot of open proxy servers: why can’t they be blacklisted using a globally deployed detection protocol, similar to the one used for open-relay mail servers?

There are various sites that publish open (or anonymous) proxy servers, 2 hints:
http://tools.rosinstrument.com/proxy/
http://www.atomintersoft.com/products/alive-proxy/proxy-list/

You can make use of those sites by updating your ip-blocking list.
I know it’s not friendly to block proxy IPs, but I call it stupid of server hosts to leave their servers wide open in the first place.
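
As a minimal sketch of what updating and using such an IP-blocking list could look like: the input format (one `ip` or `ip:port` entry per line, `#` for comments, IPv4 only) is an assumption, since the sites above each publish their lists in their own way, and the helper names are made up for the example.

```python
import ipaddress


def load_blocklist(lines):
    """Parse 'ip' or 'ip:port' entries (IPv4 assumed) into a set of addresses."""
    blocked = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not entry:
            continue
        host = entry.split(":", 1)[0]           # drop an optional ':port' suffix
        try:
            blocked.add(ipaddress.ip_address(host))
        except ValueError:
            continue                            # skip anything that is not an IP
    return blocked


def is_blocked(ip, blocked):
    """Return True if the commenter's IP is on the open-proxy blocklist."""
    try:
        return ipaddress.ip_address(ip) in blocked
    except ValueError:
        return False
```

For example, `is_blocked(remote_ip, load_blocklist(open("proxies.txt")))` would check one commenter against a locally saved copy of the list, where `proxies.txt` is a hypothetical file name.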

To prevent nagging from victims of a false positive, avoid accusatory expressions and instead inform the user that his/her submission was denied because one of the following situations applies:
-User uses an IP from an open proxy server that is blacklisted
-User uses a (dynamic) IP that has been registered earlier for spam origination.
-Submitted urls are listed as spam-sites or contain information irrelevant to the contents of this site
-Too many spam-related keywords detected in the submitted text.

It sounds formal, yet it will bring a lot more understanding than shouting an accusing phrase (“You are a dirty spammer!” or whatever similar message a legitimate poster is confronted with).
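
The neutral wording suggested above could be produced along these lines. This is only a sketch: the reason codes and the `rejection_notice` helper are hypothetical names, and the messages simply paraphrase the four situations listed.

```python
# Hypothetical reason codes mapped to neutral, non-accusing explanations.
REJECTION_REASONS = {
    "open_proxy": "Your IP address belongs to a blacklisted open proxy server.",
    "spam_ip": "Your (dynamic) IP address was registered earlier as a source of spam.",
    "spam_url": "A submitted URL is listed as a spam site or is unrelated to this site.",
    "spam_keywords": "Too many spam-related keywords were detected in the submitted text.",
}


def rejection_notice(reason_codes):
    """Build a polite notice listing only the reasons that actually applied."""
    reasons = [REJECTION_REASONS[code] for code in reason_codes if code in REJECTION_REASONS]
    if not reasons:
        return "Your comment could not be accepted."
    lines = ["Your comment could not be accepted for the following reason(s):"]
    lines += ["- " + reason for reason in reasons]
    return "\n".join(lines)
```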

Regards,

Vince.

Thanks champ – I was about to go insane after about a week of ridiculously high and increasing levels of blog spam (besides the normal email spam I get in my 10+ mailboxes) … so you’ve returned sanity to at least one part of my life.

Sorry for the stingy donation – but we can’t have millionaire Wordpress plugin developers now, can we :-)

Thank you, thank you, thank you. Seriously. I know I didn’t donate much cause, well… I don’t have much money. I do, however, write documentation at work for web applications and stuff, so if you need help with documentation or FAQ’s have your people call my people and uh… yeah.

John T Says:

Hi. I use email as a primary communication tool, and I am a low capability computer user. Is your program appropriate for me, or is a blog site a different kind of application? I would appreciate any help or advice you would have time to share, and I would be happy to donate/pay. I am a medical researcher, but receive spam from a multitude of sources. Thank you for any advice or help. John Tarvin

Thanks for this great plugin. I plan to switch to it from Akismet (no clue how the third party server it relies on will remain fast and last in the long term).

How can I set SK2 to NOT override my own moderation??

Thanks for helping us keep up, in the anti-spam arms race.

I have noticed a fair number of people (or robots) finding my site by searching for the SK2 “spams eaten” footer. Perhaps those are bots targeting SK2 protected sites specifically.

SK2 is the best spam killer- but is there a way to get it to work together with AJAX comments?
http://www.mikesmullin.com/2006/06/05/ajax-comments-20/.

When both plugins are on, strange things seem to happen.

Anti-Spam is a joke to be honest. I hate spam but please let me tell you a story.

I created an application to send text messages using http://www.vodafone.ie. Now this you say is no impressive task but let me continue.

Their system is setup much like an anti spam system. It checks the time it took you to input a text, the time it took you to log on, the time it took you to take a cup of tea and everything else it can.

Wanna know how I got around it? Simple… I created an application which basically mimics a user. First it opens http://www.vodafone.ie, it then waits 4 seconds and then inputs the user and password and clicks submit. It then waits a couple of seconds and clicks “SMS Messages” and then allows you to enter as many characters as you want. Their version only allows 160 so what mine does is it splits the text into 154-character chunks and adds a … to the end and beginning of each text. It waits about 5 seconds between each text to make the server think it’s an actual person typing it in. And know what? I know for a fact there is absolutely no way to tell the difference between it and an actual person without causing more harm than good.
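
The splitting step on its own is trivial, as this sketch shows. It assumes the marker is three plain dots, which is what makes the numbers add up (154 + 3 + 3 = 160); the constant and function names are made up for the example, and the waiting and form handling are left out.

```python
SMS_LIMIT = 160                                 # characters allowed per text
MARKER = "..."                                  # assumed: three ASCII dots
CHUNK_SIZE = SMS_LIMIT - 2 * len(MARKER)        # 154 characters of actual text


def split_message(text):
    """Split text into chunks that fit one SMS each, marker on both sides."""
    chunks = [text[i:i + CHUNK_SIZE] for i in range(0, len(text), CHUNK_SIZE)]
    return [MARKER + chunk + MARKER for chunk in chunks]
```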

I know it’s not pretty but face facts o.O

Replies to email only.

[...] author of Spam Karma 2 (SK2), has written an interesting essay on his take on comment spam entitled The State of Spam [Karma]. It’s a good read if you’re into this sort of thing. SK2 was one of the first really [...]

Hi,
I love SK better than Akismet dr dave ;-) . Unfortunately, I, too, am a student so money is pretty tight for me. So in the mean time, I can only help you with a dirt and mortar help such as writing FAQ or attending forums a couple of hours every day (or week?). Just let me know if you need any help.
thanks.