First blog update on Spam Karma, WordPress development and Spam in many months, and a crucial one at that. Being notoriously verbose to the point of irrelevance, yet with lots to say today, I have tried to provide a telegraphic sum-up below: feel free to skip and go straight to the parts you may care about (hint for the busy ones: the plot thickens mostly around parts 6 and 7).
1. How well is SK2 stopping spam currently?
2. What’s wrong in the peaceful Kingdom of SpamKarmia then?
A new breed of Evil has been summoned and is threatening to breach in.
3. How evil?
4. Won’t anybody show up and save the day?
5. Is there really nothing you can do?
6. Then why aren’t you busy doing it, you lazy bastard?
7. You wouldn’t leave us to die here, would you?
And now for the details:
1. How well is SK2 stopping spam currently?
If you’ve been using SK2 for a while, you know it’s working pretty damn well. Over the past year, on the different blogs I manage (some of which receive a steady stream of both legit and spam comments, TBs and PBs), over 99% of spam was caught, with under 0.1% false positives (pretty much zero, actually).
The only spam comments that made it through were usually spams posted manually: that is, where a human would browse to the site, maybe even read the post, and post a topical comment looking nearly like ham, save for a blatantly “commercial” site linked in the URL field. These were nearly impossible to stop, as SK2 works 90% on detecting spambots and relies only moderately on blacklisting (which helps keep its false-positive rate extremely low).
These “manual” spams, though, never were much of an issue, as the essence of spam is automation, without which it loses all its appeal: assuming it takes a few seconds for an admin to manually moderate a spam, and given the numbers of bloggers vs. spammers, anything under hundreds of spams per second is just not worth a spammer’s time.
Also, one important thing to understand is that SK2 learns and improves: flagging the spams it let through helps stop the next ones. It is fairly normal for a fresh install to let a few spams through at the beginning, but flagging them, and thus allowing SK2 to build its blacklists and pattern lists, should immediately improve the catch rate dramatically.
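To make the “learning” part a little more concrete, here is a minimal sketch of a blacklist that grows whenever a moderator flags a missed spam. This is not SK2’s actual code: the class name, the method names and the idea of keying the list on link domains are all assumptions made for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlparse


class LearningBlacklist:
    """Toy blacklist (hypothetical, not SK2's implementation) that grows
    each time a moderator flags a spam the filter let through."""

    def __init__(self):
        self.domains = Counter()  # domain -> number of times it was flagged

    @staticmethod
    def _domains(text):
        # Pull the host part out of every http(s) URL found in the text.
        urls = re.findall(r"https?://[^\s\"'<>]+", text)
        return {urlparse(u).netloc.lower() for u in urls}

    def flag_as_spam(self, author_url, content):
        """Called when a human flags a comment that slipped through."""
        for domain in self._domains(author_url + " " + content):
            self.domains[domain] += 1

    def score(self, author_url, content):
        """Return a penalty; higher means the links were flagged more often."""
        return sum(self.domains[d] for d in self._domains(author_url + " " + content))


blacklist = LearningBlacklist()
blacklist.flag_as_spam("http://cheap-pills.example/", "nice post")
print(blacklist.score("http://cheap-pills.example/buy", "hello"))  # penalty is now non-zero
```

The first spam from a given domain gets through, but every later comment pointing at the same domain picks up a penalty, which is the behaviour described above for a fresh install.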
2. Then why have I seen so much spam going through lately?
Unfortunately, as some of you might have noticed, SK2’s performance, as seen from the outside, seems to have dropped suddenly over the past few days. While the bulk of the spam still remains at the door, a meaningful percentage now manages to fly right through SK2’s basic filters. And given the numbers involved, even 1% of all spam attempts is a lot to deal with. There again: SK2’s blacklists learn, and conscientiously flagging each uncaught spam should help keep things under control, but this is still a major quality drop from SK2’s usual performance.
The reason for this sudden burst is a new breed of spam, or more likely, of spambots. It is now confirmed that some spammers have gotten hold of much more efficient spamming tools, ones that bypass some of SK2’s strongest filters without trouble.
Also of note is the fact that Trackbacks and Pingbacks are absolutely unaffected by this issue (although a small unrelated bug was fixed in the later SK2.1 releases and you may want to upgrade again from the site: more on this later).
3. How does this new spambot generation work?
This is a very difficult question, since it involves lots of guessing and detective work. Pretty much like in a war, we do not have access to the enemy’s weapons designs. A very uneven war, actually, since the enemy does have access to ours.
There are ways, though, to gather information about what spambots do, and try guessing how they do it.
[long and uselessly detailed technical droning: you probably want to skip that if you aren’t an anti-spam plugin developer yourself:]
First of all, these spams do not present most of the idiotic traits of their lower colleagues: they do not try cramming hundreds of URLs or inserting hundreds of easily spotted junk keywords in the comment content. Instead, they use only the dedicated name and homepage fields to sneak in spam URLs and keywords. The comment content is often perfectly innocuous, sometimes even topical (by copying parts of another comment or a trackbacking post). All in all, these spams could easily be missed by a human moderator who wouldn’t look carefully at the contact name and URL.
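As a rough illustration of why these comments slip past content-based checks, the sketch below looks for spam keywords field by field. The keyword list and the function names are made up for the example; a real filter would learn its patterns from flagged spam rather than hard-code them.

```python
import re

# Hypothetical keyword list, for illustration only.
SPAM_KEYWORDS = {"viagra", "poker", "casino", "holdem"}


def _tokens(text):
    # Lowercase alphanumeric tokens, so "best-poker-site" yields "poker".
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suspicious_fields(author_name, author_url, content):
    """Return a dict of field name -> sorted spam keywords found in that field."""
    hits = {}
    for field, value in (("name", author_name), ("url", author_url), ("content", content)):
        found = _tokens(value) & SPAM_KEYWORDS
        if found:
            hits[field] = sorted(found)
    return hits


print(suspicious_fields(
    "texas holdem poker",
    "http://best-poker-site.example",
    "Great post, I agree with the previous comment.",
))
```

For a comment of the kind described above, the content field comes back clean while the name and URL fields light up, which is why a filter (or a human) looking only at the comment body would let it through.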
So far, the overall dumbness of spambot programmers gave anti-spam plugins a very easy edge. Things will change if real coders start taking an interest in this no-doubt very lucrative market and start churning out efficient spambot programs for the spam monkeys. And do not doubt for a second that there are, or will be, such black hat developers in this market (the same way there are in other domains of internet spam)… Even if Mark Pilgrim was slightly off the mark in his apocalyptic sum-up of the situation, he was certainly right on one point: there is huge money involved, certainly enough to pay the hourly services of a decent professional coder… perhaps even [cue ominous strings on the soundtrack] a coder already involved in the blogging community.
No, not me (unless I’ve been sleepcoding again).
4. Will any other anti-spam tool fare better than SK2 with this particular spam (or spam in general)?
First off, SK2 is hardly out of the game: even as it is, and with a few tweaks, it can easily be brought back to a satisfying, if not perfect, level of protection. Not to mention a possible harder, faster and better successor to SK2 (more on that later).
As for the rest.
You’ll have to believe me when I say I truly wished for a better offer in anti-spam tools. Far from seeing it as some sort of “competition” (to what? a product I am neither selling nor making any revenue off?), I consider diversity in spam-fighting tools the most efficient way to fight spam. The same way bio-diversity is your guarantee against viruses and germs, presenting a wide array of defense tools to spammers means they can less easily focus their attention on one in particular and try to break it.
What we really do not need, however, is yet another blissfully ignorant moron releasing some stupid 5-line, 3-year outdated, kiddie trick that will not fool a single spammer and waste hours of users’ time. Unfortunately there are a lot of these. So let me go through a quick roundup of what worked, works, and never worked. I’ll skip the details for today, so you’ll have to take my word when I say that:
- Captchas: work. Despite the ultra-theoretical “captcha breaking” scheme urban legend, spammers aren’t about to break a captcha on your blog. The big downside of Captchas is that they are extremely user-unfriendly, intrusive and, most of all, hurt accessibility (how are blind users supposed to cope?).
- Bad Behavior will not stop these specific spammers. For the simple reason that BB is not designed to filter spam. It is only meant to stop the 70% stupid bots that do stupid things. Unfortunately bots are getting smarter, and the ones you wanna worry about are in the top percent of these 30%, thus far out of reach of BB.
As you can tell, there is scant little out there, only a few plugins that all fare somewhat on a par with SK2, all with their pros and cons. Most important of all, there is currently nothing I wouldn’t feel confident breaking through, were I to start in the business of spamming tomorrow…
Just wire the amount to my swiss account.
6. Is there really nothing you can do?
Of course there is.
I have a very fertile imagination, and still a couple tricks to throw in the way of the spamming monkeys, spanning from small bits of tweaking all the way to major, insane and quite possibly break-through concept ideas. Very few in the middle actually. Problem being of course that the more potentially efficient tools would also tend to be the more time-consuming, hazardous ones.
Let me try to sum up the whole state of Spamdom such as I see it, with a tedious numerical analogy:
Say spam-protection goes from 1 to 100, where 1 is “sitting duck”, and 100 is “so protected that Houdini himself wouldn’t get a spam through”. Now let’s say most anti-spam plugins tend to hit somewhere in the 1-10 range, with a few, such as Akismet or SK2, hitting something like a 20 (perhaps also rising a bit as time and improvements went).
Simultaneously spamming techniques have also been adapting and improving, and it’s fair to say they are now approaching a 20, and steadily rising. Essentially, spammers are lazy (or pragmatic, depends on how you see it) and their target is to be just above the anti-spam barrier, not much higher.
Now, among the anti-spam tricks left in reserve, I’d say I got a few small ones that should without too much effort bump SK2 a few points up (with compounded effect, something like a 25), which is nice, but certainly won’t buy more than a few weeks/months.
Since they are also by far the easiest ones to implement, I am already working on them.
There are two other separate projects I’ve been toying, testing and prototyping with: a first one involving a somewhat novel approach to Naive Bayes filtering (definitely not on comment content), which would be a definite +10 on our SpamScale, and another, considerably more complex and difficult to explain in detail, that could be crudely summed up as a P2P Blacklisting system.
That last idea I have been thinking through for a looong time now. I have some confidence that it may hold the key to the End of Blog Spam as We Know It… A definite +50 on our scale…
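To give a rough idea of what Naive Bayes filtering on something other than comment content could look like, here is a bare-bones sketch. The post does not say which signals the real prototype uses, so the feature names below (`url_in_name_field`, `posted_under_2s` and so on) are pure guesses, and the classifier itself is a textbook version with add-one smoothing, not the “somewhat novel approach” mentioned above.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Minimal Naive Bayes over sets of discrete features, with add-one smoothing.
    Simplification: only features present in a sample contribute to its score."""

    def __init__(self):
        self.class_counts = Counter()               # label -> number of training samples
        self.feature_counts = defaultdict(Counter)  # label -> feature -> sample count

    def train(self, features, label):
        self.class_counts[label] += 1
        for feature in set(features):
            self.feature_counts[label][feature] += 1

    def log_scores(self, features):
        total = sum(self.class_counts.values())
        scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # log prior
            for feature in set(features):
                count = self.feature_counts[label][feature]
                score += math.log((count + 1) / (n + 2))  # smoothed P(feature | label)
            scores[label] = score
        return scores

    def classify(self, features):
        scores = self.log_scores(features)
        return max(scores, key=scores.get)


# Hypothetical metadata features, not taken from SK2.
nb = NaiveBayes()
nb.train({"url_in_name_field", "no_referrer"}, "spam")
nb.train({"url_in_name_field", "posted_under_2s"}, "spam")
nb.train({"has_referrer"}, "ham")
nb.train({"has_referrer", "returning_ip"}, "ham")
print(nb.classify({"url_in_name_field", "posted_under_2s"}))
```

The appeal of working on metadata rather than text is that a spambot can copy innocuous comment content far more easily than it can fake every circumstance of how the comment was posted.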
Of course, these last two are also the ones that will take serious time investments before I can even figure out whether I can do something with them… Which takes us to the one and only question you all care about:
7. Why aren’t you busy working on the next anti-spam solution before this spam thing becomes out of control?
Well, because as I said above, it is a lot of work. Work that would add to the top of the already heavy SK2-related workload I deal with daily. Don’t get me wrong, as I’ve stated previously: I love developing, I love developing SK2 and most of the time I love hearing from you (even if sometimes I get irrepressible urges to ram online manuals down some throats). But being a fully human carbon-based entity with little photosynthesis abilities, I happen to need food near-daily…
Also, due to recent life changes, I am now a tad busier (being a full-time student) and much poorer (being a full-time student) than before. Hence the regrettable need to prioritize work that either feeds me or keeps my university peers and professors content.
Can you tell where this is going? No? OK:
To make it short, I am launching a Fund Drive…
The idea is simple: if you use SK2, if you like it, if you’d like to see more of it in the future, if you’d like this future to be sooner than never, if you’d like to help fund the crack habit of a starving student who also happens to dedicate way too much of his free time to eradicating spam, if you think this is worth a few cents, hell even a few dollars, if you can afford to spend this money without robbing your kid or your cat of their next birthday present… Consider donating:
There are currently a few thousand of you actively using SK2 (yep, crazy huh?)… I figure if we weed out the cheapos and those who honestly can’t afford it, plus those who consider their small use of SK2 not worth a monetary contribution (hey, I don’t pay for all my shareware… I’m in no position to cast the first stone), that might still leave a few dozen of you? If each one contributes a few bucks, that should be enough for me to justify spending a few weeks working on SK3 rather than flipping burgers to pay for booze (and occasionally food and rent).
Non-monetary donations of any sorts are all gladly accepted: food specialties from where you live (especially if it’s distilled and drinkable, but the solid kind is cool too), postcards and anything else that won’t cause a police raid to my place at 6 in the morning… Note that due to recent health regulations, I can no longer accept your first-born child in payment for services, but thanks for offering.
If, like me, you are a starving student who cannot afford to divert any of your drug money to pay for my costly addictions, then consider donating some time. There will be need for it: mostly in doc writing (FAQs, user guide, maybe even a support forum at some point since the whole 2-hours emailing a day is becoming a bit tedious). Just put your name in and my people will get in touch with your people when the time arises.
If making a donation, please provide a nickname (if you don’t want your full name to be used) and your blog’s address, as I will probably make a donation page to list all those (if any) who donated.
8. Would you seriously stop developing SK if you don’t get money?
Of course not.
But it is unfortunately true that I will have to lower my involvement with anti-spam dev in favour of more, err, survival-oriented activities. Obviously, I’d much rather be paid for something I love doing (like squashing spam and spammers) than any random job… But it isn’t much of a choice.
I guess I should set some sort of imaginary milestones in terms of funding and how far/fast it would take me on the SK3 development trail, but I’d rather not look like a complete moron when only a fraction of it has trickled in at the end of the month… So I’ll just give you my word that I’ll do my best with what I get, and probably with what I don’t get either…
No matter what happens, I will be releasing SK2.2 (with minor tweaks and bug-fixing) at some point… Hopefully within a week… The two bigger components will honestly depend on how much interest they raise and the time I can afford to spend on them (we are talking at least month-long projects)…
Oh, and let me remind you that donations are not, I repeat: not, mandatory in any way whatsoever.
This is not a change in licensing: SK2 is and will remain free for all non-commercial use and redistribution (note that you can still use SK2 on a commercial blog, the only restriction is on packaging and distributing or otherwise selling SK2 for profit: in which case I ask that you contact me for permission first).
I also wanted to take the occasion to thank very sincerely all those who have already donated money, time or simply kind words through email: you have made my day on many occasions, and helped make it worth it so far.
Thanks a lot and do not hesitate to spread the word!
Good luck, Dr. Dave.
– I already sent my donation last week, as I recall. 🙂
– Hi to Admiral Justin, too. 🙂
At least you are honest! I take it the Bombay Sapphire is fueling your brain cells … so nothing wrong with that in my book! Thinking of how much time and grief SK saved me (well, and the time and grief it cost me back in Fiji) a donation is in order.
SK2 has saved my blog, so it was definitely worth a donation. You guys that develop this software are brilliant. Keep up the good work.
Student/Unemployed person here too, wishing she could donate money but bills make things a little tight! SK2 has helped lift my site workload (bless you) and I know a limited amount of php and nothing about spam (apart from it annoys the crap out of me) but if you need someone to help with the documentation (making it “idiot proof”/”newbie friendly” with proof reading) then I can help with that!
I wish I could use my paypal account :(. Keep up the good work dude .. I love your masterpiece.
I think you could happily have a $30 and $40 button there.. I’d have gone for either. The 666 was maybe a touch over. But yes, SK has definitely saved me at least that much time/money. More to the point, it’s saved me that aggravation.
Good luck with the exams, btw. Bear in mind that the results often don’t mean anything about your future career. Mine didn’t. (Bad exams, good career.)
Thanks for all the details, and I’m a little disheartened to hear that spammers are evaluating JS. Like you, I’m surprised it took this long, and it’s a day I’ve sort of feared, but it was bound to happen eventually.
I was wondering what you think about non-image CAPTCHAs, like WP-Gatekeeper. Yes, I wrote it, but I’m interested in your honest assessment. I’m an Akismet boy right now, and ironically don’t even use WP-Gatekeeper any more, but others (like The Blog Herald) have recently used it with great success. I’m just wondering if the success is temporary, or if that approach has legs.
Donation gladly given once again Dr. Dave!
keep up the good work, and make sure you spend it on the quality liquor, and not the cheap stuff (quality over quantity, I say ;))
Looking forward to the next generation of Spam Karma!
Thanks for this cool piece of software 😉
I love SK better than Akismet dr dave ;-). Unfortunately, I, too, am a student so money is pretty tight for me. So in the mean time, I can only help you with a dirt and mortar help such as writing FAQ or attending forums a couple of hours every day (or week?). Just let me know if you need any help.
Well, not a student, and have used it in my blog (which I only post to once in a while) and I have to say that SK2 has definitely helped. Less crud I have to deal with, and with an honest guy like Dr. Dave (he did say that it funds his taste for Bombay Sapphire) I’m all for donating.
Like your software and your blog…hell, I don’t even know you but I like your style! What kind of distilled beverages can be shipped to Japan? I might send you something too! 😉
I spotted the first weird comments a few days ago. It was the name of the author that made me suspicious and I began to check the link in the authors name. Since then I’ve had 5 or 6 of them, some more stubborn than others, but no big flood of them so far.
Thanks for working on this and I hope the studies are going well too 😀
I use WP Hashcash (only) and have only had manually-typed spam getting through still. I haven’t seen any evidence of a bot breaking through this yet, though I know it’s inevitable. (WPH blocks about 200-300 spam a day on my sites, including a couple of PR7s, so they’re prime targets.)
At any rate, maybe WPH has some ideas you could implement in SK.
Regardless, I’m sending you a donation: SK2 may not be my current plug-in of choice, but I’m willing to fund any anti-comment-spam effort… Best of luck.
Dave: I’m with you. I do my blog purely for love too.
I sent in some dough but didn’t read far enough down in your post to see you asked for blog url & e mails along w. the donation. By all means, make me public & use the info I’ve entered for this comment.
BTW, when I try to lv. this comment using FF 1.5 I can’t see the Submit Comment button. It appears to be covered by the “This entry was posted…” msg. So I had to use IE to post this.
Just donated. Thanks for SK!
I’d like to help with docs, support and general crime fighting aspects of SK3.
Thanks for SK. Just made donation. Cheers!
I just sent a donation as well.
I know how you feel buddy as sadly I’m in the same boat.
Luckily this week I could afford a small donation.
I really do appreciate all your hard work.
Take care and sleep tight. 😉
Just recently made a $ contribution and am glad I did. You’re better at, and about, clueing in this noncoding user than just about any coder I’ve come across. Bless you. May whatever gene is responsible get loose and spread throughout the coding-human species.
Seriously, I may be able to kick in a little more after a while. In the meantime, know that your efforts and communications are really appreciated. I’ve got a few of the damn things slipping through, but nothing like it was before plugging SK2 in.
Hi, I made my small donation but I’d like also to suggest how to contribute not only to you but to all the anti-spammers community around. You wrote that the enemy has access to our weapons.
Anyway, I think that there are software companies around developing spamming tools and selling them to spammers.
It would be nice to have a list of all spam developers companies and check how their software behaves.
Even better it would be nice to fight these companies in different manners, you know what could happen to these companies if there would be a nice list online 😉
(Do you know what I could do to the texas holdem poker guy if I get my hands on him? 😉 )
I’ll check around but if you, or any reader of this blog has this information, please release it in order to better know the enemy weapons.
Dr Dave, do you mind creating a page or an article (updated) with such a list?
Best regards and keep developing.
Thanks Dr.Dave! I can spare a few bucks for the time and trouble you have saved me
Woa… that’s a lot of comments.
Sorry for being remiss the whole past week, I was busy bathing in gold and flipping through $100 bills… I mean, taking care of student life.
So, first of all, a general thank you to all of you. I just posted a few details, for those interested.
Secondly, to all those who contacted me about helping with docs (and left an address): I’ll be sending you an email soon to discuss the improvements that need to be done. Don’t hesitate to bug me (preferably through email) if you don’t receive anything by the end of the week: it means I probably lost your email somewhere…
Now, as for the specifics:
Indeed, Bombay Sapphire is fueling my brain, it’s also helping to protect me from malaria by guaranteeing my daily quinine intake. It also shuts the voices in my head long enough to let me focus on code.
Charles (and a few others)
Actually, I hadn’t realized that people would be scrupulously following whatever amounts I offered by default: I figured anybody could just go and use Paypal’s free form directly, or combine amounts as many times as needed. Hence the limited number of options I had there (corrected since).
As for anybody donating $666, or even $40 for that matter: I think I’d be embarrassed and, to say the truth, slightly suspicious of what it is exactly I am relinquishing for such an amount.
And about the exams: no big worries, even if these were busy times indeed, my being more of a “going back to uni” situation, means I am both fairly relaxed and not overly concerned about the rest of my professional life being impacted by this. But thanks for reminding me 🙂
Yea, JS evaluation is fairly basic to do. I can think of many ways (ranging from Greasemonkey to an MFC app using MSIE components) for a spambot to behave exactly like a browser. And expect to see even more of the anti-bot filters becoming increasingly irrelevant.
Regarding WP-Gatekeeper (and similar solutions). I must admit I am not a big fan.
In a nutshell:
– I don’t particularly like the fact it’s tied to language comprehension (sure you can localize it, but…). I would also be worried that quickly increasing difficulty of these “easy” riddles would become an obstacle to some of the less fluent commenters. And if you think any commenter fluent enough to read your blog would be fluent enough to answer that type of question, ask any English-speaking Japanese what color a green apple is: you’ll see what I mean.
Of course, one could ensure this level always remain low enough (by providing a canned dataset and little ways to change it) but then:
– Any riddle-building algorithm based on a limited set of data can be reverse-engineered all the same (say you have thousands of installs with a dozen fruit names and the color associated to them: how long do you think it’d take a bot to work through that).
– More evolved algos, using bigger datasets (and then bringing us back to problem 1) would still be breakable with very basic AI. In fact, I’d be personally much more confident in my ability to break such “Turing riddles” than even the simplest Captchas out there.
So at the end of the day, I can only retain accessibility (minus usability) as an asset over other solutions and Captcha in particular. Knowing that the accessibility card for Captcha is something of a false problem (there are hundreds of ways to work around the issues it creates for sight-impaired users), I would say that the result is not worth the effort in the short term. At any rate, I don’t mean to belittle your work here: bringing in a usability-aware alternative to Captcha is a tough problem, and one worth studying even if I’m doubtful of the chances…
Not sure what kind of beverages can safely be shipped to Japan (their customs are notoriously tight), but actually I am no longer residing in beautiful Tokyo at the moment (something to do with recent life changes).
“Unfortunately”, there are no such things (for comment spam, at least) as companies providing the spammers with services or tools (the ones who claim to do so are usually complete rip-offs and the least of our worries). All these tools are developed and sold in shady virtual back alleys, and little can be known of what is done there. Of course, boycotting products and sites that use their services is a minimum, but I doubt they really care.
Actually, one thing I forgot to talk about, but that’s been on my roadmap for quite a while now, would be an automated counter-google-bomb that would ensure all the keywords the spammers try to push automatically get “hijacked” back to non-spammer sites. For example, if all installs of SK2 had a dedicated page linking “viagra” and “texas holdem” to their respective Wikipedia pages. Currently, the only thing stopping me from doing this is that I would need at least a dozen neutral (non-commercial and consensual) reference sites in order to efficiently bump spam results off the first page of Google.
Anybody got suggestions besides the obvious (Wikipedia and Everything2)?
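For what it’s worth, the page itself would be trivial to generate. Below is a minimal sketch of the idea, assuming a hand-maintained mapping from spammed keywords to neutral reference pages; the mapping, the target URLs and the function name are illustrative only, and nothing like this ships with SK2.

```python
from html import escape

# Illustrative mapping only: spammed keyword -> neutral reference page.
REFERENCE_LINKS = {
    "viagra": "https://en.wikipedia.org/wiki/Sildenafil",
    "texas holdem": "https://en.wikipedia.org/wiki/Texas_hold_%27em",
}


def render_link_page(links):
    """Build an HTML list where each spammed keyword links to its reference page."""
    items = [
        f'<li><a href="{escape(url, quote=True)}">{escape(keyword)}</a></li>'
        for keyword, url in sorted(links.items())
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(render_link_page(REFERENCE_LINKS))
```

The hard part is not the markup but the list of targets: with only one or two reference sites per keyword, the links would not be enough to push spam results off the first page.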
From one student to another – Spam Karma is one of the best things about WordPress. In my opinion, without Spam Karma, I wouldn’t even use WordPress. So thanks a million for making it, keeping it up to date – and keeping it so incredibly painless!
I don’t have much money over, but I’m certainly willing to help write FAQs. Let me know!
Dr. Dave- you’re the man- seriously. Spam Karma is the best kinda Karma there is- and my website would have been a comment-free site a year ago, had it not been for you. My donation will come on Friday, once my paycheck (student stipend- I empathize) clears. The $666.00 is intriguing…I wish I had enough to send you the Number of the Beast. Keep up the great dev work.
I made a donation. You created a superb plugin that works, and I look forward to the future of Spam Karma.
Looks like the update to 2.2 might have temporarily taken care of the problem. I noted the difference between before and after updating with a particularly bothersome url getting through. After the update, it got thrown into Hell. I know this may not hold for long, but for now, that irritating bother has been stopped.
Hi Dr. Dave,
Regarding your comments about spambots mimicking users, I think it’s a fair assumption that most spambots are actually remote-controlled browser clients. I have made this kind of app for a web-test tool, and it really made things simpler for simulation purposes because JS, CSS+IMG loading, headers etc. are all the real deal.
Yea, I reckon SK2.2 should temporarily give us back the advantage over spambots. Let’s see how long that will last (I’d say at least a month or two), and at any rate, I’m not planning to sleep on it.
BTW, be careful if you downloaded SK2.2 right after the announcement: it contained a very stupid and nasty bug in its filters. Make sure you are running SK2.2 final rev 2 (can be seen on SK2’s main page).
Indeed, I mentioned that in the comments above, but basically I am not at all surprised and can think of many ways spambots could be driving browsers… Unfortunately that means all filtering systems relying on “imperfections” of spambots (the way they would miss small details in headers and such) are gonna become irrelevant eventually. Since it was never something I relied on that much, this is not the end of it for now… But we’ll need to work on other areas to catch spambots…