Lots to say about comment spam today…
Have you noticed the new tactic the spammers are using to get spam onto WordPress sites? They’re sending comment spam without any URLs. You think that’s a good thing and that the spammers are just crazy? I’ll explain why you’re being lulled into a false sense of security.
There is a toggle in WordPress (this may be a 1.5-only thing - I haven’t checked in 1.2) that sends all comments to moderation unless that user has commented before. There are actually a couple of spam plugins that check this, too, automatically whitelisting anyone who has already left an approved comment. This is a good thing, but you have to be careful not to get duped into approving a spammer’s unobtrusive comment.
If you did approve one of the many comments that sound like, “Just found your site, and it’s exactly what I wanted,” then you’re likely in for a surprise. The spammer script will notice that the old comment was posted, and it will post another comment from the same email address, this time with URLs in it. Sneaky, eh?
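To make the trick concrete, here's a minimal sketch of the "has this person been approved before?" rule and how the two-step spam walks right through it. This is not WordPress's actual code; the function name, the dictionary layout, and the example addresses are all made up for illustration.

```python
import re

# Hypothetical, simplified URL detector used only for this illustration.
URL_PATTERN = re.compile(r"https?://|www\.", re.IGNORECASE)


def needs_moderation(comment, approved_emails):
    """Hold a comment unless its author already has an approved comment."""
    return comment["email"].lower() not in approved_emails


approved_emails = set()

first = {
    "email": "visitor@example.com",
    "text": "Just found your site, and it's exactly what I wanted.",
}
second = {
    "email": "visitor@example.com",
    "text": "Great deals at http://spam.example/poker",
}

# Step 1: the harmless-looking comment is held for moderation.
held_first = needs_moderation(first, approved_emails)

# Step 2: the blog owner approves it, which whitelists the address.
approved_emails.add(first["email"].lower())

# Step 3: the follow-up from the same address skips moderation,
# even though it now carries a URL.
held_second = needs_moderation(second, approved_emails)
second_has_url = bool(URL_PATTERN.search(second["text"]))
```

The whitelist only ever looks at who is posting, never at what they're posting, which is exactly the gap the spammers are using.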
On the flip side, spam prevention in WordPress 1.5 seems to be improving. Matt recently announced that he’s using only the built-in protections to prevent spam and it seems to be working fine. Well, that’s pretty swell, but I wonder if anyone has tried to write anything about “Texas-holdem poker” in his comments.
Looking through the WP 1.5 comment code, it seems that the entire comment is checked for the spam search strings. So unless the search strings are uber-regexes, they’re going to detect regular text. This means they have the potential for a high false-positive rate.
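Here's a rough sketch of why matching plain strings against the whole comment is risky. It is not the WordPress code, just an assumed, simplified version of the idea, with a made-up two-word list. The first function does a bare substring search; the second wraps the same words in a word-boundary regex.

```python
import re

# Hypothetical wordlist for illustration only.
SPAM_WORDS = ["poker", "cialis"]

# The same words, but only matched as whole words.
WORD_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in SPAM_WORDS) + r")\b",
    re.IGNORECASE,
)


def naive_is_spam(text):
    """Flag the comment if any spam string appears anywhere in it."""
    lowered = text.lower()
    return any(word in lowered for word in SPAM_WORDS)


def word_boundary_is_spam(text):
    """Flag the comment only if a spam string appears as a whole word."""
    return bool(WORD_PATTERN.search(text))


innocent_substring = "I asked a specialist about it"
innocent_topic = "We played Texas hold'em poker last night"

naive_hits_substring = naive_is_spam(innocent_substring)
boundary_hits_substring = word_boundary_is_spam(innocent_substring)
naive_hits_topic = naive_is_spam(innocent_topic)
boundary_hits_topic = word_boundary_is_spam(innocent_topic)
```

The substring check flags "specialist" because it contains one of the listed strings, and the word-boundary version avoids that. Neither one can tell a legitimate poker story from poker spam, though, so smarter patterns only fix part of the false-positive problem.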
Fortunately, most people don’t need to talk about Texas hold’em in their comments, so this issue isn’t frequent. But if spammers get craftier and word their comments with frequently used, acceptable words, this could pose a problem. The immediate alternative to adding words would be to add every spammer IP to the spam wordlist. I’m going to guess that this might cause a bit of inefficiency when posting comments.
Keeping this list up to date can also be a hassle, since - at the last CVS revision I saw - there was no way to automate this process.
My thought is that the integrated stuff is good enough at preventing casual spam, but some additional features are necessary to bring it up to the user-friendly standard at which WordPress usually runs.
Only the most recent CVS has changed the Dashboard to not show spam comments. That’s good. I had revised my Dashboard to show a count of spam comments like it does the moderated ones. The link goes to the moderation.php page, but with a special querystring that forces the display of spam comments. This allows me to delete them. I’m not sure if the latest CVS has added a way to review spam posts yet, but being able to approve false positives is the main benefit of keeping spam comments, right? It should do this.
I’ve also been toying a little with captcha code, and I’ve encountered the trackback issue. Trackbacks can’t use captchas. If you’re protecting your blog exclusively with captchas, and you’re not blocking trackbacks, you’re wide open and ripe for trackback spam.
I’ve been devising a scheme to allow a site to authenticate itself when it creates a trackback. This would use the xmlrpc plugin hook in 1.5. Of course, it would only work on WordPress blogs, but it’s a start. The interface would be pretty simple, and although it wouldn’t be a solution for preventing initial trackback spam, it would keep people from offending repeatedly. This is an idea still in incubation.
Interestingly, the new OSA code is coming along great. I’ve added a bunch of new features. The best one of the bunch so far is the peer-distributed spam wordlist. You can point your blog at any other OSA-running blog to update your wordlist from theirs. Every entry is accountable to the person who originally added it and can be authenticated. It’s pretty slick and it’s currently working. Since the spam list builds itself from OSA, any person running it is liable to have a good wordlist. There will soon be complete administration of the blacklist entries, so you can easily weed out the entries from sites you don’t trust.
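For a feel of how pulling a wordlist from a peer could work, here's a small sketch. It is not the OSA code: the entry layout (`word`, `added_by`), the function name, and the site names are assumptions for illustration, and the authentication step is left out entirely. It only shows keeping each entry tied to whoever added it and dropping entries from sources you don't trust.

```python
def merge_wordlist(local, peer_entries, trusted_sources):
    """Merge a peer's entries into a local wordlist.

    `local` maps each spam word to the site that originally added it.
    Entries from untrusted sources are skipped, and existing local
    entries keep their original attribution.
    """
    merged = dict(local)
    for entry in peer_entries:
        word = entry["word"].strip().lower()
        source = entry["added_by"]
        if not word or source not in trusted_sources:
            continue
        merged.setdefault(word, source)
    return merged


local_list = {"poker": "me.example"}
peer_list = [
    {"word": "Casino", "added_by": "friend.example"},
    {"word": "the", "added_by": "stranger.example"},
    {"word": "poker", "added_by": "friend.example"},
]

merged_list = merge_wordlist(local_list, peer_list, {"friend.example"})
```

Because every entry carries its source, weeding out a site you stop trusting is just a matter of filtering on that field.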
OSA should augment the built-in spam prevention methods nicely.