I had an online conversation yesterday with an acquaintance of mine. She was alarmed to have found that her entire site’s content had been republished by some other site!
Apparently, the other site had been pulling her site’s RSS feed for quite some time, and managed to download a sizeable chunk of data, which they subsequently republished with their own ads strewn about. And she’s not the only one by a long shot.
If you’re not aware of this phenomenon, it’s generally referred to as “Splogging”, for “spam blogging”. The idea is usually to re-blog content from other people’s blogs so that the splog site ranks in search results for the popular terms in that content.
For example, if I wanted my site to be a popular search result for “student loans”, first I would install a blog on my server. I would then use some software to aggregate, say, the Technorati feed for posts tagged with “student loans”, which gives me a rich bed of content to start populating my site. Using some dodgy plugins sold by less-than-respectable authors, I can even have WordPress do all of this work for me.
Then, I sprinkle a few links onto the splog that point to my money-making page, and voila! Instant PageRank!
The bottom line for bloggers is that your popular content will be stolen and used to fuel a link farm that profits someone else. How nice. So what do you do to combat it? I have a suggestion or two.
If you’re inclined to modify your server’s configuration a little bit, there are actually a few things that you can do that are much more efficient than what I’m about to suggest. Check out Val’s rant for a list of those things. They usually involve modifying .htaccess, which might be available to you, but which is often tedious to keep on top of unless you’re really vigilant (read: staring at your logs all the time).
An easy alternative to messing with your config files is using this new plugin I’ve written, called AntiLeech.
What does AntiLeech do? AntiLeech does not prevent the splogger bots from accessing your site. No, it does better than that. It produces a fake set of content especially for them that includes links back to your site (and mine, too, ok?) and sends it only to them. When they steal this content, it appears online just like normal, except now you’ve turned the tables on them. You’re actually using the sploggers to promote your own site.
AntiLeech can detect a splogger bot using its User-Agent string (an identifier that some bots send when they are collecting data), or by IP address. You can enter a User-Agent or an IP address into the Options panel of your WordPress blog. When a visitor whose User-Agent or IP address matches an entry you’ve checked on the options page visits your site, they will see only the generated content. They will see it in your page layout and in your feeds. Anywhere you’re normally outputting content, that’s where the fake content will appear to them.
Regular users whose browsers do not match these strings will see your normal content. RSS aggregators should be able to display your content normally, too.
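To make the matching idea concrete, here’s a rough sketch in Python. This is not the plugin’s actual code (WordPress plugins are written in PHP), and the names `BlockList`, `matches`, and `choose_content` are made up for illustration. It also assumes a case-insensitive substring match on the User-Agent and an exact match on the IP address, which may not be how AntiLeech compares them.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockList:
    """Hypothetical stand-in for the entries checked on the options page."""
    user_agents: List[str] = field(default_factory=list)
    ip_addresses: List[str] = field(default_factory=list)

    def matches(self, user_agent: str, ip: str) -> bool:
        # Assumption: case-insensitive substring match on the User-Agent.
        ua = user_agent.lower()
        for pattern in self.user_agents:
            # Skip empty patterns, which would otherwise match everyone.
            if pattern and pattern.lower() in ua:
                return True
        # Assumption: IP addresses are compared exactly.
        return ip in self.ip_addresses


def choose_content(blocklist: BlockList, user_agent: str, ip: str,
                   real_post: str, site_url: str) -> str:
    """Return decoy text for blocked visitors, the real post for everyone else."""
    if blocklist.matches(user_agent, ip):
        # Decoy content that links back to the original site.
        return f'Read the original at <a href="{site_url}">{site_url}</a>.'
    return real_post
```

With a list containing the User-Agent `EvilScraper`, a request from `Mozilla/5.0 EvilScraper/1.0` would get the decoy text, while an ordinary browser or feed reader would get the real post.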
AntiLeech also uses a trick to detect when new User-Agents have collected and displayed your posts. You may see a little “AntiLeech” graphic in your feed output. This graphic helps AntiLeech collect User-Agents that you might want to block. AntiLeech will tell you on what page it first saw the User-Agent, if it can, to help you better make the decision to block that User-Agent or not.
You can turn off this option if you don’t want the image to appear in your feeds, but then AntiLeech won’t be able to detect new User-Agents for you. The image is pretty small and unobtrusive, and doesn’t link to anywhere.
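The bookkeeping behind that image could look something like the sketch below. Again, this is illustrative Python and not what the plugin really runs: `record_image_hit` and the `seen` dictionary are hypothetical names, and I’m assuming the “page it first saw the User-Agent on” comes from the request’s Referer header, which isn’t always sent.

```python
from typing import Dict, Optional


def record_image_hit(seen: Dict[str, Optional[str]],
                     user_agent: str,
                     referer: Optional[str]) -> bool:
    """Remember a User-Agent the first time it requests the feed image.

    Returns True if the User-Agent is new, False if it was already known
    (or if the request carried no User-Agent at all).
    """
    if not user_agent or user_agent in seen:
        return False
    # Store the first page we saw it on; None when no Referer was sent.
    seen[user_agent] = referer or None
    return True
```

Only the first sighting is kept, so later hits from the same User-Agent don’t overwrite the page where it was originally spotted.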
In addition to all of that, AntiLeech will produce a robots.txt output from the User-Agents that you’ve specified in the options page, assuming you don’t already have one. In WordPress 2.1 there is a hook for this already, but this feature of AntiLeech miraculously still works in WordPress 2.0, too!
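For the curious, generating that kind of robots.txt is simple enough to sketch in a few lines of Python. The function name `build_robots_txt` is hypothetical, and I’m assuming each listed User-Agent is simply disallowed from the whole site; the plugin’s real output may differ.

```python
from typing import Iterable


def build_robots_txt(user_agents: Iterable[str]) -> str:
    """Build robots.txt text that disallows every listed User-Agent site-wide."""
    blocks = []
    already_listed = set()
    for user_agent in user_agents:
        name = user_agent.strip()
        # Skip blank entries and duplicates.
        if not name or name in already_listed:
            continue
        already_listed.add(name)
        blocks.append(f"User-agent: {name}\nDisallow: /")
    # One blank line between records, and a trailing newline if non-empty.
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```

Keep in mind that robots.txt is only a polite request. Well-behaved crawlers honor it, but a splogger bot is free to ignore it, which is why the fake content is the main line of defense.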
Of course, I haven’t had AntiLeech in production very long, so I would like some feedback on your use of it, especially if you find it useful.
Let’s get these sploggers!
Ok, now I must get food. Sorry if this reads a little light-headed.