The past two weeks have seen an upswing in posts around here. As a consequence, more spiders stop by, and they find the search links on the right. When the spiders hit those search links in quick succession without loading anything else, they dig a big crater where the server used to be. Well, I’m tired of that.
There are a ton of nasty bots that don’t obey robots.txt, and I really don’t think it can be relied upon to stop the most heinous bots from crippling a site’s ability to serve pages. I’ve thrown some Apache modules at the issue, but they don’t seem to help. I’ve specifically excluded sections of the site from certain user agents, and that seems to work well, but there’s no avoiding getting trounced by these freaking ill-behaved spiders and comment-spamming bots.
So I wrote a new Habari plugin that checks the server load and sends a 503 (Service Unavailable) when the load gets too high. A custom template is employed for the 503, so even though the site doesn’t provide content, it still looks like it’s trying. The settings are configurable, so I can turn certain taxing features of the site off temporarily while there’s moderate load, and send the whole site into standby if the load hits the roof. When the load returns to normal, the walls automatically drop.
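The load-gating idea can be sketched roughly as follows. This is not the plugin’s code (which would be PHP running inside Habari); it’s a minimal Python illustration that assumes the 1-minute load average is the signal, and it uses hypothetical thresholds (`moderate`, `high`) and a hypothetical `Retry-After` value. Between the two thresholds, taxing features such as search are switched off; at or above `high`, every request gets a 503.

```python
import os
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class LoadPolicy:
    # Hypothetical thresholds on the 1-minute load average; tune per server.
    moderate: float = 4.0   # at or above this, switch off taxing features like search
    high: float = 8.0       # at or above this, answer every request with a 503
    retry_after: int = 120  # seconds to suggest in the Retry-After header (assumed)

@dataclass(frozen=True)
class Decision:
    mode: str                   # "normal", "degraded", or "standby"
    status: int                 # HTTP status code to send
    search_enabled: bool        # whether expensive features stay on
    retry_after: Optional[int]  # Retry-After seconds, only set for 503 responses

def decide(load: float, policy: LoadPolicy = LoadPolicy()) -> Decision:
    """Map a load average onto a response mode; re-evaluated on every request."""
    if load >= policy.high:
        # Standby: render only the lightweight 503 template.
        return Decision("standby", 503, False, policy.retry_after)
    if load >= policy.moderate:
        # Degraded: pages still render, but costly features are off.
        return Decision("degraded", 200, False, None)
    # Normal: nothing is switched off, so the walls drop on their own.
    return Decision("normal", 200, True, None)

def current_load() -> float:
    # os.getloadavg() returns the 1-, 5- and 15-minute averages (Unix only).
    return os.getloadavg()[0]

if __name__ == "__main__":
    print(decide(current_load()))
```

Because the decision is recomputed from the current load on each request, there’s no state to reset: once the load average falls back below `moderate`, requests are served normally again.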
I’ve also killed all of my spam comments. Habari’s internals need to be changed to do the deletion with raw SQL rather than by iterating over the comments one at a time. I get too many spams, and by the time I get around to cleaning out my spam bin (at the most recent count, I had more than 60,000 caught spams), there are more than Habari can hold comfortably in memory. As a result, I can’t delete any of them through Habari. Subtle flaw.
For later: DELETE comments.*, commentinfo.* FROM comments LEFT JOIN commentinfo ON comments.id = commentinfo.comment_id WHERE comments.status = 2;
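To contrast with iterating over comment objects, here’s a minimal sketch of doing the spam purge entirely in SQL, so no comment rows are ever loaded into the application. It uses Python’s built-in sqlite3 as a stand-in for Habari’s database, takes the `comments`/`commentinfo` table names and the `status = 2` spam value from the note above, and invents `purge_spam` and the `demo_db` fixture purely for illustration. SQLite doesn’t support MySQL’s multi-table DELETE, so the sketch uses two statements in one transaction instead.

```python
import sqlite3

SPAM = 2  # comments.status value the note above uses for spam

def purge_spam(conn: sqlite3.Connection) -> int:
    """Delete spam comments and their commentinfo rows inside the database.

    No comment rows are fetched into Python, so memory use does not grow
    with the size of the spam bin.
    """
    with conn:  # single transaction: commit on success, roll back on error
        # SQLite has no multi-table DELETE, so remove the metadata first...
        conn.execute(
            "DELETE FROM commentinfo WHERE comment_id IN "
            "(SELECT id FROM comments WHERE status = ?)",
            (SPAM,),
        )
        # ...then the comments themselves.
        cur = conn.execute("DELETE FROM comments WHERE status = ?", (SPAM,))
    return cur.rowcount

def demo_db(spam: int, ham: int) -> sqlite3.Connection:
    """Hypothetical fixture: an in-memory database with one info row per comment."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE comments (id INTEGER PRIMARY KEY, status INTEGER);
        CREATE TABLE commentinfo (comment_id INTEGER, name TEXT, value TEXT);
        """
    )
    # Spam rows first, then legitimate comments with a non-spam status.
    conn.executemany(
        "INSERT INTO comments (status) VALUES (?)",
        [(SPAM,)] * spam + [(1,)] * ham,
    )
    conn.execute("INSERT INTO commentinfo SELECT id, 'ip', '127.0.0.1' FROM comments")
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = demo_db(spam=60000, ham=3)
    print(purge_spam(conn))  # 60000
```

The database walks the rows itself, so 60,000 spams cost the application no more memory than 60 would.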
This new plugin is probably not the optimal solution for handling load, but it’s what I have time to implement in the few minutes I can spend concentrating on it. I think it’s better than the site simply not responding while it keeps processing requests under high load, stacking the deck against it ever returning to normal. At least with this plugin, you can see that the site exists, that there is a problem, and that things will eventually return to normal. That seems positive to me.
The next iteration will probably employ caching during high load, which likely won’t come until Habari 0.4, when we might finish off the internal caching classes. Hopefully 0.4 isn’t too far off.