I installed a mail server on this box so that everyone could get their mail. The software is called MailEnable. It has many nice features, but does not include anything that limits the amount of spam that comes into an account.
Looking through the unsupported downloads on their site, I noticed that people had written these things called "MTA pickups". Basically, you can tell MailEnable to run every message it receives through one of these pickups for processing. Whatever state the message is in when the pickup finishes is how it gets delivered.
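To make the pickup idea concrete, here is a minimal sketch of "whatever state the message is left in is what gets delivered." It assumes a pickup is handed a message file on disk and that the server delivers whatever that file contains afterwards. The file-based interface and the names `run_pickup` and `tag_subject` are hypothetical and are not MailEnable's actual pickup API.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser
from pathlib import Path
from typing import Callable

# Hypothetical pickup signature: take a parsed message, return the version to deliver.
Pickup = Callable[[EmailMessage], EmailMessage]


def run_pickup(path: Path, pickup: Pickup) -> None:
    """Load a message file, let the pickup process it, and write the result back.

    Assumption: the mail server delivers whatever the file contains afterwards,
    so any change the pickup makes is exactly what the recipient receives.
    """
    with path.open("rb") as f:
        msg = BytesParser(policy=policy.default).parse(f)
    processed = pickup(msg)
    path.write_bytes(processed.as_bytes())


def tag_subject(msg: EmailMessage) -> EmailMessage:
    """Example pickup: prefix the subject so the change is visible on delivery."""
    subject = str(msg.get("Subject", ""))
    del msg["Subject"]  # no error if the header is absent
    msg["Subject"] = "[scanned] " + subject
    return msg
```

Calling `run_pickup(Path("message.eml"), tag_subject)` rewrites the file in place, and the recipient would see the tagged subject.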
So I looked at the few that were available. One of them is a .NET program that looks for key phrases. Another simply removes an attachment if it has a nasty extension (exe, scr, etc.), but it's written in VB. There was also the option of installing a plug-in pickup for SpamAssassin (which is what I was using under Linux), but that is a layer on top of SpamAssassin, which is itself a layer on top of Perl. All of that is fine if you like installing runtimes and dealing with loading overhead. Besides, none of it really addresses my need for a Bayesian filter.
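The attachment-stripping idea is simple enough to sketch. This is not the VB pickup itself, just a minimal Python version of the same technique using the standard `email` package. The post names exe and scr; the other extensions in `BLOCKED_EXTENSIONS` are assumptions.

```python
from email.message import EmailMessage

# exe and scr come from the post; the rest are assumed examples of risky types.
BLOCKED_EXTENSIONS = {".exe", ".scr", ".pif", ".bat", ".com"}


def is_blocked(part: EmailMessage) -> bool:
    """True if this MIME part carries a filename with a blocked extension."""
    filename = part.get_filename()
    if not filename:
        return False
    return any(filename.lower().endswith(ext) for ext in BLOCKED_EXTENSIONS)


def strip_dangerous_attachments(msg: EmailMessage) -> EmailMessage:
    """Drop blocked attachments from a multipart message, keeping everything else."""
    if not msg.is_multipart():
        return msg
    kept = []
    for part in msg.get_payload():
        if part.is_multipart():
            strip_dangerous_attachments(part)  # recurse into nested multiparts
            kept.append(part)
        elif not is_blocked(part):
            kept.append(part)
    msg.set_payload(kept)
    return msg
```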
A Bayesian filter basically looks at each word in a message and calculates the probability that a message containing that word is spam. It then takes the 15 most convincing words (those that are either very spam-evident or very not-spam-evident) and counts how many fall on each side. If the flagrant spam terms in that top 15 outnumber the non-spam terms, the whole message is treated as spam.
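The voting scheme described above fits in a few lines. This is a minimal sketch, not the filter's actual code: the tokenizer, the frequency-ratio estimate of each word's spam probability, and the clamping to [0.01, 0.99] are assumptions. It implements the majority vote described here; Paul Graham's better-known approach instead combines the 15 probabilities into a single score with Bayes' rule.

```python
import re
from typing import Dict, List

TOP_N = 15  # the 15 most convincing words, as described above


def tokenize(text: str) -> List[str]:
    """Lower-cased word tokens; a deliberately simple tokenizer (an assumption)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def word_spam_probability(spam_hits: int, ham_hits: int,
                          n_spam: int, n_ham: int) -> float:
    """Estimate P(spam | word) from how often the word appears in each corpus.

    Per-corpus frequencies keep a larger spam or ham corpus from skewing the
    result; clamping means no single word is ever treated as certain.
    """
    spam_freq = spam_hits / n_spam if n_spam else 0.0
    ham_freq = ham_hits / n_ham if n_ham else 0.0
    if spam_freq + ham_freq == 0:
        return 0.5  # never seen: neutral
    p = spam_freq / (spam_freq + ham_freq)
    return min(0.99, max(0.01, p))


def is_spam(text: str, probs: Dict[str, float]) -> bool:
    """Majority vote over the TOP_N most decisive known words."""
    words = {w for w in tokenize(text) if w in probs}
    # "Most convincing" = furthest from a neutral 0.5; ties broken alphabetically.
    ranked = sorted(words, key=lambda w: (-abs(probs[w] - 0.5), w))[:TOP_N]
    spam_votes = sum(1 for w in ranked if probs[w] > 0.5)
    ham_votes = sum(1 for w in ranked if probs[w] < 0.5)
    return spam_votes > ham_votes
```

For example, with `probs = {"free": 0.9, "lunch": 0.1, "meeting": 0.05}`, the text "free lunch meeting" gets one spam vote and two non-spam votes, so it passes.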
Since there were no Bayesian filters for MailEnable written in a language that suited me, I set about writing one.
Even in the limited time that I've been coding (like a fiend), I already get pretty good results from a very small database (about 40 KB in total). You can test spam against the filters running live on this server by visiting the spam test page.
There are several differences between what I've done and the other filters. The filtering method itself is fairly unusual among the MTA pickups, but the most significant difference, in my mind, is per-user filtering.
You can set up a global filter list that applies to all users, but each user can also build a personal filter list. To do this, a user forwards the spam to themselves and inserts "blacklist:" at the beginning of the subject line. The entire contents of that message are then added to the user's personal spam data set.
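Here is a minimal sketch of how a "blacklist:" training message might be recognized and folded into per-user data. The in-memory `spam_counts` store, the use of `"*"` as the key for the global list, and the function names are hypothetical; the real filter keeps its data in a small database whose layout isn't described here.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, List

BLACKLIST_PREFIX = "blacklist:"

# Hypothetical in-memory store of spam token counts, keyed by user address.
# "*" stands in for the global list shared by all users (an assumption).
spam_counts: Dict[str, Counter] = defaultdict(Counter)


def tokenize(text: str) -> List[str]:
    """Lower-cased word tokens; a deliberately simple tokenizer (an assumption)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def handle_training_message(recipient: str, subject: str, body: str) -> bool:
    """If the subject starts with 'blacklist:', add the message to the user's spam data.

    Returns True when the message was consumed as a training command.
    """
    text = subject.strip()
    if not text.lower().startswith(BLACKLIST_PREFIX):
        return False
    # Drop the command prefix; the rest of the subject is part of the spam sample.
    original_subject = text[len(BLACKLIST_PREFIX):]
    spam_counts[recipient.lower()].update(tokenize(original_subject + " " + body))
    return True


def counts_for(user: str) -> Counter:
    """Merge the global spam counts with this user's personal ones."""
    return spam_counts.get("*", Counter()) + spam_counts.get(user.lower(), Counter())
```

Because the counts are merged per lookup, one user's training never changes what another user's filter sees.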
The filters I'm using do reasonably well at detecting spam that was caught by SpamCop or SpamAssassin, which is good because I'm thinking about dropping it at the end of the month. Even messages that get through can be trained into the system, which you can't do with SpamCop and which is difficult with SpamAssassin.
I only have a couple more features to add to what I've done so far. First, I need a way for users to whitelist senders. Right now whitelisting is handled through the configuration system, which isn't complete yet. Basically, I want users to be able to forward an email to the server to tell it that an address should always be passed through.
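The post doesn't say how the forwarded email would name the address to trust, so this sketch assumes a hypothetical "whitelist:" subject prefix followed by the address, mirroring the "blacklist:" convention. The `whitelists` store and both function names are illustrative only.

```python
from email.utils import parseaddr
from typing import Dict, Set

WHITELIST_PREFIX = "whitelist:"  # hypothetical, mirroring the blacklist: convention

# Hypothetical store: recipient address -> set of always-allowed sender addresses.
whitelists: Dict[str, Set[str]] = {}


def handle_whitelist_command(recipient: str, subject: str) -> bool:
    """Treat 'whitelist: someone@example.com' in the subject as a whitelist request."""
    text = subject.strip()
    if not text.lower().startswith(WHITELIST_PREFIX):
        return False
    _, address = parseaddr(text[len(WHITELIST_PREFIX):].strip())
    if "@" not in address:
        return False  # nothing usable to whitelist
    whitelists.setdefault(recipient.lower(), set()).add(address.lower())
    return True


def should_bypass_filter(recipient: str, from_header: str) -> bool:
    """True if the message's sender is on the recipient's whitelist."""
    _, sender = parseaddr(from_header)
    return sender.lower() in whitelists.get(recipient.lower(), set())
```

The pickup would call `should_bypass_filter` before running the Bayesian check and pass whitelisted mail through untouched.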
I may also add a pass-through agent so that other filters can run after mine has finished processing.
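A pass-through agent amounts to running filters in sequence, each one handing its output to the next. This is a minimal sketch with a hypothetical `Message` type and placeholder filters; the keyword check in `bayes_filter` merely stands in for the real Bayesian verdict.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """Minimal stand-in for a mail message (hypothetical, for illustration)."""
    headers: Dict[str, str] = field(default_factory=dict)
    body: str = ""


Filter = Callable[[Message], Message]


def run_chain(message: Message, filters: List[Filter]) -> Message:
    """Pass the message through each filter in order; the final result is delivered."""
    for f in filters:
        message = f(message)
    return message


def bayes_filter(msg: Message) -> Message:
    # Placeholder verdict: a single keyword stands in for the Bayesian check.
    msg.headers["X-Spam-Flag"] = "YES" if "viagra" in msg.body.lower() else "NO"
    return msg


def subject_tagger(msg: Message) -> Message:
    # A later filter can act on what earlier filters recorded.
    if msg.headers.get("X-Spam-Flag") == "YES":
        msg.headers["Subject"] = "[SPAM] " + msg.headers.get("Subject", "")
    return msg
```

Running `run_chain(msg, [bayes_filter, subject_tagger])` lets the second filter build on the first one's verdict, which is the point of passing control along.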
When I'm done, I'll release my project to the public and all MailEnable users will benefit! Yay!