SpamCatcher
I installed a mail server on this box so that everyone could get their mail. The software is called MailEnable. It has many nice features, but does not include anything that limits the amount of spam that comes into an account.
Looking through the unsupported downloads on their site, I noticed that people had written these things called "MTA pickups". Basically, you can tell MailEnable to run every message it receives through one of these pickups for processing. Whatever state the message is in when the pickup finishes is how it gets delivered.
Well, I looked at the couple that were available. One of them is a .NET program that looks for key phrases. Another simply removes an attachment if it has a nasty extension (exe, scr, etc.), but it's written in VB. There was also the option of installing a plug-in pickup for SpamAssassin (which is what I was using under Linux), but that is a layer on top of SpamAssassin, which is a layer on top of Perl. Well, all of that is fine if you like installing runtimes and dealing with loading overhead, but it doesn't really address my need for a Bayesian filter.
A Bayesian filter basically checks each word in a message and calculates the probability that it appears in a spam message. It then takes the most convincing 15 words (either very spam-evident or very not-spam-evident), and compares the quantities of each. If there are more flagrant spam terms in the top 15 than non-spam terms, then the whole message is spam.
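To make that concrete, here is a minimal Python sketch of the idea as described above. It is not the SpamCatcher source: the function names, the simple tokenizer, the per-message counting, and the neutral 0.5 score for words that have never been seen are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(messages):
    """Count, for each word, how many of the given messages contain it."""
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts, len(messages)


def word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how strongly a word points to spam (1.0) or non-spam (0.0)."""
    spam_rate = spam_counts.get(word, 0) / max(n_spam, 1)
    ham_rate = ham_counts.get(word, 0) / max(n_ham, 1)
    if spam_rate + ham_rate == 0:
        return 0.5  # assumption: an unseen word is treated as neutral
    return spam_rate / (spam_rate + ham_rate)


def is_spam(message, spam_counts, ham_counts, n_spam, n_ham, top_n=15):
    """Compare spam-leaning and non-spam-leaning words among the top_n most convincing."""
    probabilities = [
        word_spam_probability(word, spam_counts, ham_counts, n_spam, n_ham)
        for word in set(tokenize(message))
    ]
    # "Most convincing" means furthest from the neutral value 0.5.
    most_convincing = sorted(probabilities, key=lambda p: abs(p - 0.5), reverse=True)[:top_n]
    spammy = sum(1 for p in most_convincing if p > 0.5)
    hammy = sum(1 for p in most_convincing if p < 0.5)
    return spammy > hammy


if __name__ == "__main__":
    spam_counts, n_spam = train(["buy cheap pills now", "cheap pills buy today"])
    ham_counts, n_ham = train(["meeting agenda for tuesday", "lunch on tuesday"])
    print(is_spam("buy cheap pills", spam_counts, ham_counts, n_spam, n_ham))
    print(is_spam("agenda for the tuesday meeting", spam_counts, ham_counts, n_spam, n_ham))
```

Counting spam-leaning against non-spam-leaning words keeps the decision easy to explain. Filters of this kind often combine the probabilities into a single score instead, which is a different technique from the one sketched here.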
Seeing that there was a lack of Bayesian filters for MailEnable written in any language that suited me, I set about writing one.
Even in the limited time that I've been coding (like a fiend), I already get a pretty good return from a very small database (about 40 KB in total). You can test spam against the filters that are running live on this server by visiting the spam test page.
There are several differences between what I've done and some of the other filters. Of course the filtering method is pretty unique amongst the MTA pickups. But the most significant difference in my mind is the per-user filtering.
You can set up a global filter list that works for all users, but users also have the power to produce their own filter lists. They can do this by forwarding the spam mail to themselves and inserting "blacklist:" at the beginning of the subject line. The entire contents of the spam are added to that user's personal spam data set.
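The "blacklist:" convention could be handled along these lines. This is only a sketch in Python under stated assumptions, not the actual pickup code: `handle_training_mail`, `counts_for`, and the in-memory `store` are hypothetical names, and how the real program reads the message and stores its data is not shown here.

```python
import re
from collections import Counter, defaultdict

BLACKLIST_PREFIX = "blacklist:"


def tokenize(text):
    """Split text into lowercase word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def handle_training_mail(user, subject, body, store):
    """Learn a forwarded spam if its subject starts with "blacklist:".

    Returns True when the message was a training submission, False otherwise.
    `store` maps a user name to that user's personal spam word counts.
    """
    subject = subject.strip()
    if not subject.lower().startswith(BLACKLIST_PREFIX):
        return False
    # The whole message is learned: the rest of the subject plus the body.
    remainder = subject[len(BLACKLIST_PREFIX):]
    store[user].update(set(tokenize(remainder + " " + body)))
    return True


def counts_for(user, global_counts, store):
    """Combine the global spam list with the user's personal one."""
    return global_counts + store[user]


if __name__ == "__main__":
    store = defaultdict(Counter)  # hypothetical in-memory stand-in for the database
    global_counts = Counter({"viagra": 2})
    handle_training_mail("owen", "blacklist: Fwd: cheap pills", "buy cheap pills now", store)
    print(counts_for("owen", global_counts, store))
```

Keeping each user's counts separate from the global list means one user's training never changes what another user's filter sees.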
The filters I'm using do reasonably well detecting spam that was caught by SpamCop or SpamAssassin, which is good because I'm thinking about dropping it at the end of the month. Even the messages that get through can be trained into the system, which you can't do with SpamCop and is difficult with SpamAssassin.
I only have a couple more features to add to what I have done so far. I need to add a way for users to whitelist senders. This is currently handled through the configuration system, which isn't complete yet. Basically, I want users to be able to forward an email to the server to tell it that an address should always be passed through.
Perhaps I will also add a pass-through agent, so that other filters can be activated after mine is done processing.
When I'm done, I'll release my project to the public and all MailEnable users will benefit! Yay!
Comments
Comment by Lemmy on .
I just don't get this spam stuff............
It was just a huge annoyance at first but now due to the sheer volume of it it has become very problematic for a number of reasons. I detest government intervention in anything and them trying to stop spamming or make it illegal is laffable but I do appreciate the thought.....just think....a spam free internet !! Very nice. Do you or anybody you know ever read this stuff ? I have never opened a spam-mail and don't know anyone who ever has. How is it that this stuff is such a huge business ? I am guessing people make money from it or it wouldn't be everywhere. But who really opens this stuff ?? AOL'ers ??? I hate hackers and people who get kicks from spreading viruses but if I possessed the knowledge to do this stuff I would declare my own war on known spamming houses.
Interesting thing for me is I just changed servers from a 'real' server to a national discount server (6.49 a month!!) and the cheepo server has all but shut spam out of its networks whereas the 'real' server couldn't get a handle on it and had us going to a site called Postini.com to retrieve mail....yeah, that's what I want to do.....more crap just to get an email !!
Oh well, I guess it is a hardship one must suffer to enjoy pages like this one and the ever informative, entertaining and enjoyable psuedomain.net.
Comment by Owen on .
I forget where I read recently that a company that used spam tactics to sell its herbal Viagra product had accidentally exposed its client list on the internet. The company had done millions of dollars in business. That's millions on an herbal remedy for impotence, all marketed by sending people spam. It's both amazing and outrageous.
Comment by Lemmy on .
Depressing..........
It is just proving my own personal theory that people are genetically losing their ability to think in a rational manner.
Just think if the same amount of concern for one's potency went into something important...............
Comment by KJ on .
Comment by Owen on .
Comment by Scott Mullen on .