I installed a mail server on this box so that everyone could get their mail.  The software is called MailEnable.  It has many nice features, but does not include anything that limits the amount of spam that comes into an account.

Looking through the unsupported downloads on their site, I noticed that people had written these things called "MTA pickups".  Basically, you can tell MailEnable to run every message it receives through one of these pickups for processing.  Whatever state the pickup leaves the message in is how it gets delivered.

Well, I looked at the couple that were available.  One of them is a .NET program that looks for key phrases.  Another simply removes an attachment if it has a nasty extension (exe, scr, etc.), but it's written in VB.  There was also the option of installing a plug-in pickup for SpamAssassin (which is what I was using under Linux), but that is a layer on top of SpamAssassin, which is a layer on top of Perl.  Well, all of that is fine if you like installing runtimes and dealing with loading overhead, but it doesn't really address my need for a Bayesian filter.

A Bayesian filter basically checks each word in a message and calculates the probability that it appears in a spam message.  It then takes the most convincing 15 words (either very spam-evident or very not-spam-evident), and compares the quantities of each.  If there are more flagrant spam terms in the top 15 than non-spam terms, then the whole message is spam.
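To make that description concrete, here is a rough Python sketch of the scheme. It is not the code running on this server: the function names, the tokenizer, the 0.01/0.99 clamping, and the choice to treat unseen words as neutral (0.5) are all assumptions made for illustration.

```python
import re
from collections import Counter

TOP_N = 15  # how many "most convincing" words to keep, as described above


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits, $, ' and -.
    return re.findall(r"[a-z0-9$'-]+", text.lower())


def train(messages):
    # Count how many messages each word appears in (once per message).
    counts = Counter()
    for message in messages:
        counts.update(set(tokenize(message)))
    return counts


def word_probability(word, spam_counts, ham_counts, n_spam, n_ham):
    # Estimate the probability that a message containing this word is spam.
    spam_rate = spam_counts.get(word, 0) / max(n_spam, 1)
    ham_rate = ham_counts.get(word, 0) / max(n_ham, 1)
    if spam_rate + ham_rate == 0:
        return 0.5  # assumption: a word never seen before is neutral
    p = spam_rate / (spam_rate + ham_rate)
    return min(0.99, max(0.01, p))  # assumption: clamp away from 0 and 1


def is_spam(message, spam_counts, ham_counts, n_spam, n_ham):
    probs = [
        word_probability(w, spam_counts, ham_counts, n_spam, n_ham)
        for w in set(tokenize(message))
    ]
    # The most convincing words are the ones farthest from 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:TOP_N]
    spammy = sum(1 for p in top if p > 0.5)
    hammy = sum(1 for p in top if p < 0.5)
    return spammy > hammy


spam_mail = ["buy cheap pills now", "cheap pills online"]
good_mail = ["meeting at noon tomorrow", "lunch tomorrow at noon"]
spam_counts = train(spam_mail)
ham_counts = train(good_mail)
verdict = is_spam("cheap pills for you", spam_counts, ham_counts,
                  len(spam_mail), len(good_mail))
```

With this tiny training set, "cheap" and "pills" only ever appear in spam, so the sample message is flagged. A real word list would obviously need far more mail behind it.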

Seeing that there was a lack of Bayesian filters for MailEnable written in any language that suited me, I set about writing one.

Even in the limited time that I've been coding (like a fiend), I already get pretty good results from a very small database (about 40 KB in total).  You can test spam against the filters that are running live on this server by visiting the spam test page.

There are several differences between what I've done and some of the other filters.  Of course the filtering method is pretty unique amongst the MTA pickups.  But the most significant difference in my mind is the per-user filtering.

You can set up a global filter list that works for all users, but users also have the power to produce their own filter lists.  They can do this by forwarding the spam mail to themselves and inserting "blacklist:" at the beginning of the subject line.  The entire contents of the spam are added to that user's personal spam data set.
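As a rough illustration of that training step, the Python sketch below checks a forwarded message for the "blacklist:" subject prefix and, if it is there, adds the words to that user's personal data set. The names `handle_training_mail` and `user_spam_words` are made up for this example, it only handles plain single-part messages, and it says nothing about how MailEnable actually hands a message to a pickup.

```python
import re
from collections import Counter, defaultdict
from email import message_from_string

BLACKLIST_PREFIX = "blacklist:"

# Hypothetical per-user store: user name -> word counts from their spam.
user_spam_words = defaultdict(Counter)


def handle_training_mail(raw_message, user):
    """Add a forwarded spam to the user's data set; return True if it was used."""
    msg = message_from_string(raw_message)
    subject = (msg.get("Subject") or "").strip()
    if not subject.lower().startswith(BLACKLIST_PREFIX):
        return False  # ordinary mail, not a training request

    payload = msg.get_payload()
    # Assumption: only simple, non-multipart text bodies are handled here.
    body = payload if isinstance(payload, str) else ""
    text = subject[len(BLACKLIST_PREFIX):] + " " + body
    user_spam_words[user].update(re.findall(r"[a-z0-9']+", text.lower()))
    return True


raw = (
    "From: owen@example.com\n"
    "To: owen@example.com\n"
    "Subject: blacklist: Fwd: cheap pills\n"
    "\n"
    "Buy cheap pills now\n"
)
trained = handle_training_mail(raw, "owen")
```

Keeping a separate counter per user means one person's blacklist never changes what gets filtered for anyone else, while a global list can still be checked alongside it.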

The filters I'm using do reasonably well detecting spam that was caught by SpamCop or SpamAssassin, which is good because I'm thinking about dropping it at the end of the month.  Even the messages that get through can be trained into the system, which you can't do with SpamCop and is difficult with SpamAssassin.

I only have a couple more features to add to what I have done so far.  I need to add a way for users to whitelist senders.  This is currently handled through the configuration system, which isn't complete yet.  Basically, I want users to be able to forward an email to the server to tell it that an address should always be passed through.

Perhaps I will also add a pass-through agent, so that other filters can be activated after mine is done processing.

When I'm done, I'll release my project to the public and all MailEnable users will benefit!  Yay!

Comments

Comment by Lemmy

I just don't get this spam stuff...

It was just a huge annoyance at first, but now, due to the sheer volume of it, it has become very problematic for a number of reasons.  I detest government intervention in anything, and them trying to stop spamming or make it illegal is laughable, but I do appreciate the thought... just think... a spam-free internet!!  Very nice.  Do you or anybody you know ever read this stuff?  I have never opened a spam mail and don't know anyone who ever has.  How is it that this stuff is such a huge business?  I am guessing people make money from it or it wouldn't be everywhere.  But who really opens this stuff??  AOL'ers???  I hate hackers and people who get kicks from spreading viruses, but if I possessed the knowledge to do this stuff I would declare my own war on known spamming houses.

Interesting thing for me is I just changed servers from a 'real' server to a national discount server (6.49 a month!!), and the cheapo server has all but shut spam out of its networks, whereas the 'real' server couldn't get a handle on it and had us going to a site called Postini.com to retrieve mail... yeah, that's what I want to do... more crap just to get an email!!

Oh well, I guess it is a hardship one must suffer to enjoy pages like this one and the ever informative, entertaining and enjoyable psuedomain.net.

Comment by Owen

I forget where I read recently that a company that used spam tactics to sell its herbal Viagra product had accidentally exposed its client list on the internet.  The company had done millions of dollars in business.  That's millions on an herbal remedy for impotence, all marketed by sending people spam.  It's both amazing and outrageous.

Comment by Owen

I did do a lot with it and use it myself on my own email, but the codebase hasn't been maintained. The main issue when I was last working on it was speed. It would take several seconds to scan email, and that wasn't acceptable to me. I had figured out a way to speed up the system to less than a second per mail, but it involved changing the format in which the word lists were stored. I had accomplished that, but had not updated the code that adds new words to the wordlist to use the new format. So what I currently have is a system that will run from a dedicated speedy database, but I can't add new words to the filters because that code isn't written. In spite of this obvious shortfall, SpamCatcher has been doing reasonably well with the original word list, but it currently returns a lot of mid-range false positives. That is, it never deletes any good mail thinking it's spam, but a lot of it arrives in my mailbox as suspect. So, yeah, I may return to development on this (among all of the other projects that I have no time for), but I'm not going to release it and have people expecting me to provide updates. Sorry.

Comment by Scott Mullen

Hey, if I'm at the right web site, some of you guys used to be friends with Allen Klinger. I'm a friend of his from PA and was at his funeral. I may have met you. In any event, I was looking around to see if there are any pictures of Allen out there. It's hard to believe, but in the time I knew him I don't have one. Write me or call if you have any you don't mind passing along. Thanks, Scott. mullen_scott@hotmail.com or 610-520-1693