Spam Free Email

Anti-spam ideas, tools and services

December 31st, 2005

Collaborative filters vs personalization

While designing my anti-spam service I have been thinking about how I want to handle things like really rare words and personalization. These two concepts are on opposite ends of the spectrum for spam filters.

One of the best ways to deal with really rare words is collaborative filters. This allows you to take the settings of other people to evaluate a word that you have never seen before or have seen very few times. The problem with this is that other people may have a different idea of how to classify those words than you do.

Enter personalization. This is where Bayesian filters truly succeed in filtering spam, by looking at how you have previously classified spam messages and doing some wild statistical analysis on them. The weak point is when they do not have enough information to produce a statistically significant result.

I think I’ve found a way of combining these two theories into a working filter that allows for the best of both worlds.
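The post does not say how the two are combined, so the following is only a minimal sketch of one well-known option: treat the community-wide estimate as a prior and let the user's own counts outweigh it as they accumulate. The function name, the `strength` parameter, and the use of raw counts as a probability are all assumptions made for illustration.

```python
def blended_spam_probability(personal_spam, personal_ham, community_prob, strength=5.0):
    """Blend a user's own evidence for a word with a community-wide estimate.

    personal_spam / personal_ham: how often this user saw the word in spam / good mail.
    community_prob: spam probability for the word taken from other users (hypothetical input).
    strength: how many personal sightings the community estimate is worth (assumed value).
    """
    n = personal_spam + personal_ham
    # Simplification: raw counts stand in for a properly normalized probability.
    personal_prob = personal_spam / n if n else 0.0
    # With n == 0 the result is the community estimate; as n grows the
    # personal estimate dominates.
    return (strength * community_prob + n * personal_prob) / (strength + n)


# A word the user has never seen falls back to the community view.
print(blended_spam_probability(0, 0, 0.9))
# A word the user has seen 100 times is scored mostly on personal history.
print(blended_spam_probability(95, 5, 0.1))
```

The appeal of this shape is that rare words lean on other people's settings while common words follow the user's own classifications, which matches the trade-off described above.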

December 30th, 2005

Weighted Distributed Processing and caching

I’m working on the BETA code for the anti-spam service these days, and I was trying to get things moving a bit faster and smoother.

I have had my servers doing distributed processing on a random basis, choosing a random node to send a process to for completion. I just set up a way for each node to know a weight for all the other nodes. Now, even though it is still random, the servers with a higher weight get more of the load.

I also started using ETS to keep certain data in memory. While I had been doing this for a while, I just added a few scripts to load the entire data set into memory as the nodes start. It takes a second to load, but the speed in processing is much greater.
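As an illustration of weighted random selection (sketched in Python rather than Erlang, with made-up node names and weights), each node can keep a table of weights and draw from it so that heavier nodes are picked proportionally more often:

```python
import random


def pick_node(weights, rng=random):
    """Pick one node at random, in proportion to its weight.

    weights: dict mapping node name -> positive weight (hypothetical values).
    rng: the random module or a random.Random instance, so tests can seed it.
    """
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]


# Hypothetical cluster: "big" should receive roughly three times the work of "small".
cluster = {"big@host1": 3, "small@host2": 1}
print(pick_node(cluster))
```

The choice is still random on every call, so no node needs to track what the others are doing; the weights only shift the long-run share of the load.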

I keep finding things in Erlang that are so well thought out and designed that I know I couldn’t do it any better. Plus they are already there. It’s really a matter of going through all of the (lousy) documentation and figuring out what is there so that you know when you might want to use it.

Of course I tend to find it after I’ve already solved it on my own and end up rewriting code, but everyone has their own method.

December 29th, 2005

Let the BETA begin :-)

Earlier today I finished up the user interface enough to put my own email on the new anti-spam system. The funny thing is that in the past hour I’ve seen more spam on my own email address than on the two test domains combined.

I still need to refine the user interface, but that is not going to be too difficult. My aim is simplicity with lots of features, so I’m trying to make it easy to use with every feature I can think of :-)

I might do an open BETA later this month for anyone who might be interested.

December 27th, 2005

Reclassifying Spam and User Interfaces

I’ve been working on the front end for the SpamFreeEmail.com service that I have been writing and I think I have come up with an interesting idea for a front end interface.

Instead of making people log into a web-based interface, why not give them a minimal IMAP interface? That way they could constantly monitor everything concerning their email from the same mail client.

Three main folders would be needed: Good, Bad and Neutral, and possibly a fourth for viruses. Moving mail from one folder to another would automatically reclassify it to the new designation. Marking any message with a flag would forward it on to the destination mail server.

This idea is to simplify the user’s life and let them regain control over their communications. Giving them an easier interface to access their anti-spam system certainly qualifies for this.

Plus I’m already using the Maildir format with the basic flags set up for IMAP. The only thing I would need to do is write the IMAP interface and link it directly to the mail server, and at the speed I’ve been developing code in Erlang, that would be easy.
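A rough sketch of the folder semantics described above, with no IMAP protocol handling at all. The `Mailbox` class, its queue names, and the message ids are hypothetical; the point is only that a move between folders becomes a retraining event and a flag becomes a forward request.

```python
from dataclasses import dataclass, field

FOLDERS = {"Good", "Bad", "Neutral"}


@dataclass
class Mailbox:
    location: dict = field(default_factory=dict)      # message id -> current folder
    retrain_queue: list = field(default_factory=list)  # (id, old folder, new folder)
    forward_queue: list = field(default_factory=list)  # ids to send to the real server

    def deliver(self, msg_id, folder):
        """Place a newly classified message in its initial folder."""
        if folder not in FOLDERS:
            raise ValueError(f"unknown folder: {folder}")
        self.location[msg_id] = folder

    def move(self, msg_id, dest):
        """Moving a message reclassifies it to the destination folder."""
        if dest not in FOLDERS:
            raise ValueError(f"unknown folder: {dest}")
        source = self.location[msg_id]
        if source != dest:
            self.location[msg_id] = dest
            self.retrain_queue.append((msg_id, source, dest))

    def flag(self, msg_id):
        """Flagging a message asks for it to be forwarded to the destination server."""
        self.forward_queue.append(msg_id)


box = Mailbox()
box.deliver("m1", "Bad")
box.move("m1", "Good")   # the user rescues a false positive
box.flag("m1")           # and asks for it to be delivered
print(box.retrain_queue, box.forward_queue)
```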

I’m going to slate this on my to-do list for sometime after my go live launch, but that should still be in the first half of this year.

December 27th, 2005

Rebuilding servers

I rebuilt all five servers today, making sure that the install scripts included ClamAV and a host of other things that I have reorganized in the past few weeks.

The outcome was quite pleasant. It took me about 2 hours to rebuild the 5 servers, which run 8 Erlang nodes. This process included completely reinstalling Fedora Core 4 from scratch, along with Erlang, ClamAV and all the tools needed to support this configuration.

I did rebuild two of the servers twice, as I switched to assigning IPs via DHCP instead of by hand on each install.

Needless to say, when one of the servers goes on the fritz, who cares about troubleshooting if you can rebuild the whole thing in 15 minutes? This also means that upgrades will be rebuilds, and moving to newer versions of anything will not be very difficult to accomplish.

December 26th, 2005

Got ClamAV working

Well, it took me almost all day, but I finally got the C-port for ClamAV working the way I wanted it to.

Turned out there is a bug somewhere that makes it not like some of the file names I had. I use the Maildir naming convention, and in the middle of the file name it had R23P45 or something like that: an R, two numbers, a P, then two numbers. Whenever this pattern came up it would segfault with no useful information.

After collecting enough examples of the exact messages that were causing the problem and noticing that they worked fine when I renamed them, I rewrote my Maildir naming function, and I now generate a temporary name before I scan the message.
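A sketch of the workaround in general terms, assuming the scanner is reachable through some callable that takes a file path. Here `scan` is a hypothetical stand-in for the ClamAV port, and copying to a temporary file is an assumption; the post only says that a temporary name is generated before scanning.

```python
import os
import shutil
import tempfile
import uuid


def scan_under_safe_name(path, scan):
    """Copy a message to a temporary file with a neutral name, then scan the copy.

    The temporary name is built from lowercase hex digits only, so it cannot
    contain an uppercase pattern such as R23P45.
    """
    tmp = os.path.join(tempfile.gettempdir(), "scan-" + uuid.uuid4().hex)
    shutil.copyfile(path, tmp)
    try:
        return scan(tmp)
    finally:
        os.remove(tmp)  # always clean up, even if the scanner raises
```

The original Maildir file is never renamed here, so a crash in the scanner leaves the stored message untouched.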

It’s working quite well and is already integrated into the email system.

December 26th, 2005

ClamAV for Erlang

A few days ago I posted that I was planning on writing a C-port driver for ClamAV so that I could use it in Erlang. Then this morning, as I was checking my comments on different sites, I noticed that someone named Oliver gave me a link to his C-port that he had already written.

You can find it at http://labs.biniou.info/erlang/erlClamAV/

I downloaded it and made a few changes to make it work better on my system and it seems to work just fine. I then emailed Oliver with the changes that I made so that he can look into incorporating them into his next release :-)

Now I need to integrate ClamAV into my email server, which should be way more fun than writing the front end :-)

This is why I have been blogging about what I am doing. I know that people end up finding it and this has been the single greatest help in pushing my project along from an outside source :-)

December 24th, 2005

State of Spam Free Email

Been thinking about where I am in the process of creating this service and I thought I’d put it down for posterity :-)

I now have 8 Erlang nodes working together to create the core of the system. In reality, 6 of them are the core and the other two are additional processing power. I had hoped to get 6 nodes before the first of the year and I had all 8 of them up and running on the 19th of December. All of the nodes are using Erlang’s distributed processing to communicate with each other.

Three of the nodes are exposed to the Internet and accepting SMTP email; all of the other nodes are helpers that process the mail and tag it for spam traits.

Last week I got the Bayesian filters working and they are working amazingly well. I was catching about 80% of the email messages as spam, now I’m up to about 93%. I’m still using two of my older domains where the real spam percentage should be 99%. The filters are still learning, and I expect them to get more accurate as the days and weeks go on.
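The post does not describe how the filter turns per-word evidence into a verdict. For readers unfamiliar with the idea, here is a minimal sketch of one classic naive Bayes combining rule, which is not necessarily what this service uses; the clamp bounds are assumed values.

```python
import math


def combine(word_probs, low=0.01, high=0.99):
    """Combine per-word spam probabilities into one message score.

    Each probability is clamped to [low, high] so that a single word cannot
    force the score to exactly 0 or 1 (and to avoid dividing by zero).
    """
    clamped = [min(high, max(low, p)) for p in word_probs]
    spam = math.prod(clamped)
    ham = math.prod(1.0 - p for p in clamped)
    return spam / (spam + ham)


# Two strongly spammy words outweigh one mildly innocent word.
print(combine([0.99, 0.99, 0.2]))
```

A message would then be quarantined when the score crosses some threshold, and the per-word probabilities improve as more mail is classified, which is why accuracy tends to climb over the first days and weeks.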

The SMTP client portion of the software is also complete, at least it is ready for BETA testing. I still need to implement retries, but I have them figured out.

Speaking of that, I consider the software to be in a pre-BETA state right now. I’m hoping to start BETA testing at the first of the year, when I intend to put my personal email onto the system.

Before I get that far I need a rudimentary user interface, which is what I am working on right now. As soon as I can reclassify email messages that are bad to good and vice versa and I can forward email that has been quarantined I will move to BETA testing.

I still have some rather large components to write to make the SMTP server completely RFC compliant, but everything that has been written so far is compatible with the RFCs.

That is pretty much it. Hopefully my BETA will go off without any hitches and I’ll possibly open the BETA testing up to more domains than just the ones that I own.

December 24th, 2005

SpamFreeEmail.org

I just got the domain name spamfreeemail.org, literally I just got the final email within the past 5 minutes.

While spamfreeemail.com is the main website for the service I am creating and spamfreeemail.net is the domain that the actual service uses, I intend to use spamfreeemail.org to offer the same services at greatly reduced rates or for free to deserving non-profit organizations.

Of course I haven’t even finished the product so talk about giving it away to good people doing good works is a bit premature, but it is the season :-)

About a month or so after I release the main product I will set up a page to let non-profit organizations sign up, and I’ll evaluate them to see how much of a discount they will get, including some that will be completely free.

December 10th, 2005

Erlang Distributed processing

So last night I reread the documentation on distributed processing for Erlang. The disturbing part is that it started to make sense.

So I thought about it for a while and I decided to give myself the weekend to rework the application to use distributed processing. I just finished it about an hour ago … a whole day early.

In practice it didn’t work exactly how I was thinking it would, but the documentation was right, I was just reading it wrong. Or rather, their assumptions about what I knew did not match up to my assumptions about what they were saying …

In fact it was much easier than I was trying to make it out to be. I decided not to distribute one of my processes, as it seemed safer not to. I’m not worried about much, but I do worry about losing mail messages entirely :-)

So right now my test system has one server and two helpers. I’ve managed to get a total of eleven Erlang nodes to talk to each other at once with my existing hardware.

So now tomorrow, I’m starting the anti-spam engine :-)