Spam Free Email

Anti-spam ideas, tools and services

January 16th, 2006

98% spam …

In the statistics that I watch while developing this system, I have been amazed to see the percentage of spam messages rise to 98.65%, with viruses taking up another 0.34%. That means roughly 99% of all messages currently going through this system are bad.

While this is not really a surprise to me, it certainly confirms the idea in the back of my mind that spam is totally out of control.

On the plus side, I’m seeing fewer and fewer false positives (good emails getting marked as spam) in the system, which is a function of training the filters. I think I have had only one false negative, a spam message that got through the system marked as good, in the past week. That message was a work of art, but eventually even those will be caught by the system.

The optimizations that I did a week or so ago seem to be holding up well, but the more messages there are in the system, the slower it runs. So I know I need a few more rounds of optimization, or I might need to give up the real-time aspect of the filters in favor of a periodically updated version.

Right now, as each message is put into the system, it is automatically incorporated into the other filters. I may instead need to update the filters with new messages once an hour, once a day or once a week. I’m not sure which will end up being best.
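To make the trade-off concrete, here is a rough sketch of what a periodically updated version could look like. It is not the code running in this system; the `BatchedTrainer` name, the `train_fn` callback and the `clock` parameter are all made up for illustration. New messages are queued, and the filters only get retrained once the chosen interval has passed.

```python
import time


class BatchedTrainer:
    """Hypothetical helper: queue messages and train the filters in batches."""

    def __init__(self, train_fn, interval_seconds, clock=time.monotonic):
        self.train_fn = train_fn          # assumed callback: train_fn(message, is_spam)
        self.interval = interval_seconds  # 0 means train on every message (real time)
        self.clock = clock                # injectable so the timing can be tested
        self.pending = []
        self.last_flush = clock()

    def add(self, message, is_spam):
        """Queue a message; flush the queue if the interval has elapsed."""
        self.pending.append((message, is_spam))
        if self.interval == 0 or self.clock() - self.last_flush >= self.interval:
            self.flush()

    def flush(self):
        """Feed every queued message to the filters and reset the timer."""
        batch, self.pending = self.pending, []
        for message, is_spam in batch:
            self.train_fn(message, is_spam)
        self.last_flush = self.clock()
        return len(batch)
```

With this shape, moving between once an hour, once a day and once a week is just a change to `interval_seconds` (3600, 86400 or 604800), which would make it easy to compare the options.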

January 16th, 2006

Bayesian filtering on country of origin?

I was working on getting all of the messages categorized by the country their IP addresses originate from, and the thought occurred to me that a Bayesian filter based on the country of origin might provide some useful information.

As with all Bayesian filters, it would start by giving a message an equal probability of being good or bad, and then take past email messages from the same country of origin into account to see whether there is a high probability that the message is spam.
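A minimal sketch of that idea, under my own assumptions rather than as the filter actually in the system: the `CountryFilter` class, the `strength` parameter and the placeholder country codes are all hypothetical. A country with no history stays at the 0.5 prior, and the score moves toward the observed spam rate as more messages from that country are seen.

```python
from collections import Counter


class CountryFilter:
    """Hypothetical single-feature Bayesian filter keyed on country of origin."""

    def __init__(self, prior=0.5, strength=1.0):
        self.prior = prior        # equal chance of good or bad before any evidence
        self.strength = strength  # how many "virtual" messages the prior counts for
        self.spam = Counter()     # spam messages seen per country
        self.good = Counter()     # good messages seen per country

    def train(self, country, is_spam):
        """Record one past message from the given country."""
        (self.spam if is_spam else self.good)[country] += 1

    def spam_probability(self, country):
        """Estimate the probability that a message from this country is spam."""
        seen = self.spam[country] + self.good[country]
        if seen == 0:
            return self.prior  # no history for this country: stay at the prior
        n_spam = sum(self.spam.values())
        n_good = sum(self.good.values())
        # Share of all spam (and of all good mail) that came from this country.
        spam_rate = self.spam[country] / n_spam if n_spam else 0.0
        good_rate = self.good[country] / n_good if n_good else 0.0
        raw = spam_rate / (spam_rate + good_rate)
        # Blend the prior with the observed rate, weighted by how much was seen.
        return (self.strength * self.prior + seen * raw) / (self.strength + seen)


f = CountryFilter()
for _ in range(9):
    f.train("AA", True)   # "AA" and "BB" are placeholder country codes
f.train("AA", False)
for _ in range(5):
    f.train("BB", False)
print(f.spam_probability("AA"), f.spam_probability("BB"), f.spam_probability("ZZ"))
```

Keeping one `CountryFilter` per user would give each user a filter trained only on their own corpus.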

I think I’m going to set up an experimental/informational filter that will run these statistics and see how well it works against the email coming into the system. I suspect it would be best to use only each user’s own messages with this filter; the more email in their personal corpus, the better the filter would get.

Of course this is where white lists and users training their own filters make a bigger difference, but they were doing that for the other filters anyway ….
