One piece of information I have available for every message is the country that the sending IP address is registered to. I have been trying to figure out exactly how to incorporate this into a filter while still giving the benefit of the doubt to good messages that come from that country.

The idea that I just implemented was to add the name of the country to the informational line in the header that I tag on each message. Tagging messages with such a line is part of the SMTP standard, although adding the name of the country is something I have never heard of anyone else doing.
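As a rough illustration of the tagging step, here is a minimal Python sketch. The header name `X-Origin-Country`, the `lookup_country` helper, and the lookup table are all placeholders I made up for the example; the actual header line and IP lookup in my setup may look different.

```python
from email import message_from_string

# Hypothetical header name, chosen only for this example.
HEADER_NAME = "X-Origin-Country"


def lookup_country(ip, table):
    """Stand-in for a real IP-to-country lookup.

    `table` is an assumed dict mapping IP strings to country names.
    """
    return table.get(ip, "Unknown")


def tag_with_country(raw_message, ip, table):
    """Return the message text with a country header line added."""
    msg = message_from_string(raw_message)
    # Assigning to a header key appends a new header line to the message.
    msg[HEADER_NAME] = lookup_country(ip, table)
    return msg.as_string()


raw = "From: a@example.com\nSubject: hi\n\nbody\n"
tagged = tag_with_country(raw, "203.0.113.5", {"203.0.113.5": "Exampleland"})
```

Because the country name ends up as ordinary text in the header, any filter that tokenizes headers will pick it up as just another word.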

This does two things. First, it lets anyone reading the header easily see which country the message came from, based on my IP lookup. More importantly, it gives the Bayesian filter more words to work with.

The basic idea came to me while I was looking at some messages in an inbox that is 100% spam. I was watching the score of each individual word as it ran through the Bayesian filter. One thing that stood out was that, since every message was spam, certain words in the headers were treated as spam markers. These words would normally have been considered neutral, but because every instance appeared in a spam message, they were scored as spam.

This gave me the idea that if the country of origin were explicitly stated in the header, then the countries that only send you spam would provide an additional spam marker. Countries that send both good and bad mail will end up with an extra neutral marker, which tends not to adjust the combined statistics very much. And I'd be hard pressed to think of a country that only sends good mail, so we'll leave that case alone.
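To make the arithmetic concrete, here is a small Python sketch of how a country word would score. It assumes a Graham-style filter (per-word probability from spam and good counts, combined with the naive Bayes product rule), which may not match the exact formula any particular filter uses. The counts and the scores for the other words are made up.

```python
def token_spamminess(spam_hits, ham_hits, n_spam, n_ham):
    """Estimate P(spam | word) from counts, clamped away from 0 and 1."""
    spam_rate = spam_hits / n_spam
    ham_rate = ham_hits / n_ham
    if spam_rate + ham_rate == 0:
        return 0.5  # never seen before: treat as neutral
    p = spam_rate / (spam_rate + ham_rate)
    return min(0.99, max(0.01, p))


def combine(probs):
    """Combine per-word probabilities with the naive Bayes product rule."""
    spam = 1.0
    ham = 1.0
    for p in probs:
        spam *= p
        ham *= 1.0 - p
    return spam / (spam + ham)


# Made-up training set: 100 spam messages and 100 good messages.
spam_only_country = token_spamminess(40, 0, 100, 100)  # only ever seen in spam
mixed_country = token_spamminess(30, 30, 100, 100)     # seen equally in both

# Made-up scores for the other words in a message.
other_words = [0.6, 0.7]
base = combine(other_words)
with_mixed = combine(other_words + [mixed_country])
with_spam_only = combine(other_words + [spam_only_country])
```

In this sketch the mixed country scores 0.5 and leaves the combined result where it was, while the spam-only country is clamped to 0.99 and pushes the combined score well above the baseline.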

This, of course, will likely not be the only deciding factor in whether a message is spam, but every piece of information helps.