One of the greatest features of modern spam filters is their ability to look at the content of a message and decide whether it is spam based on previous messages that have been categorized as spam.

The biggest downfall of this method is that you need a “statistically relevant” number of messages classified as either good or bad before these filters work well. That means you either need input from the user or you have to start with an overgeneralized set.

While reading about writing these filters, I started to think about how to automate this while still keeping the user in the equation.

The idea that I’m toying with right now is to use the user’s white list to determine good messages, and a combination of RBLs and the user’s black list to determine bad messages. Once I have a statistically relevant number of messages, the filter itself will start to work in conjunction with everything else.
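To make that a little more concrete, here is a rough sketch of how the automatic labeling might look. Everything in it is an assumption for illustration: the names `auto_label` and `filter_ready`, the choice to let the white list win over the RBLs, and the threshold of 100 messages per class are all placeholders, and the RBL lookup is passed in as a function rather than tied to any particular service.

```python
from typing import Callable, Dict, Optional, Set

# Assumed threshold; the right "statistically relevant" number is an open question.
MIN_PER_CLASS = 100


def auto_label(sender: str, sender_ip: str,
               whitelist: Set[str], blacklist: Set[str],
               on_rbl: Callable[[str], bool]) -> Optional[str]:
    """Return 'good', 'bad', or None when none of the lists give an answer."""
    sender = sender.lower()
    # Assumption: the user's white list takes precedence over everything else.
    if sender in whitelist:
        return "good"
    if sender in blacklist or on_rbl(sender_ip):
        return "bad"
    return None  # unlabeled: not used for training


def filter_ready(counts: Dict[str, int], minimum: int = MIN_PER_CLASS) -> bool:
    """True once both classes have at least `minimum` labeled messages."""
    return counts.get("good", 0) >= minimum and counts.get("bad", 0) >= minimum
```

Messages that come back as `None` would simply be left out of training until the content filter is ready to score them itself.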

This will train directly on the verbiage used in the user’s own messages while only requiring them to create a white list. In the case of false positives and false negatives, users will be able to look through messages from a certain time period and reclassify them as good or bad. This will in effect retrain the filter and help prevent the false positive or false negative from recurring.
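As a sketch of what that retraining could look like, the snippet below keeps per-user word counts and moves a message's words from one class to the other when the user reclassifies it. The naive Bayes style scoring, the `UserFilter` class, and its method names are my own assumptions for illustration, not a finished design.

```python
import math
import re
from collections import Counter


class UserFilter:
    """Per-user word counts for a naive Bayes style score (illustrative only)."""

    def __init__(self):
        self.words = {"good": Counter(), "bad": Counter()}
        self.messages = {"good": 0, "bad": 0}

    @staticmethod
    def tokenize(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text, label):
        self.words[label].update(self.tokenize(text))
        self.messages[label] += 1

    def untrain(self, text, label):
        self.words[label].subtract(self.tokenize(text))
        self.messages[label] -= 1

    def reclassify(self, text, old_label, new_label):
        """Undo the old label's training, then train under the new label."""
        self.untrain(text, old_label)
        self.train(text, new_label)

    def spam_score(self, text):
        """Log-odds that the text is bad; positive means more spam-like."""
        score = math.log((self.messages["bad"] + 1) / (self.messages["good"] + 1))
        for word in self.tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            p_bad = (self.words["bad"][word] + 1) / (self.messages["bad"] + 2)
            p_good = (self.words["good"][word] + 1) / (self.messages["good"] + 2)
            score += math.log(p_bad / p_good)
        return score
```

Calling `reclassify(text, "bad", "good")` on a false positive lowers the score of that message and of similar ones, which is the retraining effect described above.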

Building each user’s filter from their own email messages “should” make their spam filter more effective in the long run, with very little effort in the short run.