I’ve been looking at spam messages for years. I’ve been reading articles and essays about spam for years. I’ve been getting spam for years, and the only thing that I know for sure is that no one has found the perfect way to prevent spam from getting into my in-box.
My latest thought on the problem is that we are looking too hard at the overall problem and at the individual spam message. We are seeing the two ends of the spectrum, but we do not have the information to fill in the gaps in between.
This is part of the reason for my current approach to dealing with spam. We need more statistical data about why spam messages are spam.
I don’t just want to know that a message triggered an RBL; I want to know every filter it triggered. I don’t just want to know that a word somewhere in the message was on a block list; I want to know what the word was, whether it was in the subject line, one of the headers, or the body, and how many times it was used.
And if a message is on a white list, I still want to know what would have happened if it had been processed…
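To make the idea concrete, here is a minimal sketch in Python of the kind of record described above: every block-list hit is logged with the word, where it appeared (subject, another header, or body), and how often, and the analysis runs even when the sender is on a white list. The names `BLOCKED_WORDS`, `WHITELISTED_SENDERS`, `Report`, and `analyze` are hypothetical, the word lists are placeholders, and only single-part plain-text messages are handled. It is an illustration of the data being asked for, not a real filter.

```python
from collections import Counter
from dataclasses import dataclass, field
from email import message_from_string
from email.utils import parseaddr
import re

# Hypothetical lists, for illustration only.
BLOCKED_WORDS = {"viagra", "lottery"}
WHITELISTED_SENDERS = {"friend@example.com"}


@dataclass
class Report:
    whitelisted: bool
    # Maps (word, location) -> number of occurrences.
    hits: Counter = field(default_factory=Counter)


def count_words(text, location, hits):
    """Record every blocked word in `text` under the given location."""
    for word in re.findall(r"[a-z0-9']+", text.lower()):
        if word in BLOCKED_WORDS:
            hits[(word, location)] += 1


def analyze(raw):
    """Return a full report, even for whitelisted senders."""
    msg = message_from_string(raw)
    sender = parseaddr(msg.get("From", ""))[1].lower()
    report = Report(whitelisted=sender in WHITELISTED_SENDERS)

    # Headers: the subject gets its own label, others are tagged by name.
    for name, value in msg.items():
        key = name.lower()
        location = "subject" if key == "subject" else "header:" + key
        count_words(str(value), location, report.hits)

    # Body: this sketch skips multipart messages entirely.
    if not msg.is_multipart():
        count_words(msg.get_payload(), "body", report.hits)

    return report
```

Calling `analyze(raw_message)` returns a `Report` whose `hits` counter can be written to a log or database. The white list only sets a flag here; it never short-circuits the analysis, so the "what would have happened" question stays answerable.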
Without this level of information, I do not think we will be able to conceive of the next generation of filters. Once there is a pattern in the chaos, it is simple to filter it out. But first we need a complete picture of the chaos and the tools to find the pattern.
[tag]spam, spam filters[/tag]