In the statistics that I watch while developing this system, I have been amazed to see that the percentage of spam messages has risen to 98.65%, and viruses take up another 0.34%. That means about 99% of all messages currently going through this system are bad.
While this is not a surprise to me, it certainly confirms the idea in the back of my mind that spam is totally out of control.
On the plus side, I’m seeing fewer and fewer false positives (good emails getting marked as spam) in the system, which is a function of training the filters. I think I have had only one false negative (a spam message that got through the system marked as good) in the past week. That message was a work of art, but eventually even those will be caught by the system.
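As a rough illustration of the two kinds of error described above, here is a minimal Python sketch that computes a false positive rate and a false negative rate from a list of outcomes. The function name `error_rates` and the sample data are hypothetical and made up for the example; they are not taken from this system.

```python
def error_rates(results):
    """Return (false_positive_rate, false_negative_rate).

    `results` is a sequence of (actually_spam, marked_spam) boolean pairs.
    This layout is an assumption made for the sketch.
    """
    results = list(results)
    good_total = sum(1 for actual, _ in results if not actual)
    spam_total = sum(1 for actual, _ in results if actual)
    # False positive: a good email that was marked as spam.
    false_pos = sum(1 for actual, marked in results if not actual and marked)
    # False negative: a spam message that was marked as good.
    false_neg = sum(1 for actual, marked in results if actual and not marked)
    fp_rate = false_pos / good_total if good_total else 0.0
    fn_rate = false_neg / spam_total if spam_total else 0.0
    return fp_rate, fn_rate


# Made-up sample: 4 good emails (1 wrongly marked), 10 spam (1 missed).
sample = (
    [(False, False)] * 3 + [(False, True)]
    + [(True, True)] * 9 + [(True, False)]
)
fp_rate, fn_rate = error_rates(sample)
```

Tracking both rates separately matters because, with spam making up almost all traffic, a single combined accuracy figure would hide how many good emails are being lost.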
The optimizations that I did a week or so ago seem to be holding up well, but the more messages there are in the system, the slower it runs. So I know I need a few more rounds of optimization, or I might need to give up the real-time aspect of the filters for a periodically updated version.
Right now, as each message is put into the system, it is automatically incorporated into the other filters. Instead, I may need to update the filters once an hour, once a day, or once a week with new messages. I'm not sure which will end up being best.
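The trade-off between real-time and periodic updates can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code this system runs: the class `BatchedFilter`, its simple word-count "training", and the `now` timestamp argument are all hypothetical. New messages are queued and folded into the counts only when the chosen interval has elapsed.

```python
from collections import Counter


class BatchedFilter:
    """Toy word-count filter that learns from new messages in batches."""

    def __init__(self, update_interval_seconds):
        self.update_interval = update_interval_seconds
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.pending = []        # messages waiting for the next update
        self.last_update = 0.0   # timestamp of the last update, in seconds

    def add_message(self, text, is_spam, now):
        """Queue a message; update the counts only once the interval has passed."""
        self.pending.append((text, is_spam))
        if now - self.last_update >= self.update_interval:
            self.flush(now)

    def flush(self, now):
        """Fold every queued message into the word counts in one pass."""
        for text, is_spam in self.pending:
            counts = self.spam_counts if is_spam else self.ham_counts
            counts.update(text.lower().split())
        self.pending.clear()
        self.last_update = now


# Hourly updates: the first message waits, the second triggers the update.
hourly = BatchedFilter(update_interval_seconds=3600)
hourly.add_message("cheap pills now", is_spam=True, now=10)
hourly.add_message("meeting at noon", is_spam=False, now=3600)
```

With an interval of 0 the same class behaves like the current real-time setup, since every message is incorporated immediately. Larger intervals trade freshness for less work per incoming message, which is the choice between hourly, daily, and weekly updates.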