I’ve been distracting myself from the things I really should be doing today by thinking of other ways to optimize the profiler, which is the piece that processes each mail message and determines if it is spam or not.

I found a few ways to avoid redundant disk reads and changed some SQL statements to reduce the number of ODBC connections. I’m still not overly happy with the performance of the system as a whole, but I think it is working great for a beta-stage product.

My numbers show that it takes 65 seconds to process the average mail message, and I think I have maxed out at about 60 seconds per message on the current system with its current design.

Of course, I am not yet taking full advantage of distributed processing and concurrency in the profiler. One of my next steps is to break the profiler into distinct sections that will run concurrently on randomly chosen nodes. Then, once the entire profile is created, the message will be evaluated and processed as needed.

This, of course, requires slightly different logic, but the code stays remarkably similar.
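A rough sketch of what that fan-out could look like, using the standard `rpc:async_call/4` and `rpc:yield/1` calls. The module name `profiler_sketch`, the functions `profile/2`, `run_section/2`, and `evaluate/1`, and the section names passed in are all hypothetical placeholders, not the real profiler code. It also assumes the module is loaded on every node in the cluster.

```erlang
-module(profiler_sketch).
-export([profile/2, run_section/2, evaluate/1]).

%% Start every section on a randomly chosen node without waiting,
%% then collect the results and evaluate the assembled profile.
profile(Message, Sections) ->
    Nodes = [node() | nodes()],
    Keys = [{S, rpc:async_call(pick(Nodes), ?MODULE, run_section, [S, Message])}
            || S <- Sections],
    %% rpc:yield/1 blocks until that section's result is available.
    Profile = [{S, rpc:yield(Key)} || {S, Key} <- Keys],
    evaluate(Profile).

%% Pick one node at random from a non-empty list.
pick(Nodes) ->
    lists:nth(rand:uniform(length(Nodes)), Nodes).

%% Placeholder: a real section would inspect Message and return its findings.
run_section(Section, _Message) ->
    {ok, Section}.

%% Placeholder: a real evaluate step would decide spam or not from the profile.
evaluate(Profile) ->
    {evaluated, Profile}.
```

Because all sections are started before any result is collected, the wall-clock time for a message should be closer to the slowest section plus messaging overhead than to the sum of all sections.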

I know I will need to create a more advanced evaluate command, but other than that I think the only other thing I will need is to distribute the individual processes to different computers.

I currently have 5 servers with a total of 25 nodes; 22 of those nodes are dedicated to profiling. The profiler currently has about 6 distinct sections, so having 6 different nodes processing concurrently should mean that messages will complete faster than if I run all 6 sections in order.

Plus, I am using ClamAV, which I have working on my Linux servers but not on my Windows computer. Right now I am unable to join my workstation to the cluster to process mail, since it cannot do the antivirus step, but if I write the antivirus process so that it only runs on the Linux servers, I will be able to add my workstation back into the cluster :-)
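One possible way to keep the antivirus step on the Linux machines is to ask each node for its OS with the standard `os:type/0` call and filter on the answer. This is only a sketch: the module name `av_nodes`, the functions `linux_nodes/0` and `nodes_for/1`, and the `antivirus` section atom are hypothetical names made up for illustration.

```erlang
-module(av_nodes).
-export([linux_nodes/0, nodes_for/1]).

%% All nodes in the cluster, including this one.
all_nodes() ->
    [node() | nodes()].

%% Keep only the nodes whose os:type() reports Linux.
%% A node that fails to answer returns {badrpc, _} and is filtered out.
linux_nodes() ->
    [N || N <- all_nodes(),
          rpc:call(N, os, type, []) =:= {unix, linux}].

%% The antivirus section may only run on Linux nodes;
%% every other section may run anywhere.
nodes_for(antivirus) -> linux_nodes();
nodes_for(_Section)  -> all_nodes().
```

With something like this in place, a Windows workstation could join the cluster and still be skipped whenever a node is chosen for the antivirus section.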

It really has taken me longer than I thought it would to see the ease and benefits of distributed and concurrent processing in Erlang, but at this point I don’t see any other language I’d want to use to create a service like this one.

And I have a few more projects that I’ve thought up already :-)