I figured out why adding more nodes slowed down the system, and found a workaround that keeps performance stable enough that I can add nodes back in on a regular basis.

Turns out that the system as a whole could not cope when lots of nodes attempted to work on the same message at the same time. I had built in a throttling system which allowed only one node to work on one section of each message at a time, but the nodes were so aggressive about trying to process messages that they were stepping on each other's toes.
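To make the "one node per section" throttle concrete, here is a rough sketch of the idea. It is in Python rather than the Erlang the system actually runs on, it keeps the claims in a plain in-memory dictionary instead of Mnesia, and the names (SectionClaims, try_claim, release) are made up for illustration, so treat it as a picture of the rule and not the real code.

```python
import threading


class SectionClaims:
    """Tracks which node currently holds each (message, section) pair.

    Hypothetical illustration: the real system would keep this state
    somewhere shared between nodes, not in one process's memory.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._owners = {}  # (message_id, section) -> node name

    def try_claim(self, message_id, section, node):
        """Return True if `node` now holds the section, False if it is already claimed."""
        key = (message_id, section)
        with self._lock:
            if key in self._owners:
                return False
            self._owners[key] = node
            return True

    def release(self, message_id, section, node):
        """Give the section back, but only if `node` is the current holder."""
        key = (message_id, section)
        with self._lock:
            if self._owners.get(key) == node:
                del self._owners[key]
```

The throttle itself is cheap. The trouble starts when many nodes call something like try_claim over and over for the same message: almost every call fails, but each one still costs a round trip.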

I added another limiting factor: the more nodes are present, the slightly longer each node waits before it tries to pull in more data. It has been working very well today.
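The back-off described above could look something like this. The linear formula and the numbers (half a second base, a tenth of a second per extra node) are assumptions picked for the example, not the values the system uses, and poll_delay is a hypothetical name.

```python
def poll_delay(node_count, base_delay=0.5, per_node_delay=0.1):
    """Seconds a node waits before pulling more data.

    Assumed formula: a fixed base delay plus a small extra delay for
    every additional node in the cluster, so the combined pull rate
    grows more slowly as nodes are added.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return base_delay + per_node_delay * (node_count - 1)
```

Each node would sleep for poll_delay(current_node_count) between pulls. With these example numbers a lone node waits 0.5 seconds and each node in a 6 node cluster waits about 1 second.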

So the problem wasn’t Mnesia or MySQL, but in fact my overuse of them :-)

Right now the biggest bottleneck in the whole system is MySQL, and that is saying a lot, as most messages are processed in under 5 seconds. The problem occurs when a burst of 30 to 50 messages comes in and they all try to process at the same time.

The solution to this is to create a master database server and have several replicated slave servers. The larger problem is that my current funding will not allow for this right now. As soon as I can afford two really beefy servers I’ll start breaking them out into a master and slave configuration.

I’m almost tempted to turn one of my servers that is running Erlang into a MySQL slave, but I don’t think any of them are powerful enough to create a performance increase.

Another option I have thought of is to add a MySQL slave to each server that runs 6 nodes. Then the data would be local, and each MySQL server could never have more than 6 connections.
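The connection arithmetic behind that idea, as a quick sketch. It assumes each node holds exactly one MySQL connection, and the server count of 4 in the usage note is a made-up figure for the example; the function name is hypothetical too.

```python
def max_connections_per_mysql_server(erlang_servers, nodes_per_server=6,
                                     local_slaves=False):
    """Worst-case connections a single MySQL server has to handle.

    Assumes one connection per node. With a local slave on every
    Erlang server, only that server's own nodes connect to it.
    """
    if local_slaves:
        return nodes_per_server
    # Every node on every Erlang server hits the same central MySQL server.
    return erlang_servers * nodes_per_server
```

With 4 Erlang servers, for example, a single central MySQL server would see up to 24 connections, while a local slave on each server would see at most 6 no matter how many servers are added.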

While this would improve the performance of MySQL, I worry about the performance of Erlang with the current configuration, so either way I need more and better hardware.