I’ve spent about 30% of my time learning how Erlang programs are supposed to be designed using OTP, and the only thing I can take as 100% fact is that the documentation is not good enough.
I did find some great how-to documents at trapexit.org. While they still make some assumptions about how much the reader knows about Erlang and OTP, their how-tos are much easier to dissect and have much more documentation in the code. It’s almost inspired me enough to completely document my own code … almost …
In the end, once you’ve gotten OTP working in your Erlang programs, it makes things much simpler. You do end up writing a lot of the same code over and over again, but the error recovery and supervision are well worth it in code that needs the maximum amount of uptime.
While working on the SMTP server, I noticed that the code was crashing 4 or 5 times a day. Without changing the existing logic, I added OTP structure and supervisors, and the code started to run without any noticeable downtime. I have since fixed the problems that I think were causing the crashes, but in the meantime the OTP design handled the restarts effortlessly and let me get past the little things.
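To make the restart behaviour concrete, here is a minimal sketch of a supervisor that keeps one worker alive. It is not the actual SMTP server code: the module name `smtp_sup`, the registered name `smtp_worker`, and the `crash` message are all hypothetical and exist only for illustration. It also assumes OTP 18 or later for the map-style supervisor flags and child spec.

```erlang
-module(smtp_sup).
-behaviour(supervisor).

%% Hypothetical sketch: one supervisor, one worker that can be made to crash.
-export([start_link/0, init/1]).
-export([start_worker/0, worker_init/1]).

%% Start the supervisor and register it locally under the module name.
start_link() ->
    supervisor:start_link({local, ?MODULE}, ?MODULE, []).

%% Supervisor callback: restart a crashed child on its own (one_for_one),
%% giving up only after more than 5 restarts within 10 seconds.
init([]) ->
    SupFlags = #{strategy => one_for_one, intensity => 5, period => 10},
    Child = #{id => smtp_worker,
              start => {?MODULE, start_worker, []},
              restart => permanent,
              shutdown => 5000,
              type => worker,
              modules => [?MODULE]},
    {ok, {SupFlags, [Child]}}.

%% Called by the supervisor; must return {ok, Pid} of a linked process.
start_worker() ->
    proc_lib:start_link(?MODULE, worker_init, [self()]).

%% Runs in the new worker process: register a name, then tell the
%% supervisor that startup succeeded.
worker_init(Parent) ->
    register(smtp_worker, self()),
    proc_lib:init_ack(Parent, {ok, self()}),
    worker_loop().

%% Stand-in for real work: a 'crash' message simulates a failure,
%% anything else is ignored.
worker_loop() ->
    receive
        crash  -> exit(simulated_crash);
        _Other -> worker_loop()
    end.
```

To try it in the Erlang shell, call `smtp_sup:start_link()`, note the pid returned by `whereis(smtp_worker)`, send `smtp_worker ! crash`, and call `whereis(smtp_worker)` again. The supervisor should have started a fresh worker with a different pid. A real worker would more likely be a `gen_server`; the plain `proc_lib` process is used here only so the sketch fits in a single module.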
Right now I’m in the process of understanding how a full supervision tree with multiple worker processes is put together. Once I have that, I’ll have a rock-solid SMTP server with nearly 100% uptime.