Spam Free Email

Anti-spam ideas, tools and services

December 23rd, 2007

Current Project

I know I haven’t posted on this site for a long time. I have been busy with other email-related activities, and I wasn’t sure whether this site was pointing people to where I am currently working on these projects.

I’ve been working on creating an open source email server in Erlang and I’ve been posting my activity over at http://erlsoft.org

Most of the project has been taking a lot of what I learned from SFE and making it available to anyone who wants to use the code. The project is also up on Google Code at http://erlmail.googlecode.com

The SMTP server is pretty stable at this point and I’m spending most of my time on the IMAP modules, mostly on the server. I’ll put together a POP server once I have the IMAP stuff working.

I recently created an anti-spam module that is mostly a set of hooks into the SMTP server. The hooks allow functions to be run on each email message before and after it has been saved to the message store. The hooks I have in place would allow me to re-create SFE within ErlMail without having to recompile the ErlMail application. The only thing I would need to do is list the appropriate modules and functions in a config file, and the next time the server restarts those checks would be active.
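
As a rough illustration of the hook idea (not ErlMail’s actual code, and in Python rather than Erlang purely for brevity), the sketch below resolves "module:function" strings from a config into callables at startup and runs them at named hook points. The config layout, the hook point names `pre_store` and `post_store`, and the helper names are all assumptions; the standard-library function `html.escape` only stands in for a real spam check.

```python
import importlib

# Hypothetical config: hook point -> list of "module:function" strings.
# A real server would read this from a config file at startup.
CONFIG = {
    "pre_store": ["html:escape"],  # stand-in for a real check function
    "post_store": [],
}


def load_hooks(config):
    """Resolve every "module:function" entry to a callable, once, at startup."""
    hooks = {}
    for point, entries in config.items():
        resolved = []
        for entry in entries:
            module_name, func_name = entry.split(":")
            module = importlib.import_module(module_name)
            resolved.append(getattr(module, func_name))
        hooks[point] = resolved
    return hooks


def run_hooks(hooks, point, message):
    """Pass the message through each function registered at a hook point."""
    for func in hooks.get(point, []):
        message = func(message)
    return message


hooks = load_hooks(CONFIG)
print(run_hooks(hooks, "pre_store", "<b>hi</b>"))  # &lt;b&gt;hi&lt;/b&gt;
```

Changing which checks run then means editing the config and restarting, with no change to the server code itself.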

This approach would allow me to create a complete open source email server and still keep the SFE components private if I choose to.

I am also designing a modular message store into the ErlMail server to allow for separate message stores for different purposes. This would allow me to keep one message store for the SFE servers and a separate message store for the email server that is accessed by the users, all on the same cluster of Erlang servers. This is a design specification that came directly from my work on the SFE project.
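
One way to picture a modular message store is a small interface with interchangeable backends, one instance per purpose. This is only a sketch in Python under assumed names (`MessageStore`, `MemoryStore`, `store_for`, and the purposes `sfe` and `users` are all hypothetical), not the ErlMail design.

```python
from abc import ABC, abstractmethod


class MessageStore(ABC):
    """Interface every store backend implements (hypothetical)."""

    @abstractmethod
    def save(self, message_id, raw):
        """Store the raw message under message_id."""

    @abstractmethod
    def fetch(self, message_id):
        """Return the raw message, or None if it is not stored here."""


class MemoryStore(MessageStore):
    """Toy backend that keeps messages in a dict."""

    def __init__(self):
        self._messages = {}

    def save(self, message_id, raw):
        self._messages[message_id] = raw

    def fetch(self, message_id):
        return self._messages.get(message_id)


# One independent store per purpose, looked up by name.
STORES = {"sfe": MemoryStore(), "users": MemoryStore()}


def store_for(purpose):
    return STORES[purpose]
```

A message saved with `store_for("sfe").save(...)` is not visible through `store_for("users")`, which is the separation described above.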

In the end, the ErlMail project would allow me to create the SFE project using the ErlMail server and most likely fewer than 1,000 lines of additional code, since all the real work is being done by ErlMail :-)

March 2nd, 2007

Taking SFE beta down for a while

I have had my own email running through the SpamFreeEmail.com system for well over a year now and it has been working great. I’ve watched it learn about my email and automatically start forwarding mail that was good, while I never once got an email message sent through the system that was bad.

I had a few false positives in that some email was marked as spam when it was not, but those were easily enough fixed in the system.

I consider the BETA a success and a full working prototype of what an Erlang based spam filter could be, but it’s not enough :-)

After spending a lot of time on the Netflix Prize, which I was working on in pure Erlang, I have learned so much about the language that I wanted to redesign how SFE was working. Unfortunately for me, I did not have enough hardware to run my beta test and redesign the system at the same time. At least not enough hardware for me to feel comfortable.

I also have been wanting to implement some of the modules I have been working on in my ErlMail project, which has now become part of the work I am doing at http://erlsoft.org, which involves rewriting most core Internet servers in a way that they fully integrate with each other.

Lastly, I want to redesign SFE to be written in pure Erlang. I had been using MySQL as a storage engine for most of the data and I truly believe the interaction between the two was slowing the system down. They were communicating perfectly, but the MySQL server was having trouble performing the intense actions I was trying to do.

So the SFE BETA will be down for a while. It will be back up in a largely rebuilt and restructured form and will be better for it :-)

June 2nd, 2006

A Thank you to the spammers

For the past few days I’ve been getting more and more e-mail messages that forge my own domain name to try to get me to read them. So this is just a little thank you to all the spammers who decided to forge my domain name, on which I am the only person with an e-mail address, thinking that some random string of characters will get me to read the e-mail message.

Now of course these have been getting caught in my quarantine, since the messages can’t get past the rest of the filters at Spam Free Email. A decent number of these spam messages have been caught and placed in my bad list, and none of them have managed to make it into my actual e-mail in-box.

The reason I’m thanking spammers for this barrage of messages forging my own domain name is that I needed some inspiration of late, and they have now provided it. After looking at the messages I have come to the conclusion that they would have very easily failed an SPF test. I had placed creating my own SPF filter on the back burner for a while; I have almost everything in place for it except the actual logic to do the IP address checking.

Now, thanks to a litany of messages that I know for a fact have not come from my own domain name, or from my e-mail servers for that matter, I have been annoyed into action. It may not be today or tomorrow, but this has definitely moved to the top of my priority list. So hopefully by the end of this week or next I will have my SPF filter in place.
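
For a sense of what the IP address checking involves, here is a minimal sketch in Python (not the SFE filter). It assumes the SPF record has already been fetched from DNS as a string, and it evaluates only the `ip4:`, `ip6:` and `all` terms. A complete evaluator following the SPF specification also has to handle `a`, `mx`, `include`, `redirect` and more. The function name `check_spf` is an assumption.

```python
import ipaddress

# SPF qualifiers and the result each one produces when its term matches.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, sender_ip):
    """Evaluate only the ip4:/ip6: and `all` terms of an SPF record."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is "pass"
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        lowered = term.lower()
        if lowered == "all":
            return RESULTS[qualifier]
        if lowered.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(term[4:], strict=False)
            # An IPv4 address is never "in" an IPv6 network and vice versa.
            if ip in network:
                return RESULTS[qualifier]
        # Every other mechanism is skipped in this sketch.
    return "neutral"  # nothing matched


record = "v=spf1 ip4:192.0.2.0/24 -all"
print(check_spf(record, "192.0.2.10"))    # pass
print(check_spf(record, "198.51.100.7"))  # fail
```

A forged message claiming a domain with a record like the one above, but arriving from an unlisted address, falls through to `-all` and fails.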

May 17th, 2006

Current Status

It’s been a while since I’ve done any blogging on this site. In fact, it’s been a while since I’ve done any work on Spam Free Email. Between client work and some hardware problems, it’s been difficult to actually get anything done lately.

I’ve finally come close to a decision: before I can really launch Spam Free Email as a production service I need to purchase a significant amount of hardware to support it. While the hardware that I have right now is doing the job, and doing it quite well, its reliability is not something I would trust in a product that I want to sell to someone else.

I’m currently running the system on a combination of a few Pentium IIIs and a couple of Pentium 4s, which are working just fine. The problem area seems to be my storage solution.

I’m using an old Snap Server 12000 that I picked up off of eBay, which I had hoped would be stable enough to at least start the production service. In its current configuration I have nearly half a terabyte of hard drive storage, which would be fine for the total amount of storage. The real problems I’m running into are speed and keeping the server online. I’ve been having problems with hard drives failing, which seems to be more a problem with the chassis than with the drives.

I really don’t think the problem is with the choice of hardware so much as that the actual device I got off of eBay was being sold on eBay for a reason.

I have a few more things I can try on it before I completely give up, including trying out a completely different Snap Server 12000 chassis that I have. Unfortunately I also purchased that one off of eBay and I’m not convinced that it will perform any better.

If and when I do finally decide that these Snap Servers will not perform well enough for a production service, I’m going to look into purchasing a brand new Snap Server. I’ve been using Snap Servers in my own systems as well as for clients for many years and I believe they’re one of the best solutions to storage needs on the market. With that said, my only complaint is that I believe the used ones I purchased are not performing as well as they could.

Of course, given the opportunity I would like to get new servers as well. I don’t see this as much of a stumbling block compared with the storage reliability requirements, since the servers are performing their job well while the storage system is failing on a very random basis. The only real reason to get new servers would be to reduce the amount of time that it takes to process each message, which is currently between three and five seconds. Even with new servers I’m not sure how much faster it would be.

April 13th, 2006

Re-scoring and Rescanning

One of the interesting things I have noticed while watching my latest corpus grow is how the scores change for messages that were originally scored as neutral or unknown.

I think it would be nice to recheck messages at some interval and get newer scores, adjusting only neutral messages to make them either good or bad based on a newer corpus than the one they were originally scored with.

During this process it occurred to me that the virus scan has some of the same issues, most notably that the anti-virus database changes at least once a day, and the last thing anyone wants is to have a virus get into their email box.

So along with re-scoring messages I am also rescanning them for viruses.

Either way it keeps the database a bit cleaner and helps reduce the user’s interaction with their neutral messages. Given enough newer messages in the corpus, any and all neutral messages will, in time, be classified as either good or bad.
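
The re-scoring pass can be sketched roughly as below, in Python. This is an illustration, not the SFE code: the message layout (a dict with `state`, `score` and `body`), the function name `rescore_neutral`, and the cut-off values 0.2 and 0.8 are all assumptions. The `score` argument stands for whatever function scores a message against the current corpus.

```python
def rescore_neutral(messages, score, good_max=0.2, bad_min=0.8):
    """Re-score only neutral messages; return how many left the neutral state.

    Messages already classified as good or bad are left untouched.
    """
    changed = 0
    for msg in messages:
        if msg["state"] != "neutral":
            continue
        msg["score"] = score(msg["body"])  # score against the newer corpus
        if msg["score"] >= bad_min:
            msg["state"] = "bad"
        elif msg["score"] <= good_max:
            msg["state"] = "good"
        if msg["state"] != "neutral":
            changed += 1
    return changed
```

Run on an interval, each pass can only shrink the set of neutral messages, since classified messages are never moved back.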

April 5th, 2006

ERLMail.com

Recently I registered the domain name http://www.erlmail.com with the intention of using it to create a free open source email server for Erlang.

Currently it is pointing to SpamFreeEmail.com, mostly because this will be the original code base for the project.

My intention is to create a complete SMTP compliant server with IMAP4 and POP3 support. I want it to be easily distributed and highly concurrent, hence the choice of Erlang.

I have never even participated in an open source project before, so running one will be quite a learning curve for me :-)

I have the core of the SMTP server and much of the queuing processes already designed for SFE. A decent amount of code will need to be removed from SFE as it pertains exclusively to how I have my system configured, although some of it may simply get put into configuration files.

I’m hoping that I can get the core of a server up and running within a month or so. I still need to continue developing SFE, but I think developing them concurrently will improve them both.

So if you are interested in helping, let me know. If you are interested in using the final project, let me know. Either way, the more interest there is in the project, the better and faster it will get done.

March 29th, 2006

Anti-Virus Stats

After doing the SPF stats at http://spf.spamfreeemail.com I kind of got into the mood of doing more stats pages. I have a few more planned, but I have the AntiVirus page done well enough to show people.

Of the messages I have received, viruses account for about 0.30%. Granted, we haven’t had a good outbreak of a hard-hitting virus in the past few weeks, but that number still seems pretty low to me.

You can find the AntiVirus stats page at http://antivirus.spamfreeemail.com

I think my next stats page will be a country-of-origin page for spam.

March 23rd, 2006

Training the Corpus

A few days ago I had yet another hard drive go out on me. This time I was much better prepared than the previous time; I actually had a backup from the previous night at 11:30. So no code was lost and the system was really only down for a few hours, mostly because I was deciding what direction to take the hardware and where to spend money to put in a permanent fix.

I still had the corpus (read: database of words for spam) but I no longer had any of the messages to look back at. In the end I decided to clear out my corpus and watch how well the system gets trained with my email and some of the training ideas that I have implemented in the code.

This is something that I haven’t talked much about, but it is integral to the way this system works. “I use as much data as I can to learn what is good and what is bad and then fine tune that knowledge into multiple levels of personalization.” As each message is broken up into its parts the system decides what level of personalization to use, and I’ve built a system that is capable of doing that in under three seconds per message.

By adding more information about each message (I am thinking about trying Yahoo! DomainKeys as a test), the system learns better and faster, and the fine-tuned personalization becomes faster and more accurate.

Some of you may be reading this and assuming that this is the way that Bayesian filters work, but in this case I have changed it enough that the personalization is now a much larger feature.

So now, on day three of training the new corpus, I got my first good message. Out of 1,650 messages, 304 have been marked bad and 1 is now marked good. Many of these messages are truly good or bad, but the exciting news is that the system is learning how to tell the difference and it is doing it well.

Each good message and each bad message redefines how the system defines the tokens in a message, but more importantly it also redefines how each user’s personalization works. So not only do you get the intelligence of the entire system, but you get personalization built in as well.

I’ve been purposefully vague and left out many details, as they are the secret sauce of the service and they change on a daily basis. The end result will be that newer types of spam messages will get marked faster and one person’s personal preference will not greatly affect how another person’s email is classified in the long run.

I’m excited about this and I can’t wait to see how fast the filter will start to see the patterns in the chaos that I can’t see myself :-)

Note: 3 more messages have been classified as good since I started writing this blog entry …

March 9th, 2006

SFE Status update

After I figured out what was causing the problems with multiple messages slowing each other down on the system, things have been working wonderfully. So well, in fact, that I don’t see the need to replace the current hardware in the near future. I’m hoping that adding more hardware will be necessary, but that will only be to handle additional load.

There are currently 5 permanent servers running SFE: three dual PIIs and two P4s, with lots of RAM between them. The database server that runs SFE and many of my other sites is a dual P4 Xeon with 4GB of RAM. With this current configuration my statistics say that I could process more than 450 times the spam messages that I currently do each day, and I suspect that number will hold true for a while.

Thanks to Erlang, it is easy to add more nodes to the system; it can be done in less than 15 minutes per server. In fact each server can easily handle 6 nodes, possibly more if I wanted. Each node is capable of handling more than 13,500 messages a day and that number is only going up for the next week or so.
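
Putting those figures together gives a rough lower bound on daily capacity. The short Python calculation below assumes all 5 servers run the full 6 nodes and that each node handles exactly 13,500 messages a day; the function name is hypothetical and the real numbers are described above only as "more than" these values.

```python
def daily_capacity(servers, nodes_per_server, messages_per_node):
    """Total messages per day if every node runs at the stated rate."""
    return servers * nodes_per_server * messages_per_node


print(daily_capacity(servers=5, nodes_per_server=6, messages_per_node=13_500))
# 405000
```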

I still have a few things that I need to work out before letting too many more people onto SFE. I am hoping to have the major issues resolved soon and then just work on the small or more cosmetic issues.

My current goal is to start letting people sign up for themselves to be part of the extended BETA testing (which will be free) before the first of April.

The extended BETA test will still have the system working out some bugs, and the bigger thing is that I will need more samples of good email messages. Once I see neutral messages accounting for less than 5% of total messages (it is currently just under 20%) for more than 1 week, I will feel confident that the system is good enough to charge money for :-)
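
That launch criterion is simple enough to express as a check over daily counts. The Python sketch below is only one reading of it: it assumes per-day `(neutral, total)` counts, oldest first, and treats "more than 1 week" as more than 7 consecutive days ending with the most recent day. The function name and data layout are hypothetical.

```python
def ready_to_charge(daily_counts, threshold=0.05, min_days=7):
    """True if the neutral share stayed under the threshold for more than
    min_days consecutive days, counting back from the most recent day."""
    streak = 0
    for neutral, total in daily_counts:
        if total and neutral / total < threshold:
            streak += 1
        else:
            streak = 0  # a bad (or empty) day resets the run
    return streak > min_days


print(ready_to_charge([(4, 100)] * 8))               # True
print(ready_to_charge([(4, 100)] * 8 + [(20, 100)])) # False
```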

March 8th, 2006

Tagging messages for the filters

One of the pieces of information that I have available to me for every message is what country the IP address is registered to. I have been trying to figure out exactly how to incorporate this into a filter while still giving the benefit of the doubt to messages that are going to be good for that country.

The idea that I just implemented was to add the name of the country to the informational line in the header that I tag on each message. This is part of the SMTP standard, although adding the name of the country is something I have never heard of anyone else doing before.

There are two things that this does. First, it allows the person reading any header to easily see what country the message is from based on my IP lookup. More importantly, it gives the Bayesian filters more words to work with.

The basic idea came to me while I was looking at some messages that were in an inbox that is 100% spam. I was watching the score of each individual word as it was running through the Bayesian filter. One thing that stood out was that since every message was spam, certain words that were in the headers were considered spam markers. These words would normally have been considered neutral, but since every instance was in a spam message they were considered spam.

This gave me the idea that if the country of origin was explicitly stated in the header, then the countries that only send spam to you would provide an additional marker for spam. Countries that send both good and bad mail will end up with an extra neutral marker, which tends not to adjust the combined statistics very much. And I’d be hard pressed to think of a country that only sends good mail, so we’ll leave that case alone.

This, of course, will likely not be the only deciding factor in whether a message is spam or not, but every piece of information is helpful.
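
The effect of a country token can be seen in a toy per-token spam probability, sketched here in Python. This is a generic simplification, not the SFE scoring: the token format `x-country:…`, the clamping to the range 0.01 to 0.99, the function names, and the placeholder country codes AA, BB and CC are all assumptions.

```python
from collections import Counter

spam_counts = Counter()  # token -> number of spam messages containing it
ham_counts = Counter()   # token -> number of good messages containing it
totals = {"spam": 0, "ham": 0}


def tokens_for(body, country):
    """Body words plus one synthetic token carrying the country of origin."""
    return set(body.lower().split()) | {f"x-country:{country.lower()}"}


def train(body, country, is_spam):
    counts = spam_counts if is_spam else ham_counts
    counts.update(tokens_for(body, country))
    totals["spam" if is_spam else "ham"] += 1


def token_spam_probability(token):
    """Share of the token's appearances that were in spam, clamped."""
    spam_rate = spam_counts[token] / max(totals["spam"], 1)
    ham_rate = ham_counts[token] / max(totals["ham"], 1)
    if spam_rate + ham_rate == 0:
        return 0.5  # never seen: neutral
    return min(0.99, max(0.01, spam_rate / (spam_rate + ham_rate)))


train("buy pills", "AA", is_spam=True)
train("cheap pills", "BB", is_spam=True)
train("lunch today", "BB", is_spam=False)
train("meeting notes", "CC", is_spam=False)

print(token_spam_probability("x-country:aa"))  # 0.99 (only ever seen in spam)
print(token_spam_probability("x-country:bb"))  # 0.5  (mixed, so neutral)
```

A country seen only in spam ends up as a strong spam marker, while one that sends both kinds of mail sits at the neutral midpoint and barely moves the combined score.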