Spam Free Email

Anti-spam ideas, tools and services

June 23rd, 2006

Is SPF/Sender ID useless?

I read an article today (that I already lost the link to) that was talking about how spammers are using SPF on their throwaway domains and domain administrators are using SPF incorrectly. Its conclusion was that SPF or Sender ID was not a good technology for fighting spam.

Personally I think they didn’t get the point. SPF is one technology for fighting spam, not the only technology. If SPF can be used to filter out some email then it will work for what it is designed to do.

As for the people who do not have SPF configured properly, or who have users who are not using the authorized server, how is this a problem with the technology? Greater adoption of SPF would eventually root out these problems, as domain admins get reports of problems from their users.

Right now I am getting one type of spam that is driving me crazy: spam from my own domain name that is not originating from my servers. SPF is the perfect technology for this category of spam, whereas RBLs and Bayesian filters are better for other types of spam.

In the end, no one anti-spam technology is going to win the battle. But a toolkit of technologies that work together, each solving a distinct part of the problem, will stem the tide and again make email the killer app that it was.

June 15th, 2006

No RDNS = Spammer?

In the past few weeks I’ve been running into more and more references to mail servers being configured to look at Reverse DNS entries and, if there is no entry, to treat the message as spam.

Apart from the fact that I currently don’t have an RDNS entry for my own mail server, I can certainly see the logic in this.

Most fly-by-night mail servers are going to be set up as quickly as possible. Their operators also want as few ways to track information back to themselves as possible. So RDNS is simply something that they won’t take the time to set up.
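
As a rough illustration, the check described above might be sketched in Python like this. The function name `has_reverse_dns` and the injectable `lookup` parameter are assumptions made for the example; the only real call is the standard library's `socket.gethostbyaddr`.

```python
import socket


def has_reverse_dns(ip, lookup=socket.gethostbyaddr):
    """Return True if the IP address has a reverse DNS (PTR) entry.

    `lookup` is a hypothetical hook so the check can be exercised
    without touching the network; it defaults to the real resolver.
    """
    try:
        hostname, _aliases, _addresses = lookup(ip)
    except (socket.herror, socket.gaierror):
        # No PTR record, or the lookup itself failed.
        return False
    return bool(hostname)
```

A filter would probably treat a `False` result as one signal among several rather than as proof of spam, since legitimate servers sometimes lack an entry too.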

The only flaw in this logic comes when ISPs automatically set up RDNS on all of their IP addresses. Then any mail server on those addresses is automatically immune to this detection technique. The hope here is that the spammers will have to move to another mail server soon enough and that the next ISP won’t have set this up.

I suppose this lends credence to looking at the length of time a domain name has been registered as well. The basic logic is that domain names that are less than, say, a month old are more likely to send spam than domain names that have been around for years.

This takes into account that spammers often just buy throwaway domains and never bother to renew them once they come up for renewal.

Another thing to look at on the domain side is how long until the domain name expires. If the term is less than one year, the domain name is probably less important to the person who owns it than a domain name that will expire in two or five or ten years.

So if a domain is less than a month old and will expire in less than a year, the likelihood of the domain sending spam messages is quite high in my opinion, but that is all it is … an opinion.
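
The rule of thumb above can be written down as a small sketch. The function name and its parameters are hypothetical, and the registration and expiry dates are assumed to come from somewhere else (a WHOIS lookup, for instance); the thresholds are the month and the year mentioned in the text.

```python
from datetime import date, timedelta


def looks_like_throwaway(created, expires, today,
                         min_age=timedelta(days=30),
                         min_remaining=timedelta(days=365)):
    """Flag a domain that is both very new and due to expire soon."""
    is_new = (today - created) < min_age
    expires_soon = (expires - today) < min_remaining
    return is_new and expires_soon


# A two-week-old domain registered for a single year is flagged.
print(looks_like_throwaway(date(2006, 6, 1), date(2007, 6, 1), date(2006, 6, 15)))
```

Like the opinion it encodes, the result is only a hint that could feed into a larger score.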

[tag]DNS, Reverse DNS, rdns, spam[/tag]

June 2nd, 2006

A Thank you to the spammers

For the past few days I’ve been getting more and more e-mail messages that forge my own domain name to try to get me to read them. So this is just a little thank you sent out to all the spammers who decided to forge my domain name, on which I am the only person that has an e-mail address, thinking that some random string of characters will get me to read the e-mail message.

Now of course these have been getting caught in my quarantine, since the messages can’t get past the rest of the filters at spam free e-mail. A decent number of these spam messages have been getting caught and placed in my bad list, and none of them have managed to make it into my actual e-mail in-box.

The reason I’m thanking spammers for this barrage of messages forging my own domain name is that I needed some inspiration of late, and they have now provided it. After looking at the messages I have come to the conclusion that they would have very easily failed the SPF test. I had placed creating my own SPF filter on the back burner for a while; I have almost everything in place for it except the actual logic to do the IP address checking.
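
To give an idea of what the IP address checking step involves, here is a deliberately simplified Python sketch using the standard `ipaddress` module. It only understands the `ip4:`, `ip6:` and `all` mechanisms and skips everything else (`a`, `mx`, `include`, `redirect` and so on), so it is an illustration under those assumptions and not a complete SPF evaluator. The function name `check_spf_ip` is made up for the example.

```python
import ipaddress

# SPF qualifiers and the result each one produces on a match.
RESULTS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf_ip(spf_record, sender_ip):
    """Evaluate only the ip4/ip6/all mechanisms of an SPF record."""
    ip = ipaddress.ip_address(sender_ip)
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier is "pass"
        if term[0] in RESULTS:
            qualifier, term = term[0], term[1:]
        mechanism = term.lower()
        if mechanism == "all":
            return RESULTS[qualifier]
        if mechanism.startswith(("ip4:", "ip6:")):
            network = ipaddress.ip_network(mechanism[4:], strict=False)
            if ip in network:
                return RESULTS[qualifier]
        # Any other mechanism or modifier is ignored in this sketch.
    return "neutral"  # nothing matched


record = "v=spf1 ip4:192.0.2.0/24 -all"
print(check_spf_ip(record, "192.0.2.25"))
print(check_spf_ip(record, "203.0.113.9"))
```

With a record like the one above published for a domain, a forged message arriving from an address outside the listed network would come back as a failure.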

Now, thanks to a litany of messages that I know for a fact have not come from my own domain name, or my e-mail servers for that matter, I have been annoyed into action. It may not be today or tomorrow, but this has definitely moved to the top of my priority list. So hopefully by the end of this week or next I will have my SPF filter in place.

November 29th, 2005

Do spammers ever quit?

While I am in the process of creating my own anti-spam email server I have been using two of my older domain names as test domains. I retired both domains on April 1st, 2005, and both domains had between 500 and 1,000 active users at the time I retired them.

It is now seven months later and these two domains are still getting spam like you wouldn’t believe.

As part of my server and the statistics that I am planning on keeping, I started tracking the unique IP addresses that are making connections to these two domains. On the first day of tracking I already have nearly 1,100 unique IP addresses that have made connections, and this is only since midnight.

Not only that but I often see the same IP address making multiple connections in a short period of time.
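
This kind of tally is easy to sketch. The example below assumes the connecting addresses are already available as a plain list of strings (the name `ip_log` and the sample addresses are made up) and uses `collections.Counter` to count unique addresses and pick out the repeat visitors.

```python
from collections import Counter


def summarize_connections(ip_log):
    """Return (number of unique IPs, {ip: count} for IPs seen more than once)."""
    counts = Counter(ip_log)
    repeats = {ip: n for ip, n in counts.items() if n > 1}
    return len(counts), repeats


# Hypothetical log: one address connects twice.
unique, repeats = summarize_connections(
    ["192.0.2.1", "198.51.100.7", "192.0.2.1"]
)
print(unique, repeats)
```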

Of course this is just going to add fuel to the argument that spam is a larger problem than people realize. Even those of us who complain about it constantly :-)

November 1st, 2005

What does fighting spam mean to you?

Might be an odd question for most people, but for the system administrator it is a thankless job that only gets harder.

So why wouldn’t every admin use every tool at their disposal?

Some might not know about the tools, but I don’t consider ignorance an excuse.

Others might want to implement new anti-spam protocols, but budgetary constraints and overly complex networks prevent them from doing so. These are more acceptable excuses, but they are still excuses.

Simple technologies like SPF require no more than 30 minutes of any administrator’s time to implement, but they can help reduce spam immeasurably.

I guess I’m on a bit of a rant at the moment. I’ve been developing a new anti-spam email server and I have started looking through the SPF logs. I’m going to start keeping track of this in more detail very soon, but some extremely blatant spam has been coming through and it is obviously not originating from the networks it claims to come from.

In fact, I am pretty sure this particular case is a virus, but this is definitely something that could be stopped cold in its tracks with a few properly configured DNS records and a little more effort on the part of system admins as a whole.

Wonder what will make them take action to prevent the problems in their own houses, instead of just filing the complaints and doing nothing ….

September 11th, 2005

Lisp based DNS resolver library

I’ve been actively working on a Lisp based DNS resolver. What does this have to do with spam, you might ask?

I’m planning on creating the Spam Free Email service’s back end with Lisp, and as I’ve stated before, DNS is one of the most powerful and flexible tools (RBLs and SPF) for fighting spam, so it seemed like the best place to start.

Since I’ve only been programming in Lisp for the better part of two weeks I’m doing well, but currently I’m spending as much time looking up syntax and commands as I am writing the logic. This will pass as I become more familiar with the language, but all in all I still think this is the right direction to move in.

[tag]Lisp, DNS[/tag]

August 31st, 2005

Automating the spam filter training process

One of the greatest features of modern spam filters is the ability to look at the content of a message and identify whether it is spam based on previous messages that have been categorized as spam.

The largest downfall of this method is that you need a “statistically relevant” number of messages classified as either good or bad before these filters work well. The problem with this is that you either need input from the user or you need to start with an overgeneralized set.

While reading about writing these filters I started to think about how to automate this while still keeping the user in the equation.

The idea that I’m toying with right now is to use the user’s white list to determine good messages and then a combination of RBLs and the user’s black list to determine bad messages. Once I have a statistically relevant number of messages, the filter itself will start to work in conjunction with everything else.
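
A minimal sketch of that labeling step might look like the following. All of the names are hypothetical, and the RBL lookup is passed in as a function (`rbl_listed`) so that the example stays self-contained; messages that match neither list are left unlabeled and do not train the filter.

```python
def training_label(sender, sender_ip, white_list, black_list, rbl_listed):
    """Return "good", "bad", or None when there is no basis for a label."""
    if sender in white_list:
        return "good"
    if sender in black_list or rbl_listed(sender_ip):
        return "bad"
    return None  # unknown: do not train on this message


white = {"friend@example.org"}
black = {"junk@example.net"}
listed_ips = {"203.0.113.9"}  # stand-in for a real RBL query

print(training_label("friend@example.org", "192.0.2.1", white, black,
                     lambda ip: ip in listed_ips))
```

Checking the white list first is a design choice in this sketch: a sender the user trusts is treated as good even if the sending address shows up on an RBL.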

This will train directly on the verbiage that is used in the user’s own messages while only requiring them to create a white list. In the case of false positives and false negatives, users will be able to look through messages for a certain time period and reclassify them as good or bad. This will in effect retrain the filters and help prevent the false positive or false negative from recurring.

The process of building each user’s filters based on their own email messages “should” make their spam filters more effective in the long run, with very little effort in the short run.

August 26th, 2005

Can’t see the spam through the processed meat products …

I’ve been looking at spam messages for years. I’ve been reading articles and essays about spam for years. I’ve been getting spam for years and the only thing that I know for sure is that no one has found the perfect way to prevent spam from getting into my in-box.

My latest thought on the problem is that we are looking too hard at the overall problem and at the individual spam message. We are seeing the two ends of the spectrum but we do not have the information to fill in the gaps in between.

This is part of the reason that I am taking my current approach to dealing with spam. We need more statistical data about why spam messages are spam.

I don’t just want to know that a message triggered an RBL, I want to know every filter it triggers. I don’t want to know that a word somewhere in the entire message was on a block list, I want to know what the word was and if it was in the subject line, one of the headers or in the body and I want to know how many times it was used.
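
As one possible shape for that kind of detail, the sketch below reports which block-list words appear, where they appear, and how often. The function name and the idea of passing the message as a dictionary of parts (`subject`, `headers`, `body`) are assumptions made for the example.

```python
import re
from collections import Counter


def blocked_word_hits(message_parts, block_list):
    """Map each message part to {blocked word: number of occurrences}."""
    blocked = {word.lower() for word in block_list}
    hits = {}
    for location, text in message_parts.items():
        words = re.findall(r"[a-z0-9']+", text.lower())
        counts = Counter(word for word in words if word in blocked)
        if counts:
            hits[location] = dict(counts)
    return hits


parts = {
    "subject": "Cheap pills",
    "headers": "X-Mailer: bulk",
    "body": "pills pills and more PILLS",
}
print(blocked_word_hits(parts, ["pills"]))
```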

And if a message is on a white list I still want to know what would have happened if it had been processed….

Without this level of information, I do not think we will be able to conceive of the next generation of filters. Once there is a pattern in the chaos, it is simple to filter it out. But first we need a complete picture of the chaos and the tools to find the pattern.

[tag]spam, spam filters[/tag]

August 20th, 2005

The importance of DNS in anti-spam technology

As I’ve been thinking about the different things that I want to check on each email message, I keep noticing that the most important technology for checking information is DNS. RBLs, SPF, and even checking that the domain name sending the email has an MX record, or that it exists at all, all rely on DNS.
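
To show how directly an RBL check maps onto DNS, here is a small sketch of how the query name for an IPv4 address is usually built: the octets are reversed and the list's zone is appended, and the resulting name is then looked up like any other host name. The zone `dnsbl.example.org` is a placeholder, not a real list, and the function name is made up.

```python
import ipaddress


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address on a DNS block list."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


print(dnsbl_query_name("192.0.2.99", "dnsbl.example.org"))
```

If the lookup of that name returns an address the IP is listed; if the name does not exist, it is not.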

Since the first four or five things that I want to check on each email message relate directly to DNS, the DNS server will need to perform well and cache information, and the DNS client that is doing the queries will also need to perform well.

So I am currently looking into DNS technologies to see which one I am most interested in using.

August 17th, 2005

Email Meta-Data

I’m still working on setting up my development system. It’s taking longer than I wanted, partially because I keep leaving to go do real work :-)

As I’ve been creating this system I’ve started to think about how I want my anti-spam solution to be different. When it comes down to it I want two things that I have not seen anywhere else: flexibility and control. I want to be able to know every reason why any particular email message might get blocked as spam, and I want the end users to be able to see those reasons as well.

To that end I have decided that the system needs to collect some data about each email message, and once all the data is collected the message will be classified as spam or not. This meta data about each email message will include RBL data and SPF data as well as test data on the content of the message. Each message that passes through the system will collect as much data as possible and it will go through every test, even if it is already considered to be spam.
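
One way to picture this is a loop that runs every test and records every result, with no early exit once a message looks like spam. The sketch below assumes each test is a function that takes the message and returns `True` when it triggers; the test names and the dictionary layout are invented for the example.

```python
def collect_metadata(message, tests):
    """Run every test on the message and record all of the results."""
    results = {name: bool(test(message)) for name, test in tests.items()}
    return {
        "results": results,
        "triggered": sorted(name for name, hit in results.items() if hit),
    }


# Hypothetical tests standing in for real RBL, SPF and content checks.
tests = {
    "rbl": lambda msg: msg["ip"] == "203.0.113.9",
    "spf_fail": lambda msg: False,
    "content": lambda msg: "pills" in msg["body"],
}
meta = collect_metadata({"ip": "203.0.113.9", "body": "cheap pills"}, tests)
print(meta["triggered"])
```

The spam decision would then be made from the collected record, and the same record could be shown to the user to explain the outcome.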

The reason for this is to better identify and describe what a spam message is and to give feedback to the users on why a message might give a false positive or pass through the system when it truly is spam.

In my time working with spam filters they have tended to operate as black boxes that give no data back to the end user. The data that some of them do give back is near useless in describing an email message’s properties. I want to solve this part of the problem and give the users the tools to create better spam filters.

If a user sees that a spam message got through the filters, but it might not have gotten through if a new RBL was added, then the user will have the ability to add that RBL.

I also want to create reports that show which features produce the best combination of filters. Perhaps filters would give fewer false positives if they required more than 3 RBLs to be triggered, or an exact combination of two of them. At this point I don’t have that information, but I plan on creating a way to get it.
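
As a hedged sketch of what such a report could compute, the function below takes records of the form (number of RBLs triggered, whether the message really was spam) and counts, for each candidate threshold, how many good messages would have been blocked. The record format and the sample numbers are invented purely for illustration.

```python
def false_positives_by_threshold(records, max_threshold):
    """Count good messages blocked if `threshold` or more RBL hits meant spam."""
    report = {}
    for threshold in range(1, max_threshold + 1):
        report[threshold] = sum(
            1 for hits, is_spam in records
            if hits >= threshold and not is_spam
        )
    return report


# Made-up sample: (RBL hits, actually spam?)
sample = [(3, True), (1, False), (2, False), (0, False), (4, True)]
print(false_positives_by_threshold(sample, 3))
```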

I also hope to release this information to the public in very generalized reports. Information like the best RBL and how many domains are really using SPF.

In any case, all of this is built on the idea of creating meta data to describe each email message as it passes through this system. So this will be a large portion of the core of this system, which will in the end give the users the flexibility and control that I know I want from my spam filters.