Will new filters save us from spam?

The roughly 500 programmers, researchers, hackers and IT administrators gathered in a chilly classroom on the campus of the Massachusetts Institute of Technology (MIT) Friday aren’t just looking to slow the relentless onslaught of spam; they want to destroy its business model outright.

Their aim is to find a spam filter so effective that spammers would receive few, if any, responses, making unsolicited bulk e-mail financially prohibitive to send.

“Spamming is a business, and the theft efficiency ratio is the same as stealing hubcaps,” said programmer William Yerazunis, speaking at what is thought to be the first conference ever devoted to spam filters.

But the high payoff for sending spam could change if an e-mail filter like the one Yerazunis pioneered becomes widely adopted by large Internet service providers (ISPs).

Yerazunis wrote a language for writing filters based on the Bayesian system, which assigns a statistical probability that a given e-mail is spam. The language is called CRM114, and he used it to write a filter program called MailFilter.
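To illustrate the general idea of a Bayesian filter, here is a minimal Python sketch of a naive Bayes word classifier with add-one smoothing. It is not CRM114 or MailFilter, whose internals are not described here; the function names `train` and `spam_probability` and the whitespace tokenizer are assumptions made for illustration only.

```python
import math
from collections import Counter


def train(messages):
    """Count words per class from (text, is_spam) pairs.

    Illustrative helper, not part of any real filter.
    Assumes at least one spam and one non-spam message.
    """
    counts = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in messages:
        counts[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return counts, totals


def spam_probability(text, counts, totals):
    """Return the estimated probability that text is spam."""
    vocab_size = len(set(counts[True]) | set(counts[False]))
    spam_words = sum(counts[True].values())
    ham_words = sum(counts[False].values())

    # Start from the prior odds, then add evidence word by word.
    log_odds = math.log(totals[True] / totals[False])
    for word in text.lower().split():
        # Add-one smoothing keeps unseen words from zeroing a class out.
        p_word_spam = (counts[True][word] + 1) / (spam_words + vocab_size)
        p_word_ham = (counts[False][word] + 1) / (ham_words + vocab_size)
        log_odds += math.log(p_word_spam / p_word_ham)

    # Convert log odds back to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # Tiny made-up training set, for demonstration only.
    examples = [
        ("free money now", True),
        ("win free prize", True),
        ("meeting tomorrow at noon", False),
        ("lunch at noon tomorrow", False),
    ]
    counts, totals = train(examples)
    print(spam_probability("free money", counts, totals))
    print(spam_probability("meeting tomorrow", counts, totals))
```

A real filter would train on thousands of a user's own messages and pick a probability threshold above which mail is set aside as spam.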

At the conference at least, MailFilter was being seen as the great hope for battling the escalating spam problem.

In tests Yerazunis performed, MailFilter was 99.915 percent accurate in identifying spam.

“I’m only 99.84 percent accurate at identifying spam, so this is much more accurate than I am,” Yerazunis quipped.

MailFilter is still in alpha testing, however.

Still, Spam Conference organizer Paul Graham said he was extremely excited about Yerazunis’ solution.

“Bill’s filter looks like the most promising,” Graham said.

Graham himself is a big proponent of filters based on the Bayesian system and has written his own research paper on the subject, “A Plan for Spam.”

His paper, released last August and posted online at http://www.paulgraham.com/spam.html, has generated a lot of discussion within the spam-fighting community.

Graham has also written his own filter based on the Bayesian system.

“I believe in filters because I personally do not have a spam problem,” he said.

Graham added that the idea that filters alone could thwart spam did not get serious discussion until about a year ago. However, both Graham and Yerazunis believe that if there is widespread adoption of filters that are accurate enough to make spamming economically prohibitive, the problem will cease without the need for legislation or other measures.

According to Yerazunis, spam filters need to be at least 99.5 percent accurate to push the cost of sending bulk unsolicited e-mail to about the same as it is to send direct snail mail, making it a far less attractive method for sending solicitations.
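The arithmetic behind that threshold can be sketched in a few lines of Python. The sketch assumes that "accuracy" means the share of spam a filter blocks and that a spammer's cost per message sent stays fixed; the function name `delivery_cost_multiplier` is hypothetical, and no actual cost figures are given in Yerazunis' remarks.

```python
def delivery_cost_multiplier(accuracy):
    """How many times more a spammer pays per message that gets through.

    Hypothetical helper: assumes accuracy is the fraction of spam blocked,
    so only (1 - accuracy) of what is sent ever reaches an inbox.
    """
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return 1.0 / (1.0 - accuracy)


if __name__ == "__main__":
    for accuracy in (0.90, 0.99, 0.995):
        multiplier = delivery_cost_multiplier(accuracy)
        print(f"{accuracy:.1%} blocked: about {multiplier:.0f}x cost per delivered message")
```

Under these assumptions, a filter that stops 99.5 percent of spam lets one message in 200 through, so each message that is actually delivered costs the sender roughly 200 times as much.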

The problem, of course, is getting large ISPs like Yahoo Inc., America Online Inc. and Microsoft Corp. to adopt the filters. As it stands now, each ISP is taking its own approach.

Still, representatives from all three companies registered for the conference and showed interest in hearing what new ideas were being batted around.

One of the perennial problems when employing any anti-spam system is deciding what is and what isn’t spam. Whether something should be considered spam is often up to the user, and this makes building and employing filters especially tricky.

“The definition of spam is personal and spam is constantly changing,” said Jason Rennie, an MIT student doing research on adaptive spam filtering.

Spam-fighters are hoping to collect as much spam as possible so they can perform analysis and research on the features that make up spam.

Paul Judge, a representative for e-mail security firm CipherTrust Inc., said that his company is collecting a spam archive for this purpose. Over the last two months the company has collected 250,000 pieces of spam, he said, and is on track to have 1.5 million pieces within the first year.

“Spam messages are starting to look more and more like non-spam messages,” Judge said, adding that analysis is becoming even more important.

While CipherTrust is building its spam archive, Chicago-based programmer Philip Tom was at the conference, handing out what he called “a day of spam”: a disk containing 250,000 spam e-mails.

Tom said that he has an archive of over 50 million spam messages, and receives 250,000 a day from an undisclosed source.

“I want to know what is spam,” he said.

Tom said that most people don’t understand why he is collecting and analyzing spam, but that it provides an interesting project for him.

While he said he might sell the archive for research purposes, he also thinks he might just hand it over “for the greater good” of eliminating spam.

“One thing I can tell you is that spam is growing exponentially,” he said, noting that when he started his archive two years ago he received 10,000 a day, compared to the quarter million spam messages he receives per day now.
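Tom's two figures are enough for a rough estimate of how fast his intake is growing. The Python sketch below assumes the growth was steadily exponential over the full 24 months, which only those two data points suggest; the function names are made up for illustration.

```python
import math


def annual_growth_factor(start, end, months):
    """Factor by which volume multiplies each year, assuming steady exponential growth."""
    return (end / start) ** (12.0 / months)


def doubling_time_months(start, end, months):
    """Months needed for volume to double under the same assumption."""
    return months * math.log(2) / math.log(end / start)


if __name__ == "__main__":
    # Tom's figures: 10,000 messages a day two years ago, 250,000 a day now.
    print(annual_growth_factor(10_000, 250_000, 24))
    print(doubling_time_months(10_000, 250_000, 24))
```

On those assumptions, a 25-fold rise in two years works out to a fivefold increase each year, or a doubling roughly every five months.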

The sheer volume of spam has recently made fighting unsolicited commercial e-mail one of the technology industry’s top goals.

But when Graham was asked whether he was planning another Spam Conference, given the success of this one, he said no.

“Hopefully we will solve this problem and we won’t need another conference,” Graham said.

“I don’t want to be working on the spam problem ten years from now!”

The Spam Conference at MIT runs through 6 p.m. EST Friday. The event is being webcast and a link is provided at the event’s home page at http://spamconference.org.