22 January 2006

The spam problem part 1: Describing the problem

A great many words have been written on the subject of e-mail spam. Effort has been poured into all kinds of technological measures against it. In my view, many of these efforts have been a waste of time, because they have failed to address the fundamental problem of spam.

To explain my thinking, I’ll start with some basic statements:

  1. Your attention is a valuable resource. If you doubt this, you need only look at the amount of money spent on advertising in an attempt to acquire your attention.

  2. Therefore, your inbox is a valuable resource. Many people, perhaps most people, now check e-mail multiple times a day. In fact, according to some surveys, college students spend more time on the Internet than watching TV. They check their e-mail inbox more often than they watch ad breaks.

  3. SMTP e-mail allows anyone to send mail. There’s no centralized registration required in SMTP; there’s no control over the growth of the SMTP e-mail network. While some servers restrict which SMTP clients may connect to them, there’s essentially no control over who sends mail, as it’s always possible to open a new web e-mail account, buy a new ISP dial-up account, or whatever.

  4. SMTP e-mail is free for the sender. Sure, many people pay for their Internet access; but once you have an Internet connection, sending e-mail basically doesn’t cost you anything. The marginal cost of each additional message is effectively zero.

Now, let me re-cast those four statements:

We have unrestricted access for anyone in the world to use arbitrary amounts of a valuable resource.

Can you think of any case where there has been a system like that, and it has worked? I can’t. The canonical example is the tragedy of the commons, but there are plenty of others, including the Cambridge ‘Green Bike’ scheme and the overfishing of cod.

In order to avoid a “tragedy of the commons” situation, we need to alter the situation so that one of the statements above is no longer true. Let’s go through them again and consider our options.

1&2: “Your attention (inbox) is a valuable resource.”

In the early days of spam, negating these statements was a viable strategy. When people were spammed, they would take action against the source of the spam. Accounts would be closed, web sites shut down, server access tightened, and so on. Basically, significant numbers of people would ensure that getting their attention inappropriately carried extremely negative value for the sender. For a while, this balanced out the positive value of the rest of the Internet-using population.

Unfortunately, that approach quickly fell apart as the Internet audience became more diverse, the volume of spam became greater, and the cost of retaliation increased. The value of getting spam into my inbox is still negative, but for every one of me there are ten drooling imbeciles who want to purchase herbal V1A]<G-R4.

3: “SMTP e-mail allows anyone to send mail.”

This is the statement almost everyone is trying to negate at the moment.

Some methods, like SPF, merely wish to put slight brakes on the system and ensure that only people who can get agreement from someone who owns a domain can set up a new e-mail system. More radical anti-spam activists mutter that if only everyone were required to use S/MIME certificates, we wouldn’t have a problem. Such proposals make it more bothersome to send e-mail, but they don’t actually negate the statement, so ultimately they will prove ineffective.

To be fair, SPF and the like would have one useful purpose, if they worked: they would prevent Joe jobs. That would be a good thing—speaking as someone who has been Joe-jobbed—but it would only impact an infinitesimal fraction of the spam messages out there.

More extreme sender-side validation for ensuring that not everyone can send e-mail, such as government-issued “e-mail licenses” that can be revoked if you spam, is just never going to fly with the average Internet user, and is politically impractical on a global Internet.

Filtering and its problems

A real success in negating statement 3 would be to implement a system where only “good guys” could send mail. This is what Bayesian analysis software such as SpamAssassin and CRM114 tries to achieve. Unfortunately, there are major problems with this approach, which I will simply call “filtering”.

The first problem is that filtering only gets to stop the e-mail when it arrives at the destination server. At that point it has already wasted network bandwidth and disk space.

The second problem is that filtering is never perfect. “Never” is a strong word, so I’d better spend some time justifying it.

Not too long ago, many businessmen had highly intelligent filtering systems which operated on all their incoming communications. They called them “secretaries”. These filters were imbued with full human intelligence, a complex understanding of natural language, and the ability to notice subtle cues. Yet in spite of their sophistication, ‘secretaries’ regularly made mistakes. They would bring messages to the boss’s attention, only to have him grumpily reply “Don’t bother me with any more of that nonsense”; or worse, they would fail to notify the boss of an important message.

My definition of spam is that it is defined on a per-user basis: spam, to person X, is whatever person X says it is. That may not seem like a useful definition, but it encompasses most of the others I’ve seen, as well as matching common usage.

While I believe that we will eventually succeed in developing artificial intelligence with human-like levels of understanding, I don’t believe it’s going to be any time soon. Even if it were, I’m doubtful that we will ever produce AI with the kind of mind-reading intelligence necessary to work out whether something is spam to me.

So in the meantime, some non-spam mail gets blocked, and some spam gets through. Which is a big problem, because if you then take into account the negligible marginal cost of sending e-mail, and the fact that filtering only happens at the receiving system, you realize something unpleasant: a filter which is 90% effective lets only one message in ten through, so to reach the same number of inboxes the senders will simply spew 10× as much spam. Which means bigger disks, higher capacity network links, and the amount you have to pay to get an Internet connection goes up and up.
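The arithmetic behind that 10× figure can be sketched in a few lines. This is only an illustration of the argument above, under the assumption that senders pay nothing per message and simply raise volume until the same number of messages gets past the filter. The function name sends_needed and the figure of 1,000 delivered messages are hypothetical, chosen for the example.

```python
def sends_needed(target_delivered: int, filter_effectiveness: float) -> float:
    """Messages a sender must transmit so that target_delivered get past the filter.

    Assumption: the filter blocks a fixed fraction (filter_effectiveness) of
    spam, and sending costs the spammer nothing, so volume is the only knob.
    """
    if not 0.0 <= filter_effectiveness < 1.0:
        raise ValueError("filter_effectiveness must be in [0, 1)")
    return target_delivered / (1.0 - filter_effectiveness)


# Hypothetical target: 1,000 messages actually reaching inboxes.
for effectiveness in (0.0, 0.5, 0.9, 0.99):
    volume = sends_needed(1000, effectiveness)
    print(f"{effectiveness:.0%} effective: {volume:,.0f} sent per 1,000 delivered")
```

The volume the network has to carry grows as 1 / (1 - effectiveness), so each improvement in the filter is paid for in bandwidth and disk space before the filter ever sees the message.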

The only thing delaying meltdown is the slow rate of deployment of effective filtering. Those of us who are on the leading edge of that technology curve have been privileged to have a mostly spam-free inbox as a result; but as the success rate of filtering on the Internet as a whole goes up, so the volume of spam goes up. Which is why spam is now the majority of all e-mail traffic, according to surveys.

So I think it’s pretty clear that filtering isn’t the solution.

Challenge/response systems

There’s another class of technology people are using to try and get to the blissful state where only “good guys” can send e-mail. They set up a system to reply to incoming e-mail with another e-mail, the “challenge”. This asks the sender to click on a URL if he’s a “good guy”. If he does, his original message is passed on.
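As a rough sketch of the mechanism just described, here is a toy in-memory model of a challenge/response gate. It is not how any particular product works: the class name ChallengeResponseGate, the token standing in for the challenge URL, and the choice to whitelist a sender once confirmed are all assumptions made for illustration.

```python
import secrets


class ChallengeResponseGate:
    """Toy model: hold mail from unknown senders until they answer a challenge."""

    def __init__(self):
        self.whitelist = set()   # senders who have answered a challenge (assumed behavior)
        self.pending = {}        # token -> (sender, message) held back from the inbox
        self.inbox = []          # messages the recipient actually sees

    def receive(self, sender, message):
        """Deliver mail from known senders; otherwise hold it and return a challenge token."""
        if sender in self.whitelist:
            self.inbox.append((sender, message))
            return None
        token = secrets.token_urlsafe(16)  # stands in for the URL mailed back to the sender
        self.pending[token] = (sender, message)
        return token

    def confirm(self, token):
        """Called when the sender clicks the challenge link; releases the held message."""
        held = self.pending.pop(token, None)
        if held is None:
            return False  # unknown or already used token
        self.whitelist.add(held[0])
        self.inbox.append(held)
        return True


gate = ChallengeResponseGate()
token = gate.receive("alice@example.com", "Lunch on Friday?")
print(gate.inbox)         # nothing delivered yet
gate.confirm(token)
print(gate.inbox)         # the held message is now delivered
```

Note that nothing in confirm checks whether the sender is honest; it only checks that somebody followed the link.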

Now, there are a ton of technical problems with this approach, ranging from it breaking mailing lists to interoperability between competing “verification” systems. There’s also the problem that we already know that spammers are criminals, so what’s to stop them lying? Plus, sending a response to incoming spam actually confirms your e-mail address to the spammer, resulting in more spam; and you have to send it with your real e-mail address, or else valid recipients won’t know it’s a valid challenge. But I’m going to ignore all those problems and point out a more fundamental one.

The moment sender verification becomes commonplace enough that people start responding to it, you’re going to see a ton of messages like this:

Hi. I recently changed e-mail addresses and wasn’t able to transfer my whitelist from my old ISP, so you’ll have to re-confirm that your message isn’t spam by clicking on the below link…

The link, of course, will take the user to a porn site or a site selling penis enlargement cream.

A few weeks after that, every Bayesian filter out there will have learnt that anything that looks like a verification request is actually spam. Sender verification will become totally ineffective, because either people won’t receive the requests they are supposed to verify, or if they do receive them they won’t want to click on the links.

In fact, if there are any spammers reading this, please go ahead and send out your next few spam runs crafted to look like messages from TDMA. That way we can be rid of this whole stupid C/R nonsense sooner rather than later.

4: “SMTP e-mail is free for the sender.”

So I have concluded that filtering doesn’t work. Or rather, it is impossible to make it work with the level of near-perfect efficiency that would be required to avoid having it simply result in ever larger volumes of spam.

And so we come to the final option, the elephant in the living room, the solution so horrible that people will engage in angry mockery rather than even discuss it: Maybe, just maybe, we need to make people pay to send e-mail.

As a recent New York Times article points out, there’s historical precedent. The postal service originally delivered mail at the recipient’s expense, until abuse of the system made it necessary to introduce postage stamps.

Now, there’s a lot of implementation detail that needs discussing, and I have plenty of ideas around that—my purpose in writing this article was simply to explain how I have reached the following conclusion:

The only way we are going to end e-mail spam is by making senders pay some amount to send each e-mail.

Sure, we can continue the filtering arms race, but ultimately it’s counter-productive. There is only one viable long-term solution to spam, and it’s an economic one.

What form this solution should take, I discuss in part two.

© mathew 2017