Running a web site in 2014

I recently did some work on the back end of my web sites. I consolidated all the individual WordPress installs into a single multi-user one, cleaned up the database to free up disk space, and slimmed down the number of plugins. I’m taking advantage of Automattic’s Jetpack plugin to provide functionality that previously required a bunch of third-party plugins, including:

  • Markdown support (including in comments)
  • “Like” buttons for social network sharing
  • Mobile device support
  • Push notifications when someone comments
  • Comment login via social networks
  • E-mail subscriptions

It wasn’t long before I got some mild negative feedback: changing the login system meant that some legitimate comments got flagged as spam, so I had to go in and unflag them.

It is, of course, a pain when you write some carefully thought-out comment, only to have the system apparently drop it into a black hole. I understand that, as I have that experience myself on other sites. However, if you don’t run your own web site, you might not be aware of why people like me have such zealous spam filtering. So, let me pull back the curtain a little.

The main anti-spam tool I use is Akismet, which detects spam by aggregating comments across tens of thousands of web sites and looking for patterns. It gives me statistics on how many spam comments it has blocked. In January, for example, it caught 189 attempts at posting spam comments. Half a dozen every day. That doesn’t sound too bad, right?

Urban planners and other designers have a concept known as affordance. It refers to the way a thing can have a design which encourages or discourages particular kinds of use. In the specific case of urban planning, affordance is used to refer to the graffiti-attracting or graffiti-discouraging properties of particular materials and structures. For example, the classic UK bus shelter of the 1970s—a wooden shed—had a very high affordance for graffiti. Modern bus shelters have glass walls in order to make them less attractive targets for defacement.

One of the interesting things about graffiti affordance that I learned from a book on urban planning is that once a single piece of graffiti appears on a surface, the surface has a much bigger affordance for more graffiti. The electronic vandalism of spam works in exactly the same way: If your web site has comment spam all over it, it will become an attractive target for more comment spam. Not only do spammers use search engines to find spam-infested web sites and post more spam to them, they also make money by selling each other lists of potential victims.

Spam levels also rise and fall based on the time of year (there’s a rise before Christmas), and they can drop dramatically when security teams manage to take down a botnet. As it happens, my January total of 189 blocked spam comments was on the low end of the range. In August, I had 1,574 spam attempts. Fifty a day. That is more like what I’d be facing if I turned off the filtering.
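For anyone who wants to check the arithmetic, here is a quick sketch in Python. The two totals are the January and August figures quoted above; the variable and function names are just ones I picked for illustration.

```python
# Monthly totals of blocked spam comments, paired with the days in each month.
# January and August both have 31 days.
monthly_totals = {
    "January": (189, 31),
    "August": (1574, 31),
}


def per_day(total, days):
    """Average number of blocked spam comments per day."""
    return total / days


for month, (total, days) in monthly_totals.items():
    print(f"{month}: {per_day(total, days):.1f} spam attempts per day")
```

That works out to roughly six a day in January and a little over fifty a day in August.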

99% of spam comments use invalid e-mail addresses, so for a while I figured I’d just use a confirmation e-mail plugin: users would have to click a link in an e-mail to confirm that they were actual humans before their first comment was posted. After that, they could post freely. No actual humans would be inconvenienced that way, right?

Unfortunately, doing that and turning off other user filters resulted in thousands of garbage user accounts every week. I was paying for the database space used to store the garbage, and I was dealing with the bounce messages from the confirmation e-mails. After all that, I was still getting actual spam, because thanks to globalization you can pay someone in India or China a pittance to spend the day posting carefully targeted manual spam to web sites that show up in your favorite Google searches.

So, I use a plugin called Stop Spammer Registrations to stop the user registration spam. It checks IP addresses and e-mail addresses against blocklists maintained by various web communities. It also looks for suspicious behavior, like invalid or missing HTTP headers, extremely long or short usernames, and so on. It’s not perfect, and has misflagged a couple of people, but I hope that now that I’ve explained the scale of the problem, you understand why it’s necessary.
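To give a rough idea of what that kind of checking looks like, here is a minimal sketch in Python. It is not the plugin’s actual code: the blocklist entries, the length thresholds, the list of required headers, and the function name `registration_flags` are all assumptions made up for illustration.

```python
# Hypothetical blocklists. A real filter would query community-maintained
# services rather than hard-coded sets.
BLOCKED_IPS = {"203.0.113.7"}
BLOCKED_EMAIL_DOMAINS = {"spam.example"}

# Assumed thresholds and header list, chosen only for this example.
MIN_USERNAME_LEN = 3
MAX_USERNAME_LEN = 30
REQUIRED_HEADERS = ("user-agent", "accept")


def registration_flags(ip, email, username, headers):
    """Return the reasons a registration looks suspicious (empty list if none)."""
    reasons = []

    if ip in BLOCKED_IPS:
        reasons.append("ip on blocklist")

    if "@" not in email:
        reasons.append("invalid e-mail")
    elif email.rsplit("@", 1)[1].lower() in BLOCKED_EMAIL_DOMAINS:
        reasons.append("e-mail domain on blocklist")

    if not MIN_USERNAME_LEN <= len(username) <= MAX_USERNAME_LEN:
        reasons.append("username length")

    # HTTP header names are case-insensitive, so compare in lowercase.
    present = {name.lower(): value for name, value in headers.items()}
    for name in REQUIRED_HEADERS:
        if not present.get(name):
            reasons.append(f"missing header: {name}")

    return reasons


ok = registration_flags(
    "198.51.100.4", "alice@example.org", "alice",
    {"User-Agent": "Mozilla/5.0", "Accept": "text/html"},
)
bad = registration_flags("203.0.113.7", "x@spam.example", "x", {})
print(ok)
print(bad)
```

A registration with no flags goes straight through; anything with one or more reasons would be held back or rejected, which is also how the occasional real person ends up misflagged.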

I’m also trying my best to make things painless by allowing login using social network accounts you already have, rather than requiring that you set up yet another username and password. If you don’t mind using a social network login, that’s actually the best bet for commenting, as it avoids the need to scrutinize a registration request and lets you proceed straight to the actual commenting stage. Also, once you have an approved comment, you shouldn’t get filtered again so long as you use the same login method. If a comment does go into the moderation queue, rest assured I will get to it, and I’m now set up so that I can moderate comments on my phone as soon as I get the push notification.

(At this point, it’s worth mentioning that just because you see something get posted, doesn’t mean I’m actually using the Internet. I use WordPress’s post scheduling system and maintain a queue of items, so that I can keep a reasonably regular update schedule, currently around 3 items per day.)

So in summary, I’m genuinely sorry if you’re inconvenienced, but spam filtering on both user registration and comment posting is a necessary part of running a web site that’s been around for 20 years and is on every spammer’s radar. I wish it weren’t, and I’m open to suggestions for better technologies, but for the time being I have something that seems to be working after some initial teething troubles.

Of course, lots of people are just turning off comments entirely, but in the absence of adoption of an open social network to host discussion (hint hint), I’m not comfortable doing that.