For a while now I’ve been plagued by mysterious e-mail sync problems. I’d read something and delete it on one machine, and then I’d log in on another machine and it would be back. This wouldn’t be unexpected, except that I use IMAP for my mail, which is supposed to fix such problems. I eventually deduced that the real problem, which IMAP had been unable to solve, was that mail was being held on the server in mbox format.
For those who don’t know, mbox format is the standard historical way of keeping e-mail on a UNIX system. It’s a plain text file consisting of all the messages one after the next, with a line that looks like
From <firstname.lastname@example.org> Sun Mar 14 13:28:33 2004
marking the start of each new message.
There’s a fairly obvious problem with that format: if someone happened to use the word “From” followed by a space at the start of a line in the body of an e-mail, the software got confused. Rather than fix the problem by replacing mbox with a well-designed storage method, the people who wrote early mail server software decided to kludge around it. So, if you use the word “From” at the start of a line, it gets silently changed to “>From” before the mail is delivered. That breaks digital signatures, which in turn requires even more kludges.
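To make the failure concrete, here is a minimal Python sketch (not any real mail server's code) of a naive mbox reader that treats every line starting with "From " as a message boundary, together with the ">From" quoting described above. The address `alice@example.org`, the sample body, and the function names `split_mbox` and `quote_body` are all made up for illustration.

```python
def split_mbox(text: str) -> list[str]:
    """Naive mbox reader: every line starting with 'From ' begins a new message."""
    messages: list[list[str]] = []
    for line in text.splitlines():
        if line.startswith("From "):
            messages.append([])
        if messages:
            messages[-1].append(line)
    return ["\n".join(lines) for lines in messages]


def quote_body(body: str) -> str:
    """The classic kludge: prefix '>' to body lines that start with 'From '."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.splitlines()
    )


# Hypothetical separator line and message body, for illustration only.
SEPARATOR = "From alice@example.org Sun Mar 14 13:28:33 2004"
BODY = "Hello,\nFrom now on I will use Maildir.\nBye"

unquoted = SEPARATOR + "\nSubject: hi\n\n" + BODY + "\n"
quoted = SEPARATOR + "\nSubject: hi\n\n" + quote_body(BODY) + "\n"

print(len(split_mbox(unquoted)))  # 2: the body line is mistaken for a new message
print(len(split_mbox(quoted)))    # 1: the quoting hides it from the reader
print(quote_body(BODY) == BODY)   # False: the stored text differs from what was sent
```

The quoting stops the reader from splitting one message into two, but the stored body no longer matches what was sent, which is why a signature computed over the original text stops verifying.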
Since each folder full of mail is a single file, at any given moment only one piece of software can be allowed to be updating the file. This is obviously a problem if you leave your home Mac running Mail, then try to access the same mailboxes from your iBook while sitting in Starbucks. It’s also a problem if you get a lot of mail, because mail can’t be delivered to a mailbox while your mail program is updating it—say, to download new messages or delete messages you’ve read.
Another issue I’d been facing was performance. Apple Mail has a nice UI, but when you ask it to use IMAP it has a habit of opening five or more TCP connections to the remote server simultaneously, and then updating all your mail folders at once. Hence, you wait five times as long before you get to see the contents of your inbox. Also, you’re five times as likely to clash with another copy of Mail accessing the same mailboxes, or clash with the system trying to deliver new mail to you.
There’s a whole side-rant I could go into here, about file locking and how many UNIX systems suck at it. Let’s stick to complaining about mail, though.
There is a solution to the mbox disaster. It’s called Maildir. It requires no file locking; any number of programs can access the same mailbox at the same time. It only has one downside: each message is stored in a separate file. If you don’t understand why that would be a problem, well, that’s another side-rant. But since I have a Linux box using ReiserFS, lots of small files aren’t an issue. So I decided to make the MP3 server also work as a mail server. The requirements were simple:
1. Get mail from one or more IMAP or POP3 servers.
2. Filter mailing list traffic into one or more Maildir folders.
3. Serve up the mail via IMAP to any machine in the house.
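For the curious, the reason Maildir gets away without locks is roughly this: a delivery writes the message under a unique name in `tmp/`, then moves it into `new/`, and a rename within one filesystem is atomic, so readers never see a half-written message. The Python sketch below illustrates the idea under those assumptions. The `deliver` function and its file-name scheme are illustrative simplifications, not the code of any particular mail program.

```python
import itertools
import os
import socket
import tempfile
import time

_counter = itertools.count()  # distinguishes deliveries made by the same process


def deliver(maildir: str, message: bytes) -> str:
    """Store one message in a Maildir without taking any lock."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Unique name from time, process id, a counter, and the host name.
    # '/' and ':' are not allowed in Maildir file names, so escape them.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    unique = f"{int(time.time())}.P{os.getpid()}Q{next(_counter)}.{host}"

    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # Mode 'x' refuses to overwrite an existing file of the same name.
    with open(tmp_path, "xb") as handle:
        handle.write(message)
        handle.flush()
        os.fsync(handle.fileno())

    # The message becomes visible to readers in a single atomic step.
    os.rename(tmp_path, new_path)
    return new_path


with tempfile.TemporaryDirectory() as root:
    path = deliver(os.path.join(root, "Maildir"), b"Subject: test\n\nHello\n")
    print(os.path.basename(os.path.dirname(path)))  # new
```

Because every writer uses its own file name, two deliveries never touch the same file, and a mail reader can scan `new/` at any time.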
I expected part 3 would be the tough part, so I decided to tackle it first. I shopped around for an IMAP server, and found Binc IMAP. This looked like the best bet because it was by far the smallest IMAP server, a mere 407KiB, and because it had sensible project goals. And indeed, it compiled and installed easily, and worked.
That dealt with, I assumed parts 1 and 2 would be simple.
Most people use fetchmail for part 1. It’s a program by Eric S. Raymond, the fruitcake who thinks that the solution to 9/11 is that everyone should have been carrying a gun on the plane. Unfortunately, fetchmail has had quite a few security problems which ESR’s gun collection doesn’t appear to have prevented. Worse, it works by shoving each message it fetches into a mail server via SMTP. This is bad for performance reasons, and also bad because it makes it all too easy to end up generating bounces instead of delivering mail.
The main alternative to fetchmail is getmail. It works, but unfortunately it doesn’t do filtering; you need another program for that.
The most common filtering program is procmail. Unfortunately, procmail doesn’t support Maildir. So, generally people who use Maildir either use a copy of procmail that’s been bodged to work with Maildir, or make procmail use yet another program to actually store the mail. So you end up with three different programs glued together with shell scripts and pipes, and if any of the three goes wrong you can end up losing or bouncing mail. Been there, done that, not keen on doing it again.
Instead, I tried maildrop. It’s part of the Courier MTA, but also available as a stand-alone program. It’s 617KiB compressed for distribution, and it’s written in C++, yet all it does is filter mail from standard input based on headers, and deliver it by writing a file in a directory. The bloat should have been a warning sign, but I went ahead and installed it.
The first problem I hit was that maildrop would randomly report an error on some of my mailboxes, saying “Unable to create a dot-lock”. Since (a) Linux has proper file locking support and (b) I was only using Maildir folders, there was absolutely no way I ought to have been getting that error. I downloaded the latest version of maildrop, hand-configured it to disable all file locking code, recompiled and reinstalled.
Now it reported that it couldn’t open the folders. Why? I have no idea, because the shitty thing would simply report:
Unable to open mailbox.
No indication of which mailbox, or why it couldn’t open it. Nothing that might actually help me track down the problem. Apparently the guy who wrote the thing is unaware of the purpose of error reporting.
I read the sketchy man page, and tried using -V to increase the verbosity of the error reporting. Now instead of coughing up a completely useless error message, it locked up in an infinite loop. So, it’s safe to say I won’t be touching the Courier MTA again in the near future; if it can’t even open a Maildir and deliver a single e-mail reliably, what must the rest of it be like?
I tossed the piece of crap and searched some more, but I was unable to find a single program which would fetch mail from one or more remote servers, perform a few trivial filtering operations, and deliver the mail to some Maildir folders. So, if anyone wants to demonstrate their programming prowess, there’s a project idea for you.
In the meantime, I needed a solution. Once again, it came down to the age-old saying: “If you want a job done properly, do it yourself.”
I fired up Perl and installed Mail::Box from CPAN. A while later I had a 3.6KiB Perl program which would call out to any number of POP3 or IMAP servers, pull down all the mail, sort it based on the headers, and file it in the appropriate Maildir folders. No MTA involved, no header rewriting. As an added bonus, it automatically removed duplicate messages.
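The Perl program itself isn't reproduced here, so what follows is not a reconstruction of it. It is a rough Python sketch of the filter-and-deliver half of the same idea, using the standard library's `mailbox` and `email` modules. The fetching step is left out: the raw messages are assumed to have already been pulled down (for example with `poplib` or `imaplib`). The rule table, the folder name `perl-list`, the sample messages, and the choice to detect duplicates by `Message-ID` are all assumptions made for illustration.

```python
import email
import mailbox
import os
import tempfile

# Hypothetical rules: (header name, substring to look for, destination folder).
RULES = [
    ("List-Id", "perl-users.example.org", "perl-list"),
]


def choose_folder(msg, rules=RULES):
    """Return the folder of the first matching rule, or None for the inbox."""
    for header, needle, folder in rules:
        if needle in str(msg.get(header, "")):
            return folder
    return None


def file_messages(raw_messages, maildir_path, rules=RULES):
    """Sort raw messages into Maildir folders, skipping repeated Message-IDs."""
    inbox = mailbox.Maildir(maildir_path, create=True)
    seen = set()
    delivered = []
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)  # parsed only to read the headers
        msg_id = msg.get("Message-ID")
        if msg_id is not None:
            if msg_id in seen:
                continue  # duplicate: already filed once in this run
            seen.add(msg_id)
        folder = choose_folder(msg, rules)
        box = inbox if folder is None else inbox.add_folder(folder)
        box.add(raw)  # store the original bytes rather than a re-serialized message
        delivered.append(folder or "INBOX")
    return delivered


# Made-up sample mail: one list post fetched twice, plus one personal message.
SAMPLE = [
    b"Message-ID: <1@example.org>\nList-Id: <perl-users.example.org>\nSubject: a\n\nlist post\n",
    b"Message-ID: <1@example.org>\nList-Id: <perl-users.example.org>\nSubject: a\n\nlist post\n",
    b"Message-ID: <2@example.org>\nSubject: b\n\npersonal mail\n",
]

with tempfile.TemporaryDirectory() as root:
    print(file_messages(SAMPLE, os.path.join(root, "Maildir")))  # ['perl-list', 'INBOX']
```

Keeping the rules as plain data in the host language mirrors the point made about the Perl version: there is no separate configuration syntax to invent or parse.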
I briefly considered generalizing the solution, but realized a few things. Firstly, the entire program was less than 1KiB larger than my old procmail configuration file, and was about as readable, so why bother? Secondly, the whole thing ran in compiled bytecode, and generalizing it would make it slower. Thirdly, I already knew Perl syntax, so what was the point of inventing something else that would be less powerful?
(This is, in fact, the Lisp way of thinking. Rather than dicking around with parsers, you build some Lisp functions which then let you express the problem naturally in Lisp expressions. Well, in this case Perl, but it’s the same principle.)
So once again, we see that a simple, everyday problem is much more difficult to solve than it ought to be, because of the crapulence of so much of the software that people use. I can’t believe I’m the only person who wants a simple, reliable fetch-filter-deliver mail utility.