Dec 19

Nice to see that the developers of Perl are still solving the important problems:

say() is a new built-in, only available when use feature 'say' is in effect, that is similar to print(), but that implicitly appends a newline to the printed string.

A new prototype character has been added. _ is equivalent to $ but defaults to $_ if the corresponding argument isn’t supplied.

perldelta for 5.10

This is exactly the kind of stuff that made me give up on Perl. And mro is a pretty horrible new feature too.

Mar 02

Last night I wrote my first program in Ruby. So far I like it, a lot.

I’d been intending to learn something better than Perl for a long time. Perl is very useful, and with CPAN you can get a great deal done in very little time. (Thanks, Jarkko—how’s it going?) However, it’s a really crufty and syntaxy language, and the way it supports object-oriented programming is so butt-ugly and riddled with pitfalls that I had never bothered to get to grips with it enough to write my own classes. People complain about regular expression code in Perl being write-only, but to my mind it’s the OO code that’s the real problem. Not just that it’s ugly; as the tutorial says, “Yes, this is a lot of work”. Using OO shouldn’t be a lot of work, that’s the whole point of it!

I think that the Perl motto of “There’s more than one way to do it” has become something of a convenient excuse for not making difficult decisions. I don’t need two or three different syntaxes for method invocation, particularly not if they’re all ugly. I don’t need a variety of different types of scoping, and any language where the simplest and cleanest scoping options still require a lengthy tutorial needs to be taken out and shot.

I read the Apocalypses. I wanted to know if Perl 6 was going to become the language I wanted, in much the same way that Perl 5 had become a tolerable version of Perl 4. It’s clear that Larry Wall understands how awful Perl’s OO system is, in that he adds a whole bunch of fundamental problems to my rather superficial ones. Unfortunately, the proposed new clean OO for Perl 6 looks to have a lot of the ugly and arbitrary syntax that is a familiar feature of Perl—like $.x for a public variable and $:y for a private one, *%_ for a variable list of arguments, all the way up to abominations like %::*::($package)::method. And still “There are too many ways to do it”, for example in regular expressions you have a whole new syntax, plus you can use m:p5/ and use all the old Perl 5 syntax. Or hey, why not use both?

And Perl 6 isn’t even out yet. Who knows what kinds of compromises will be made before it becomes real? And it’s clearly going to have a major learning curve for Perl 5 users, so much so that some people are grumpily saying they’ll stick with Perl 5. Me, I’m always ready to learn a new language—but if I’m going to spend a few months of my time learning Yet Another Programming Language, I’d like it to be one that’s (smaller|simpler|easier|more consistent) than the ones I know already, not one that needs a big wall chart to summarize its operator precedence rules.

Ruby isn’t perfect, of course. I wish it had support for variable declaration, with an equivalent of use strict. Unfortunately, the Ruby zealots seem to have an unjustified religious objection to allowing variable declaration, and I was bitten by the lack of declarations twice last night alone. Still, I was writing proper OO code with recursion inside a couple of hours, and that has to count for something.

The other two big problems with Ruby are performance, and Unicode. It kinda lacks both. However, once Parrot has matured a bit, Cardinal should solve both problems at once.

Sep 01

I wrote some Perl, and cleaned out all the links to LiveJournal threads from my journal. If you posted comments in response to anything in my journal, sorry, they’re gone, you know why. I also replaced all the LJ usernames and journal and community links; damned if I’m going to give them any free (good) publicity.

People have asked, but no, I’m not going to waste any more effort trying to get the LJ management to clean up their act. I wrote them off as a hopeless cause 2 weeks ago. Besides, I would rather spend my time and effort writing interesting and amusing content.

On a more positive note, I’ve finally learned how to make screen display two terminal windows in one, er, terminal window, rather than just letting you switch to ‘previous screen’ and ‘next screen’. Amazingly, I’d used the program for years without learning that fairly simple piece of knowledge. Next: how to mark text in visual mode in vi.

Finally for tonight, a household hint: It is very difficult to kill moths by throwing water at them.

Aug 15

Apparently there are some people still falling for that “freeipod.com” pyramid scheme. I posted a pretty skeptical analysis last month, but TrollJournal ate it. I thought the whole pyramid would have collapsed by now, but it seems not. So, let’s repeat the analysis…

Let’s try to give freeipod.com the benefit of the doubt, and be optimistic in our analysis.

First off, note that every time someone goes to the site and registers directly, rather than being referred there, nobody gets credit for that new member, so existing members are less likely to get their free iPods. So, let’s assume that the only way people ever join is by being referred, hence maximizing the chance that people who sign up will get an iPod.

Next, note that there is clearly a finite world population. Once the necessary referrals have signed up and won you your free iPod, any additional referrals are just reducing the number of people remaining who might sign up, and again reducing the chances of the people who have already signed up finding enough new members to refer. So let’s also assume that nobody collects more than the minimum number of referrals required, which is 5.

Now we need to estimate how long it takes someone to find 5 referrals. I’m gonna say 24 hours, partly because it seems like a reasonable number, and partly because it makes the mathematics really easy. Everyone knows at least 5 people who read their e-mail more than once a day, right?

Our final assumption is that it doesn’t get harder to find referrals as time goes on. This is a ridiculous assumption, but hey, we’re trying to be optimists, right?

Enough assumptions. The nugget of data we need is when the whole scheme started. I did some searching on Google, and found public postings about the site dated July 19th.

So, we have our algorithm: we start July 19th with 1 member. Each day, each member who hasn’t won an iPod (i.e. those who joined the previous day) finds 5 new members, and becomes eligible for a free iPod. The next day, each of those new members will find 5 more new members so they can get their free iPod, and so on.

I wrote a quick Perl program to compute the results.

Jul 19:
 Site has 1 members.
 The 1 most recent members find 5 more.
 Apple has shipped 1 iPods.

Jul 20:
 Site has 6 members.
 The 5 most recent members find 25 more.
 Apple has shipped 6 iPods.

Jul 21:
 Site has 31 members.
 The 25 most recent members find 125 more.
 Apple has shipped 31 iPods.

Jul 22:
 Site has 156 members.
 The 125 most recent members find 625 more.
 Apple has shipped 156 iPods.

Jul 23:
 Site has 781 members.
 The 625 most recent members find 3,125 more.
 Apple has shipped 781 iPods.

Jul 24:
 Site has 3,906 members.
 The 3,125 most recent members find 15,625 more.
 Apple has shipped 3,906 iPods.

Jul 25:
 Site has 19,531 members.
 The 15,625 most recent members find 78,125 more.
 Apple has shipped 19,531 iPods.

Jul 26:
 Site has 97,656 members.
 The 78,125 most recent members find 390,625 more.
 Apple has shipped 97,656 iPods.

Jul 27:
 Site has 488,281 members.
 The 390,625 most recent members find 1,953,125 more.
 Apple has shipped 488,281 iPods.

Jul 28:
 Site has 2,441,406 members.
 The 1,953,125 most recent members find 9,765,625 more.
 Apple has shipped 2,441,406 iPods.

Jul 29:
 Site has 12,207,031 members.
 The 9,765,625 most recent members find 48,828,125 more.
 Apple has shipped 12,207,031 iPods.

Jul 30:
 Site has 61,035,156 members.
 The 48,828,125 most recent members find 244,140,625 more.
 Apple has shipped 61,035,156 iPods.

Jul 31:
 Site has 305,175,781 members.
 The 244,140,625 most recent members find 1,220,703,125 more.
 Apple has shipped 305,175,781 iPods.

Aug 1:
 Site has 1,525,878,906 members.
 The 1,220,703,125 most recent members find 6,103,515,625 more.
 Apple has shipped 1,525,878,906 iPods.

Aug 2:
 Site has 7,629,394,531 members.
 The 6,103,515,625 most recent members would need to find 30,517,578,125 more.
 Apple has shipped 7,629,394,531 iPods.

Everyone on the planet now has an iPod.

So there we have it. If we set our assumptions to maximize your chances of winning an iPod, everyone on the planet already has an iPod.

Of course, the nice thing about having code to crank out the numbers is that I can now fiddle with the assumptions. If we assume it takes everyone two days to find 5 more members, then the remaining population of the earth got their iPods today, August 16th. So, if you didn’t get yours, don’t panic, it’s probably in the mail.
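The Perl program itself isn't reproduced in this post. As a rough stand-in, here is a minimal Python sketch of the same arithmetic. The year (2004), the world population figure (about 6.4 billion) and all of the names are assumptions made for illustration, not details taken from the original program.

```python
from datetime import date, timedelta

# Assumption: rough 2004 world population; the post does not give a figure.
WORLD_POPULATION = 6_400_000_000


def simulate(start, population=WORLD_POPULATION, referrals=5, days_per_round=1):
    """Return (day, members, newest) rows until membership covers the population."""
    rows = []
    members = newest = 1  # the scheme starts with a single member
    day = start
    while True:
        rows.append((day, members, newest))
        if members >= population:
            return rows
        newest *= referrals  # each of the newest members recruits `referrals` more
        members += newest
        day += timedelta(days=days_per_round)


if __name__ == "__main__":
    start = date(2004, 7, 19)  # assumption: the year is 2004
    for day, members, newest in simulate(start):
        print(f"{day:%b %d}: {members:,} members, {newest:,} joined most recently")
    # Slower recruiting: two days to find five referrals.
    print(simulate(start, days_per_round=2)[-1][0])
```

Under these assumptions the one-day run ends on August 2nd and the two-day run on August 16th, which matches the figures above. Changing `referrals` or `days_per_round` is the "fiddling with the assumptions" part.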

Update 2004-09-24

Forbes is reporting that people are suing freeipods.com for not shipping them the free iPod they qualified for. The company claims Apple simply isn’t shipping them the thousands of iPods they’ve ordered, and that people will get their free iPods real soon now, honest.

Oh, and the lucky suckers, er, customers of the service also say they’re being inundated with spam.

I’m shocked to find out from Forbes that this may not be a legitimate business operation. Shocked, I tell you.

Mar 14

For a while now I’ve been plagued by mysterious e-mail sync problems. I’d read something and delete it on one machine, and then I’d log in on another machine and it would be back. This wouldn’t be unexpected, except that I use IMAP for my mail, which is supposed to fix such problems. I eventually deduced that the real problem, which IMAP had been unable to solve, was that mail was being held on the server in mbox format.

For those who don’t know, mbox format is the standard historical way of keeping e-mail on a UNIX system. It’s a plain text file consisting of all the messages one after the next, with a line that looks like From <foo@bar.com> Sun Mar 14 13:28:33 2004 marking the start of each new message.

There’s a fairly obvious problem with that format: if someone happened to use the word “From” followed by a space at the start of a line in the body of an e-mail, the software got confused. Rather than fix the problem by replacing mbox with a well-designed storage method, the people who wrote early mail server software decided to kludge around it. So, if you use the word “From” at the start of the line, it gets silently changed to “>From” before the mail is delivered. That breaks digital signatures, hence requiring even more kludges.
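To make the problem concrete, here is a small Python sketch. It assumes a deliberately naive reader that treats every line starting with "From " as a message separator. The function names and the sample message are invented for illustration; real mail software is more careful than this.

```python
def split_mbox(text):
    """Split mbox text into messages at every line that starts with 'From '."""
    messages = []
    for line in text.splitlines():
        if line.startswith("From "):
            messages.append([])  # a separator line starts a new message
        elif messages:
            messages[-1].append(line)
    return ["\n".join(lines) for lines in messages]


def escape_body(body):
    """Apply the '>From ' kludge to body lines that look like separators."""
    return "\n".join(
        ">" + line if line.startswith("From ") else line
        for line in body.splitlines()
    )


SEPARATOR = "From <foo@bar.com> Sun Mar 14 13:28:33 2004"
body = "Hi,\nFrom now on I will use Maildir.\nBye"

raw = SEPARATOR + "\n" + body + "\n"
safe = SEPARATOR + "\n" + escape_body(body) + "\n"

print(len(split_mbox(raw)))   # the unescaped body is cut into two "messages"
print(len(split_mbox(safe)))  # the escaped body stays in one piece
```

The escaped copy survives the naive reader, but its text is no longer what the sender wrote, which is exactly why a signature computed over the original body stops verifying.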

Since each folder full of mail is a single file, at any given moment only one piece of software can be allowed to be updating the file. This is obviously a problem if you leave your home Mac running Mail, then try to access the same mailboxes from your iBook while sitting in Starbucks. It’s also a problem if you get a lot of mail, because mail can’t be delivered to a mailbox while your mail program is updating it—say, to download new messages or delete messages you’ve read.

Another issue I’d been facing was performance. Apple Mail has a nice UI, but when you ask it to use IMAP it has a habit of opening five or more TCP connections to the remote server simultaneously, and then updating all your mail folders at once. Hence, you wait five times as long before you get to see the contents of your inbox. Also, you’re five times as likely to clash with another copy of Mail accessing the same mailboxes, or clash with the system trying to deliver new mail to you.

There’s a whole side-rant I could go into here, about file locking and how many UNIX systems suck at it. Let’s stick to complaining about mail, though.

There is a solution to the mbox disaster. It’s called Maildir. It requires no file locking; any number of programs can access the same mailbox at the same time. It only has one downside: each message is stored in a separate file. If you don’t understand why that would be a problem, well, that’s another side-rant. But since I have a Linux box using ReiserFS, lots of small files aren’t an issue. So I decided to make the MP3 server also work as a mail server. The requirements were simple:

  1. Get mail from one or more IMAP or POP3 servers.
  2. Filter mailing list traffic into one or more Maildir folders.
  3. Serve up the mail via IMAP to any machine in the house.
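For anyone wondering how Maildir gets away without locking, the usual trick is to write each message under a unique name in tmp/ and then rename it into new/. The Python sketch below shows that idea under simplifying assumptions. The function name and the file-naming scheme are illustrative only, and Python's standard mailbox module already provides a fuller implementation.

```python
import itertools
import os
import socket
import time

_counter = itertools.count()


def deliver(maildir, message_bytes):
    """Write one message into a Maildir without taking any locks."""
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    # Time, process ID, a per-process counter and the host name together
    # make the file name unique, so two writers never touch the same file.
    unique = f"{int(time.time())}.{os.getpid()}_{next(_counter)}.{socket.gethostname()}"
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)

    # Write the whole message under tmp/ first ...
    with open(tmp_path, "xb") as handle:
        handle.write(message_bytes)
        handle.flush()
        os.fsync(handle.fileno())

    # ... then move it into new/ in one step, so readers only ever
    # see complete messages.
    os.rename(tmp_path, new_path)
    return new_path
```

Because every message is its own file, deleting one is just removing a file, and a delivery never has to wait for a mail client to finish with the folder.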

I expected part 3 would be the tough part, so I decided to tackle it first. I shopped around for an IMAP server, and found Binc IMAP. This looked like the best bet because it was by far the smallest IMAP server, a mere 407KiB, and because it had sensible project goals. And indeed, it compiled and installed easily, and worked.

That dealt with, I assumed parts 1 and 2 would be simple.

Most people use fetchmail for part 1. It’s a program by Eric S. Raymond, the fruitcake who thinks that the solution to 9/11 is that everyone should have been carrying a gun on the plane. Unfortunately, fetchmail has had quite a few security problems which ESR’s gun collection doesn’t appear to have prevented. Worse, it works by shoving each message it fetches into a mail server via SMTP. This is bad for performance reasons, and also bad because it makes it all too easy to end up generating bounces instead of delivering mail.

The main alternative to fetchmail is getmail. It works, but unfortunately it doesn’t do filtering; you need another program for that.

The most common filtering program is procmail. Unfortunately, procmail doesn’t support Maildir. So, generally people who use Maildir either use a copy of procmail that’s been bodged to work with Maildir, or make procmail use yet another program to actually store the mail. So you end up with three different programs glued together with shell scripts and pipes, and if any of the three goes wrong you can end up losing or bouncing mail. Been there, done that, not keen on doing it again.

Instead, I tried maildrop. It’s part of the Courier MTA, but also available as a stand-alone program. It’s 617KiB compressed for distribution, and it’s written in C++, yet all it does is filter mail from standard input based on headers, and deliver it by writing a file in a directory. The bloat should have been a warning sign, but I went ahead and installed it.

The first problem I hit was that maildrop would randomly report an error on some of my mailboxes, saying “Unable to create a dot-lock”. Since (a) Linux has proper file locking support and (b) I was only using Maildir folders, there was absolutely no way I ought to have been getting that error. I downloaded the latest version of maildrop, hand-configured it to disable all file locking code, recompiled and reinstalled.

Now it reported that it couldn’t open the folders. Why? I have no idea, because the shitty thing would simply report:

Unable to open mailbox.

No indication of which mailbox, or why it couldn’t open it. Nothing that might actually help me track down the problem. Apparently the guy who wrote the thing is unaware of the purpose of error reporting.

I read the sketchy man page, and tried using -V to increase the verbosity of the error reporting. Now instead of coughing up a completely useless error message, it locked up in an infinite loop. So, it’s safe to say I won’t be touching the Courier MTA again in the near future; if it can’t even open a Maildir and deliver a single e-mail reliably, what must the rest of it be like?

I tossed the piece of crap and searched some more, but I was unable to find a single program which would fetch mail from one or more remote servers, perform a few trivial filtering operations, and deliver the mail to some Maildir folders. So, if anyone wants to demonstrate their programming prowess, there’s a project idea for you.

In the meantime, I needed a solution. Once again, it came down to the age-old saying: “If you want a job done properly, do it yourself.”

I fired up Perl and installed Mail::Box from CPAN. A while later I had a 3.6KiB Perl program which would call out to any number of POP3 or IMAP servers, pull down all the mail, sort it based on the headers, and file it in the appropriate Maildir folders. No MTA involved, no header rewriting. As an added bonus, it automatically removed duplicate messages.
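The Perl program isn't included here, so the following is only a Python sketch of the filter-and-file step, not the original code. It assumes the messages have already been fetched as raw bytes, that duplicates are detected by Message-ID, and that the rules, folder names and function names (all hypothetical) stand in for whatever the real script used.

```python
import email
import mailbox
import os

# Hypothetical rules: (header name, text to look for, Maildir folder).
RULES = [
    ("List-Id", "perl5-porters", "lists.p5p"),
    ("List-Id", "ruby-talk", "lists.ruby"),
]


def choose_folder(msg, rules=RULES, default="inbox"):
    """Return the folder of the first rule whose header contains the text."""
    for header, needle, folder in rules:
        if needle in msg.get(header, ""):
            return folder
    return default


def file_messages(raw_messages, root):
    """File raw messages into Maildir folders under root, skipping duplicates."""
    seen = set()
    filed = {}
    for raw in raw_messages:
        msg = email.message_from_bytes(raw)
        msg_id = msg.get("Message-ID")
        if msg_id is not None:
            if msg_id in seen:
                continue  # assumption: same Message-ID means same message
            seen.add(msg_id)
        folder = choose_folder(msg)
        box = mailbox.Maildir(os.path.join(root, folder), create=True)
        box.add(raw)  # the raw bytes are stored as-is, with no header rewriting
        box.close()
        filed[folder] = filed.get(folder, 0) + 1
    return filed
```

Fetching would sit in front of this, for instance with the standard poplib or imaplib modules, and is left out so the sketch runs without a mail server.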

I briefly considered generalizing the solution, but realized a few things. Firstly, the entire program was less than 1KiB larger than my old procmail configuration file, and was about as readable, so why bother? Secondly, the whole thing ran in compiled bytecode, and generalizing it would make it slower. Thirdly, I already knew Perl syntax, so what was the point of inventing something else that would be less powerful?

(This is, in fact, the Lisp way of thinking. Rather than dicking around with parsers, you build some Lisp functions which then let you express the problem naturally in Lisp expressions. Well, in this case Perl, but it’s the same principle.)

So once again, we see that a simple, everyday problem is much more difficult to solve than it ought to be, because of the crapulence of so much of the software that people use. I can’t believe I’m the only person who wants a simple, reliable fetch-filter-deliver mail utility.

Nov 20

Slashdot Q&A

Problem: Write a program which takes a string on standard input, and reports the most frequently occurring characters in the string.

Solution:

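#!/usr/bin/perl
use strict; use warnings;
# Count each character in the first line of input, then list most frequent first.
my %n;
chomp(my $line = <> // '');
$n{$_}++ for split //, $line;
print "$n{$_} x $_\n" for sort { $n{$b} <=> $n{$a} } keys %n;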

I just couldn’t stop myself…

Nov 04
  1. Shoddy workmanship. RPM was discovered to be broken in 2002; it would regularly corrupt its own databases and lock up in such a way that it couldn’t be killed. In spite of that, RedHat went and made two major releases with a broken RPM.

  2. Bad packaging. The RedHat 8 release of libgcj (the libraries for the GNU Java compiler) puts a broken version of jar into /usr/bin, destroying any working version you have installed. They’ve known about the problem since 2002, but left it broken into 2003. What the hell is a jar binary doing in a libraries package anyway?

  3. RPM. If you think RPM is a reasonable piece of software design, I can only assume you’ve never used Portage, APT or BSD ports. I still have to keep a “cheat sheet” of the bizarre invocations necessary to make RPM perform basic tasks. Then there’s the fact that you can’t just install an RPM; no, you need to find the right RPM for your specific version of RedHat, assuming one exists. And if that’s a pain for you, imagine what a pain it is for developers.

  4. Broken SMP. Not a big deal for home users, but RedHat target the enterprise. Yet threading is broken on SMP systems.

  5. Bad advocacy. Standing up and telling people that they should run Windows, not RedHat, is not just a dumb move from a marketing point of view—it also infuriates the very developers they rely on for the products they repackage and sell.

  6. Broken UTF-8. Try echo redhat sucks | grep '[A-Z]' and see. Works on every other Linux distribution, broken on RedHat. Furthermore, man pages show up as garbage in PuTTY because nroff’s VT102 sequences get hosed. It’s possible to fix it by turning off Unicode support, but is it too much to expect that RedHat get basic stuff like terminal I/O right?

  7. X dependencies. RedHat is for servers, right? Who needs X on a server? Well, you do, if you want to run RedHat’s emacs-nox package. Want to install without X? Tough luck, their “text mode” installation still uses it.

  8. Broken Perl. Perl has its own package management system called CPAN. It’s a hell of a lot more friendly, easy to use and helpful than RPM. I do not want to manage my Perl packages with RPM. However, even if I did, I’d be out of luck, because RedHat don’t provide a complete set of CPAN libraries in RPM format, and the ones they do provide aren’t kept up-to-date. End result: Before installing any Perl library I have to check to see if there’s an RPM version. If there is, I install it and then delete it manually and install it properly using CPAN.

  9. Painful upgrades. Want to upgrade from RedHat 8 to RedHat 9? Good luck, you’ll need it. Even upgrading from 7 to 8 is a “start from scratch” affair. Compare that to upgrading Debian or Gentoo and it’s clear that RedHat is designed for people who never upgrade. Which wouldn’t be a problem, except that RedHat don’t support old releases any more. If you’ve got an old RedHat system, you have to reinstall anyway, so do yourself a favor and install a different distribution.

  10. Unnecessary software. When I install RedHat, I always do my best to cut down the bloat to a bare minimum. Yet somehow, I always seem to end up with iptables and ipchains. Now, in what universe does it make sense for your installer to install both of those?

Oct 14

…though you’d never guess from looking at my desk.

After cleaning the house at the weekend, I bought some financial management software. I now have a set of accounts for all my various assets, UK and US. Mmm, double-entry bookkeeping. Brings back some memories… Previously I used a spreadsheet and some Palm software. It kinda did the trick for answering questions at tax time, but reconciling accounts was a pain.

One of the things that nobody seems to tell you about double-entry accounting is that, to borrow a phrase from the Perl community, there’s more than one way to do it. Maybe you want to handle your rental property using split transactions in your checking account ledger, or maybe you want to have a separate ledger and post transfers to tracking accounts from that. Sure, accountants will have opinions about how to best set out your books; but then, programmers have opinions about how to write code, and it doesn’t mean they’re right.

The other big secret—well, maybe you’ve guessed it already—is that it’s much simpler than the professionals make it sound.

Dec 04

Someone at work wanted a Microsoft tool for producing large files of particular sizes, to fill up a disk and stress-test applications under low disk space conditions.

So I wrote one in Perl in a few minutes, to get my brain started for the week. For kicks, I gave it built-in help, and made it understand sizes given in both base-10 and base-2 units, e.g. stresstest 10MiB 55MB 1GiB to create three files of the appropriate sizes.

[For those who don’t know: 1kB is 1000 bytes; 1KiB (kibibyte) is 1024 bytes. Similarly for mebibytes and gibibytes vs megabytes and gigabytes.]
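The Perl script isn't shown, so here is a minimal Python sketch of the size handling it describes. The unit table, function names and chunked-write approach are assumptions for illustration rather than a description of the original tool.

```python
import re

# Decimal (SI) and binary (IEC) multipliers.
UNITS = {
    "B": 1,
    "kB": 1000, "MB": 1000**2, "GB": 1000**3,
    "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3,
}


def parse_size(text):
    """Turn a string such as '55MB' or '10MiB' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", text.strip())
    if not match:
        raise ValueError(f"cannot parse size: {text!r}")
    number, unit = match.groups()
    unit = unit or "B"  # a bare number is taken to mean bytes
    if unit not in UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(number) * UNITS[unit]


def create_file(path, size, chunk=1024 * 1024):
    """Write `size` zero bytes to `path` in chunks rather than as a sparse file."""
    block = bytes(chunk)
    with open(path, "wb") as handle:
        remaining = size
        while remaining > 0:
            step = min(chunk, remaining)
            handle.write(block[:step])
            remaining -= step
```

A command-line wrapper would then loop over its arguments, calling parse_size on each and create_file with a generated file name. Writing real bytes matters here: a file created by seeking or truncating may occupy almost no disk space on many filesystems, which defeats the purpose of a low-disk-space test.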

Nov 28

Got home, booked tickets to Minnesota. It’s funny, when I married Sara I didn’t really think about the fact that it would mean visiting Minnesota every other winter. Not that I’d have decided differently; I’m just amused that it didn’t occur to me.

Also fixed my web site. The Perl script rewrote most of the HTML for AT&T’s web servers, but I had to change a few URLs in my LiveJournal template and fix the redirection at pobox.com. It’s great being able to change web host and ISP at the drop of a modem connection.