9 June 2007

Whatever it takes

Friday was definitely the worst Friday ever.

I wandered into the office with my coffee and discovered that my main work laptop (an IBM ThinkPad, obviously) had mysteriously powered itself off overnight instead of merely going to sleep. I booted it, only to get the dreaded Fan error message.

(If you’re falling asleep already, skip down to the moral of the story.)

A fan error is pretty much the kiss of death for a recent laptop. The quest for ever faster and slimmer portable computers means that today’s portables are designed with fans that suck cooling air through their innards. No fan means the machine overheats as soon as you do anything that strains it a bit; and that could be something as trivial as leaving a web browser running on a Flash-heavy web site, especially if you have Eclipse running in the background.

Still, I have a backup laptop, for exactly this eventuality. I keep it mostly synched up with the main one. I started transferring my recent data across. Before long I was logged in to work via the VPN.

I’d just gotten my first batch of e-mail when I discovered that a clever user had found a way to bypass ACL security and replicate an old, shut-down database with a new, in-production database. This had wiped a chunk of important configuration data.

I found the backup I could get at most quickly, and did a temporary restore. Then I asked a colleague to pull a more recent backup onto a spare partition of the System i server (aka AS/400), which I then used to do a proper restoration.

I had just about finished documenting what had happened and putting new precautions in place to stop it happening again, when my laptop locked up solid. I suspected the ATI video drivers, so I switched back to the open source ones (which are less buggy) and continued.

Overnight, it locked up again. This was very suspicious. To have Linux lock up once, well, that’s not unheard of when proprietary drivers are involved. But to have it lock up twice, the second time with no closed source software running in the kernel—that smelled fishy.

I ran a Memtest86 diagnostic, and sure enough: bad RAM in my backup laptop. Oh joy. I flipped the machine over and swapped the DIMM in its RAM slot for the one from the machine with the dead fan. The errors continued. So it looked like the fault was in the internal RAM. I took the DIMM out of the slot, leaving only the internal RAM to be tested, and ran Memtest86 again. Hypothesis confirmed.

I consulted the handy Hardware Maintenance Manual. It turns out the internal RAM can be replaced too, but you have to remove the keyboard to do it. So, I did that and swapped the internal DIMM. This time Memtest86 still looked good after a couple of minutes, so I powered off, put the second stick of RAM back in, screwed everything back together, and now I have it running an exhaustive test.

Monday, I’ll get the dead laptop and bad RAM shipped to the service department.

The moral of the story: Always buy the extended warranty on a laptop. Even the best ones are significantly less reliable than desktop systems; they are more prone to overheating, and their tiny fans tend to get clogged easily or simply burn out. When something does go wrong, laptop parts are significantly more expensive than desktop parts. Repairs frequently involve motherboard or display module replacement, and can easily cost as much as the machine is worth.

