Jun 09

Friday was definitely the worst Friday ever.

I wandered into the office with my coffee, and discovered that my main work laptop (an IBM ThinkPad, obviously) had mysteriously powered itself off overnight, instead of merely going to sleep. I booted it, only to get the dreaded Fan error message.

(If you’re falling asleep already, skip down to the moral of the story.)

A fan error is pretty much the kiss of death for a recent laptop. The quest for ever faster and slimmer portable computers means that today’s portables are designed with fans that suck cooling air through their innards. No fan means the machine overheats as soon as you do anything that strains it a bit; and that could be something as trivial as leaving a web browser running on a Flash-heavy web site, especially if you have Eclipse running in the background.

Still, I have a backup laptop, for exactly this eventuality. I keep it mostly synched up with the main one. I started transferring my recent data across. Before long I was logged in to work via the VPN.

I’d just gotten my first batch of e-mail when I discovered that a clever user had found a way to bypass ACL security and replicate an old, shut-down database with a new, in-production database. This had wiped a chunk of important configuration data.

I found the backup I could get at most quickly, and did a temporary restore. Then I asked a colleague to pull a more recent backup onto a spare partition of the System i server (aka AS/400), which I then used to do a proper restoration.

I had just about finished documenting what had happened and putting new precautions in place to stop it happening again, when my laptop locked up solid. I suspected the ATI video drivers, so I switched back to the open source ones (which are less buggy) and continued.

Overnight, it locked up again. This was very suspicious. To have Linux lock up once, well, that’s not unheard of when proprietary drivers are involved. But to have it lock up twice, the second time with no closed source software running in the kernel—that smelled fishy.

I ran a Memtest86 diagnostic, and sure enough: bad RAM in my backup laptop. Oh joy. I flipped the machine over and swapped the RAM with the DIMM from the machine with a dead fan. The errors continued. So, it looked like an error in the internal RAM. I took the DIMM out of the RAM slot and ran Memtest86 again. Hypothesis confirmed.

I consulted the handy Hardware Maintenance Manual. It turns out the internal RAM can be replaced too, but you have to remove the keyboard to do it. So, I did that and swapped the internal DIMM. This time Memtest86 still looked good after a couple of minutes, so I powered off, put the second stick of RAM back in, screwed everything back together, and now I have it running an exhaustive test.

Monday, I’ll get the dead laptop and bad RAM shipped to the service department.

The moral of the story: Always buy the extended warranty on a laptop. Even the best ones are significantly less reliable than desktop systems; they are more prone to overheating, and their tiny fans tend to get clogged easily or simply burn out. When something does go wrong, laptop parts are significantly more expensive than desktop parts. Repairs frequently involve motherboard or display module replacement, and can easily cost as much as the machine is worth.

Feb 24

The excitement started a week or two ago when I discovered that my ThinkPad laptop’s internal cooling fan had stopped working. As soon as I did something graphically intense for more than a minute or two, the system would overheat and perform an emergency shutdown.

Fortunately, I have a backup laptop.

Unfortunately, the backup ThinkPad laptop had also developed a fault. The fluorescent backlight for the display was failing. The screen was a curious reddish-purple color, and very dim—unless I turned the brightness up, in which case the backlight stopped working entirely, and everything went black. It also tended to crap out when the machine got warm.

So, I called the IBM hardware support line. The next day, the DHL truck showed up and I was handed a shipping box. I followed the instructions and shipped the dying-LCD laptop, which I figured was the less usable of the two. I enclosed the appropriate paperwork and a short description of the problem. Then, I went back to writing Java.

One day later, the DHL truck turned up again. It was my laptop, repaired. New LCD, and they also upgraded the BIOS while they were working on it.

Next, I installed Kubuntu on the repaired machine. I switched to Ubuntu back in June 2006, and had gotten used to GNOME. Unfortunately, the GNOME developers had subsequently decided it was a good idea to include the Mono runtime as a required part of gnome-desktop, and Ubuntu had made it a required part of ubuntu-desktop.

For those who don’t know, Mono is controversial. It’s a Novell project, and Novell just signed a deal with Microsoft to get permission to use patented Microsoft technology in Novell’s Linux distribution (SuSE). Mono is a reimplementation of Microsoft’s .NET, and it’s widely believed that Microsoft hold many patents that cover .NET.

One theory is that Microsoft is encouraging people to become dependent on Mono now so that they can suddenly threaten patent infringement suits and cripple desktop Linux later on. That might sound a little paranoid, but remember that Microsoft already funded SCO during their lawsuit alleging intellectual property infringement in Linux, so plenty of people are suspicious.

Anyway, I want no part of anything to do with .NET, so I had been planning to switch back to KDE and use Kubuntu once the next major release came along. But, with a newly repaired machine and the prospect of upheaval anyway, I decided I might as well make the switch now.

The next piece of excitement was when I discovered that Kubuntu doesn’t support ReiserFS. Regardless of whether Hans Reiser turns out to be guilty of murdering his wife, ReiserFS is on the way out, as Reiser’s team had stopped improving it in favor of Reiser4; and unfortunately, Reiser4 hasn’t made it into the Linux kernel.

So, I had to reformat the entire drive. After some research I decided to go with JFS. (Hey, it’s IBM dog food.) I soon had Kubuntu up and running.

Next I had to move my data over. I tried the direct approach, connecting the two laptops via ethernet and transferring my files over that. After a few minutes the first laptop overheated and shut down. Uh-oh.

I had a fairly recent full backup, so I restored that on the Kubuntu system. I then left rsync running overnight, at nice 19, with a bandwidth limit imposed. This got everything up to date slowly enough to avoid overheating.
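For the curious, here is a minimal sketch of that kind of throttled transfer, written as a small Python helper that builds the command line. The paths, the host name and the 500 KB/s cap are made up for illustration; I didn't record the exact values I used. It relies only on `nice -n` and rsync's `-a` and `--bwlimit` options.

```python
import shlex


def build_rsync_command(source, dest, bwlimit_kbps=500, niceness=19):
    """Return an argv list for a low-priority, bandwidth-limited rsync.

    The default rate cap is a hypothetical value, not the one from the post.
    """
    return [
        "nice", "-n", str(niceness),   # run at the lowest CPU priority
        "rsync",
        "-a",                          # archive mode: recurse, keep perms and times
        f"--bwlimit={bwlimit_kbps}",   # cap the transfer rate (roughly KB per second)
        source,
        dest,
    ]


if __name__ == "__main__":
    # Hypothetical source directory and destination host.
    cmd = build_rsync_command("/home/me/", "newlaptop:/home/me/")
    print(shlex.join(cmd))
```

To actually run it rather than print it, the list could be handed to `subprocess.run(cmd, check=True)`. The idea is simply that a slow, low-priority copy keeps both the CPU and the disk of the sick machine from working hard enough to heat up.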

Installing Java, Eclipse, VMware and IBM’s VPN software was next. Unpleasant, but it was done soon enough. I logged in and swapped the laptops, putting the newly repaired one on the desk and plugging in the external keyboard and trackball via USB.

Which is when things got really ugly.

The symptoms were unsubtle: the arrow keys, Insert, Delete, Home, End, Page Up and Page Down would all open Ksnapshot every time I pushed one of them. Investigating further with xev revealed that those keys were generating a spurious “release key with keycode 111” event after each pair of correct events. No “push key with keycode 111” event was being generated, but that didn’t seem to matter.
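If you want to spot this sort of thing without squinting at a scrolling terminal, something like the following rough sketch would do it. It scans saved xev output for key releases that have no matching press. The sample text is hand-written and abbreviated to mimic xev's layout; it is not captured output, and keycode 98 is just a stand-in for one of the affected keys.

```python
import re

EVENT_RE = re.compile(r"^(KeyPress|KeyRelease) event")
KEYCODE_RE = re.compile(r"keycode (\d+)")

# Hand-written, abbreviated sample in the style of xev output (an assumption).
SAMPLE = """\
KeyPress event, serial 30, synthetic NO, window 0x2e00001,
    state 0x0, keycode 98, same_screen YES,

KeyRelease event, serial 30, synthetic NO, window 0x2e00001,
    state 0x0, keycode 98, same_screen YES,

KeyRelease event, serial 30, synthetic NO, window 0x2e00001,
    state 0x0, keycode 111, same_screen YES,
"""


def parse_key_events(xev_output):
    """Return a list of (event kind, keycode) tuples from xev-style text."""
    events = []
    kind = None
    for line in xev_output.splitlines():
        header = EVENT_RE.match(line.strip())
        if header:
            kind = header.group(1)
            continue
        if kind is not None:
            code = KEYCODE_RE.search(line)
            if code:
                events.append((kind, int(code.group(1))))
                kind = None
    return events


def spurious_releases(events):
    """Return keycodes that were released without a preceding press."""
    held = set()
    spurious = []
    for kind, code in events:
        if kind == "KeyPress":
            held.add(code)
        elif code in held:
            held.remove(code)
        else:
            spurious.append(code)
    return spurious


if __name__ == "__main__":
    print(spurious_releases(parse_key_events(SAMPLE)))
```

In a real session the text would come from something like `xev > keys.log` rather than a string constant.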

I investigated various possible fixes involving xmodmap. I tried unloading the USB HID kernel modules and seeing if X could handle the USB keyboard as an explicit second keyboard. Nothing worked.

Then, as I was staring at the output of lsmod, I had a vague recollection about UHCI and OHCI and EHCI and USB devices and incompatibilities… On a whim, I tried unplugging the keyboard from the USB hub, and plugging it directly into the laptop. Suddenly everything worked.

So it seems there’s some lingering bug in Linux’s USB keyboard support, which is triggered by USB keyboard converters. My guess is that when the keyboard is plugged into the hub, the incoming USB signals are converted to USB 2.0 by the hub, whereas when the keyboard is plugged directly into the laptop everything is done using USB 1.x. Perhaps the buggy module is only used for USB 2.0.

Actually, there’s one last lingering problem… if I type Shift-Insert the system goes insane, launching dozens of Ksnapshot windows. So I think I need to get a genuine USB keyboard. In the meantime, I’m making a mental note not to type Shift-Insert, which I don’t usually do anyway as most programs recognize the more usual Ctrl-V for paste.

Jun 11

I’ve been happily running Debian on my ThinkPad for over a year, probably the longest time I’ve ever kept a single OS on the thing. Or rather, I had been until Saturday. Saturday is when I decided to update my X.org.

I’d had some problems with X.org before. Debian Testing upgraded to X.org 7.0, and it turned out the ATI FireGL T2 drivers in that were broken. So, no fancy new X.org 7 for me until 7.1, I thought, which was a shame because the new ATI drivers in 7.x provide full hardware acceleration, including 3D.

Still, updates were to be had, so I went ahead with what I expected to be a routine point release upgrade of 6.9. However, it turned out that the packaging of X.org has been rearranged, along with the system directories.

Result: no X.

I tried running the autoconfig, which has always worked in the past. It didn’t work, and couldn’t find the perfectly ordinary USB mouse either. I upgraded everything else via apt-get upgrade and rebooted, and discovered a ton of errors now appeared during boot. I spent an hour or so dicking around before coming to the conclusion that the system was hosed in a way which would probably require some kind of reinstallation.

This isn’t my first moment of dissatisfaction with Debian. PAM was broken for months; I’m not sure if it has even been fixed yet. Sound stopped working a couple of months ago. It seems as if somehow along the way ‘testing’ has become ‘unstable’. Perhaps it’s because of the pressure to speed up the release cycle, but then, I don’t see any new stable releases on the horizon.

So, it was time to weigh options. Debian Testing had just burned me badly, so that was out. I could stick with Debian, reinstall Sarge, and live with no accelerated graphics until the next Debian release, which could be years away. I could try the IBM Linux image, which is based on a well-known commercial Linux distribution that I’m not a big fan of. Or, I could try something else.

The new distribution all the cool kids are running is Ubuntu, so I downloaded and burned a CD and booted it. All the ThinkPad hardware worked first time, including Bluetooth, ATI graphics with 3D acceleration, sound, and ACPI power control. So, it looked as though Ubuntu would give me the Debian base I liked, with the advantage of a release schedule measured in months rather than years, and accelerated graphics.

However, Ubuntu is based on GNOME, and I’ve been a KDE user in recent years. There’s a KDE-based Ubuntu variant (Kubuntu), and also one that runs the XFce windowing environment (Xubuntu). I tried all three.

GNOME is nice and simple in appearance, but it’s a terrible RAM hog. KDE has chronic optionitis, but has lots of handy programs. Then I thought about the programs I run all the time, and realized that only one is actually built for KDE; the others are all GTK-based.

Then I tried XFce, which is GTK-based, and noted that I could run XFce and Firefox together and use less RAM than just the KDE desktop. So, XFce was ahead. When I noticed that XFce showed file sizes correctly but GNOME didn’t, the deal was sealed.

The next problem was to back up all my user data. I went on a cleaning-out spree, burnt a DVD of old stuff I hope never to need again, and shrank everything down to under 30GB. I used rsync to back it all up to our MP3 and e-mail server temporarily.

Then, I decided to be daring, and used resize_reiserfs and GNOME partition editor to make space for a new root partition, turning the old partition into /home. This allowed me to install Xubuntu without wiping my home directory.

I just finished confirming that I can get the VPN working, so I don’t have to go into the office in the morning. I’ll get Eclipse and all the other work stuff going again tomorrow.

Aug 14

I’ve switched my ThinkPad to Debian, along with my desktop machine. I did it to get off the upgrade treadmill. I was using a well-known Linux distribution, as customized for IBM use, but a second forced reinstall in under a year made me snap.

I don’t want to reinstall my OS. I don’t mind it so much with OS X, because Apple make it such a trivial task—you archive and install, and your user data and applications stay there, along with all the necessary configuration, and the new OS is installed cleanly. But a RedHat or SuSE reinstall is painful, even if you are smart enough to record what was installed and take copies of the various key configuration files. Frankly, if I didn’t mind reinstalling everything from scratch once a year, I’d run Windows.

So, it was time to move to a Linux distribution that wasn’t going to force me to do unnecessary work. While I like Gentoo, I’d tried and failed to get IBM’s stupid proprietary VPN solution to work with it. So, Debian was the obvious choice.

I’d tried and failed to install Debian before, dealing with that ugly jigdo thing to try and assemble a couple of CDs, only to have the installer bury me in “Hello, you’ve just installed Package Z, here’s a page of useless information…” dialog screens.

Fortunately, Debian now has an actual installer. You download the ISO and burn it on a dinky little 8cm CD, boot it, and it leads you through the process with very little fuss. In fact, for the desktop box the most difficult part was partitioning.

For the ThinkPad, things were a little more exciting. Laptops are almost always packed full of flaky proprietary hardware, and in this case the graphics card was the tough part. Still, I eventually tracked down a working XF86Config-4, polished it up a bit, and was done.

Getting Notes 6 to run was trickier. I ended up using a vintage WINE from November 2003, converted from RPM format via alien, because current releases of WINE are broken in an irritating manner—they cause every scrollable window in the Notes client to be drawn with a double set of scroll bars.

The VPN setup was ugly, involving kernel patches, but I had some instructions from an internal web site which I managed to decipher, and now I can patch and run future 2.6.x kernel releases. I needed to build a custom kernel anyway, as the ThinkPad needed some kernel modules to work fully. As you might guess from the mention of 2.6 kernels, I’m running a ‘testing’ Debian install with a few bits of ‘unstable’ as necessary (basically Mozilla).

So now it’s all up and running, I can ignore the daily Windows virus alerts, and I can keep up-to-date with security patches and OS improvements without spending any significant amount of time doing so, and I can get on with my actual job. Fancy that!

Jan 29

My router decided to crap out. It’s an SMC. It was over $200 when I bought it, back in the mists of time, but a few years later you can pick them up for $30. Mine suddenly decided that it would be a good idea to lock up (a) every time there was an incoming SSH connection, and (b) any time I attempted to log in to change its settings or reboot it.

So I stomped off to Staples and picked up a new router. This one’s a Netgear; it was the fastest and most reliable in PC Magazine’s tests, and it happened to be on sale locally.

In a glass-half-full kind of way, I must admit that router technology has improved a lot in the last couple of years. I plugged this one in, and was rather startled when it detected the cable modem, worked out the right settings for Comcast, and just worked. I disconnected the SMC, used the iBook to configure the Netgear to have the same SSID network name the SMC used to use, and all the other Macs kept working. No reboots.

Then came Linux. That seemed more reluctant to accept change. There were probably cache files for dhcpcd that I could have found, but it was easier to reboot and have everything just work again.

Then came Windows. It seemed to be confused by the sudden loss of base station, and wouldn’t renew a DHCP lease to get a new IP address. So, I tried rebooting it. Once I did that, it decided it didn’t have a network connection at all. I tried running the wireless card control panel, which told me I had the wrong driver version installed and that I should reinstall it.

So, I downloaded the Orinoco driver software on the Mac, wrote it to a USB memory stick, transferred it to the PC, and reinstalled. The installer seemed to finish, but when I rebooted there was no change.

Next I used Add/Remove Programs to remove the Orinoco software, rebooted, and installed it again. Still no change.

Finally I removed the wireless card, rebooted, removed the software, rebooted, and plugged the card in again. Windows helpfully started installing something it had found lying around somewhere. It got as far as installing Net Firewall, and complained that the code wasn’t signed by Microsoft. I told it to go ahead anyway, and it told me that a file was missing and the install had been cancelled.

Then it started the Net Firewall install again. And again. And again.

I rebooted again, pulling the wireless card as I did so. This time no spontaneous install. I plugged in the wireless card… and wonder of wonders, it worked this time.

So, I tried running the VPN software… and that’s broken. It just goes into an infinite loop of trying to set up the connection. Tech support can’t help. I’m gonna fiddle with it some more, but right now it’s not working, so I might be spending Monday doing a full Windows reinstall.

Jan 11

I have the IBM VPN client working across my home firewall and cable modem connection, without interfering with the Macs. Now I can work at full speed at home, wirelessly from the laptop. Come springtime, I’ll be able to sit on the back porch… I am 31337 h4x0r.