When Upgrades Go Wrong

I’m running Debian on a Linksys NSLU2 storage device, and it works really well in general. So well in fact that a lot of the time I forget the thing is even there! It’s sitting in the garage minding its own business, serving out video and music files, and storing backups of the other systems in the house. Just occasionally, however, the thought pops into my head to run a system update over it — a habit I’ve gotten into for the Gentoo systems in the house, but “the Slug” usually misses out. About a fortnight ago however I decided to do the “apt-get shuffle”. Timing, as they say in sport and comedy, is everything.

I’ve become fairly complacent about system updates. All the distros I use now have got excellent tools for keeping everything up-to-date, and for making sure that things don’t go wrong in the process. It’s all just software, however, and it’s all too easy for something to get missed or for a bug to creep in. One such bug that did exactly that is this one. Unreported at the time I did my update, it rendered my Slug unbootable after the update I gave it.

It took me a day to realise that the Slug was off the network. The failure of the nightly backups was my first clue. Next was the inability to stream any of the media files stored on it. For the next week, on-and-off, I tried a dozen things in an attempt to get it working again. I finally arrived at a process that used the Debian Installer firmware image as a way to get a running system onto the device, allowing me to then access the hard disk and try and reflash earlier kernel and initrd images to it.

I started trying to work on the boot disk, but I couldn’t see it for some reason. Then I discovered that the power supply of the USB2 disk enclosure that holds it was playing up! Now, I had two problems–was one related to the other? Was my boot problem just a hard disk problem all along? Turns out that the power supply failure was a coincidence–replacing the power supply got the disk working again but made no improvement in the bootup scenario.

The NSLU2 boots differently to a PC. On a PC, the BIOS locates some boot code on a storage device and executes that, which usually is a program like LILO or GRUB that has more intelligence and (in the case of GRUB) a way to interact with it. These boot loader programs then load in the kernel and start executing it. With the NSLU2, however, the kernel and the “initial root device” are written into the flash memory of the device–they more-or-less are the BIOS.

On a PC, if there’s a problem with the kernel or initrd you can generally select another one from a list. Worst-case would have you installing the hard-disk in a different PC and fixing the problem from there. On a NSLU2, however, any problem with the kernel or initrd can’t be fixed by changing the hard disk because the kernel and initrd aren’t read from the hard disk but from the flash memory instead. There’s also no option for selecting another kernel, since the NSLU2 is a “headless” device with no console (besides, there’d be no room in the flash memory for two copies of kernel and initrd).

Once I’d been able to get my Slug booting (by writing out a previous version of a kernel and initrd) I was going to leave it alone… but curiosity got the better of me. I’d suspected a bad update to the utility that generates the initrd, and sure enough an “apt-get update && apt-get upgrade” revealed a pending update to the initramfs-tools package. Google led me then to the above bug report. With fingers crossed I did the update, reflashed, and rebooted… successfully!

The Slug is now back in its usual place, quietly going about its business of entertaining us and keeping critical data safe. I might at least think twice before doing a kernel update on the poor beast in future though!

Leave a comment