Gentoo games

We’re now well into Day Four of the Gentoo gcc/glibc update.  Part of the reason it’s taking so long is that it seems to have missed gtk early on, and anything that depends on gtk is failing its ebuild.  So a lot of the good overnight processing time has been squandered sitting at an “emerge failed” error.  Then, for some unknown reason, my server decided to give up the ghost and not reboot.

This is a weird problem.  My syslog-ng consoles just stopped writing, although things like ircd were still running.  Full /var?  That was the first thing I thought of, but it was only 80% used (unless that was after a logrotate cleanup, and /var had actually filled during the night).

Things got especially weird when I tried to reboot.  My eth0 interface did not start properly, and eth1, which is not properly configured, was starting instead.  Then, various other services failed to start, which seemed to hang init — and with init hanging, my console gettys were not being started.  Thank heaven that somewhere along the line the config of KDM was changed to start it on the local console, allowing me to log on that way…

I had enabled bootchart, and I thought it was the problem.  There’s definitely something wrong with bootchart, since the stream of error messages it spews out once it fills the tmpfs it creates cannot be normal, but it wasn’t what was stopping my boot.

I had also enabled the parallel startup feature of Gentoo’s init, but that was not the issue either (although it did give me a somewhat clearer picture of the services that were not starting properly).

The eventual problem with the network startup was the introduction of coldplug support to the Gentoo boot process.  Both the Gigabit interfaces on the server are managed by the one driver, and udev apparently processes events in LIFO (last-in-first-out) order, so it saw eth1 before eth0.  Then the kicker: after it thought it had brought up eth1, it stopped trying to bring up more interfaces, because the config only required one interface other than lo to be up.
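
To make the failure mode concrete, here is a toy Python model of what I think happened.  It is not how udev or Gentoo's init work internally; `started_interfaces` and its `lifo` flag are names made up for illustration, and the only rules it encodes are the two described above: events handled in reverse order, and the network check satisfied as soon as one interface other than lo has been started.

```python
def started_interfaces(events, lifo=True):
    """Return the interfaces started before the network check is satisfied.

    events: interface names in the order the kernel announced them.
    lifo:   if True, handle the most recent event first (the assumed
            behaviour); if False, handle them in arrival order.

    Assumption: the check passes as soon as any interface other than
    "lo" has been started, and nothing further is brought up after that.
    """
    order = list(reversed(events)) if lifo else list(events)
    started = []
    for name in order:
        started.append(name)
        if name != "lo":
            # One non-loopback interface is "enough", so stop here.
            break
    return started


if __name__ == "__main__":
    announced = ["lo", "eth0", "eth1"]
    print("LIFO:", started_interfaces(announced, lifo=True))
    print("FIFO:", started_interfaces(announced, lifo=False))
```

With LIFO ordering the model starts eth1 and stops, which matches what I saw; with arrival ordering it would have started lo and eth0 and never touched the unconfigured eth1.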

The fix, as documented out there on the Innernut, is to turn off coldplug and/or restrict it from handling the services representing the network interfaces.

So now all I need to do is undo all the undoing I did to try to sort out the bootup problem (re-enable parallel startup, reinstall bootchart and fix the bug).  Oh, that’s right, the version of baselayout that introduced all these nice changes also undid the dependency fix I made that started OpenLDAP nice and early in the boot process, so now it takes ages to boot again (I never get tired of seeing the LDAP server not start properly because the LDAP server is not available…).
