Sometimes, Gentoo bites

I had a failure of my Cacti system over the weekend, entirely caused by bad Gentoo emerges. Two different problems, both caused by bad upgrades of packages brought in from ~amd64 or ~x86, made Cacti colourfully dysfunctional for a couple of days.

The first was an update to the spine resource poller, part of the Cacti project but installed separately (it used to be called cactid). Turns out that somewhere between 0.8.7a and 0.8.7b, bugs were introduced that made spine unreliable on 64-bit systems. The update brought in a SVN version of spine which, while still labelled 0.8.7a, must have been somewhere after one or more of the bugs came in. The symptom was that every data value obtained via SNMP was garbage and ignored.

The second issue was strange — graphs were getting generated (even those for which there was no data) but there was no text on them! Titles, margins, legend, axes, all were blank. Some posts pointed to a problem accessing the TTF font file provided with rrdtool, but the actual problem turned out to be the upgrade to rrdtool 1.2.28 which introduced different parameters for the configuration of text attributes in graphs — and a corresponding “feature” that suppressed any text output if the new parameters were missing.

So what does “~” have to do with this? The software on your system is built according to the architecture of your machine. In Gentoo, this is called your “arch” (for architecture) and is usually “x86” or “amd64”. Gentoo implements a “testing branch” in an arch which starts with “~”; if a pre-release version of a package exists in portage you can bring it in with the “~x86” keyword. The nice thing about this is that you don’t have to enable a testing repository across your whole system — you can enable the ~ keyword for specific packages on your system, and everything else stays stable.

Unfortunately, this flexibility has a cost. The “amd64” arch seems to lag a bit behind “x86” in terms of packages being marked stable or just simply having packages available. This means that just to get things installed, it’s necessary to flag packages with “x86”, “~amd64” or even “~x86”. This flagging is easily done — almost too easy in fact, as it creates a problem later on when the package you actually set the keyword for eventually becomes stable and you don’t need the keyword set any more. It’s a manual process to revisit the keywords you’ve set and verify that they are still needed (and you know how well manual processes work).

Some time ago I started adding comments to the Portage config file where keywords are set, trying to explain why I set the flag: “to bring in version 1.2.34” for example. That way, if I ever do get around to manually auditing the package.keywords file, I’ll be able to check if some of the keywords are still needed. Still a manual review though.

So in the case of rrdtool and spine, I had set the “~” keyword some time in the past for some reason, possibly to get early access to a bug-fix ebuild. With no established method to revisit the keywords, I continued to pull in unstable versions of packages long after the packages I really needed had been marked stable. Eventually, it bit me.

The pre- and post-upgrade chacklist grows some more…  đꙂ

Leave a comment