When diagnostics explode
A great many things can break in a software system. However, diagnostics breaking the rest of the software is especially ironic. These tools are supposed to help you find bugs and other problems after the fact, not become one.
The system in question was a small data-recorder running on a BeagleBone Black (BBB), continiously recording measurements from specialized hardware.
These measurements are stored in an SQLite database and can be retrieved (and purged) via a very simple http interface.
For context: the BeagleBone Black is a small GNU/Linux ARM device, not unlike a Raspberry Pi.
During development, we noticed that logfiles would quickly grow to hundreds of megabyte, which could potentially be a problem if the data in the SQLite database is not retrieved, and subsequently purged, for a while. So as a precaution, we set the file-size limit to 5mb in /etc/logrotate.conf
. We figured that should solve it, and during testing the logs never got very big again.
Fun in production
Imagine my surprise when I saw a 1.4gb /var/log
folder that prevented any successful writes and subsequently corrupted the SQLite db. SQLite does not deal well with full disks, so this was a huge problem.
Two files especially, daemon.log and syslog, were huge with ~950mb and ~450mb respectively. They were clearly bigger than 5mb. logrotate
was configured to run daily and weekly respectively. We were kind of spamming the log files, and estimated at max 50mb growth in either file per day, which should limit the files to 50mb and 350mb. But obviously, it didn’t.
Time travel
The production environment has several special properties:
- The BBB is not connected to the internet.
- There are semi-frequent power losses.
- The BBB does not have a battery, so power-cycling it means its internal date is reset.
What all this amounts to is: The system doesn’t know the current time and can’t get it via ntp. And whenever the system starts again, it resumes from a fixed date the disk was flashed with.
logrotate
on the other hand doesn’t like that one bit. It’ll get confused by the files written in the future and even worse, it remembers when it last ran. And it doesn’t run if that’s in the future. So if the BBB runs nicely from January, 1st to July, 1st and then power-cycles, you’ll have to wait half a year for your daily logrotate run. And whenever it successfully runs, the problem will get worse.
So, in general, it’s not a good idea to run a full GNU Linux without a working clock!