I’ve been running InfluxDB on my small ARM machine for a couple of months now, and it has seemingly been working well, except when the daemon has to restart.
The setup: 8-core ARM, 2GB RAM, 256GB SSD, Ubuntu 16.04, InfluxDB 1.3.5 (plus the rest of the TICK stack). Telegraf and a custom application feed the database via UDP, and Chronograf shows multiple plots going back as far as the last database restart.
A week or so in, there was a problem after the machine restarted abruptly. I had to recreate the database to get it running again, and chalked it up to some of my initial fumbling around while setting things up.
Yesterday I had to restart the machine again, and when it came back up only a couple of hours of data remained. I searched around for a solution and came across a variety of config settings to adjust (cache and memory sizes, number of concurrent tasks, etc.), which I did… but now the daemon won’t start at all, failing with an error about TSM being unable to allocate memory.
Searching around about that turned up a few concerning posts about a 32-bit address space limiting the database size to 2-3GB. That worries me, because the goal is to (nearly) fill the SSD. It seems like a strange limitation for a database, if true.
So can anyone offer any input on:
- how to avoid loss of data beyond a few hours old when restarting
- how to fix the tsm out of memory
- whether influx is actually useful in my use case… I’d hate to have to abandon what seems like quite a nice tool stack
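For the record, the settings I tried adjusting were in the [data] section of /etc/influxdb/influxdb.conf, and looked roughly like this (values are guesses I made for a 2GB machine, not recommendations from anywhere authoritative):

```toml
# [data] section of /etc/influxdb/influxdb.conf -- roughly what I changed.
[data]
  # Shrink the in-memory write cache before it gets snapshotted to TSM
  # (default is 1GB, which seems too big for a 2GB box):
  cache-max-memory-size = 536870912       # 512MB, in bytes
  cache-snapshot-memory-size = 26214400   # 25MB (the default)
  # Limit parallel compactions; 0 means "half the CPU cores":
  max-concurrent-compactions = 1
```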
Update: I updated to 1.3.7 (leaving my influxdb.conf file untouched) and now the service starts and some of the data (about a week) is showing up in Chronograf. I see a few instances of this error in the journal:
error compacting TSM files: cannot allocate memory engine=tsm1
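One theory I found while searching (an assumption on my part, not confirmed by the logs): InfluxDB memory-maps every TSM file, so “cannot allocate memory” during compaction can mean the kernel’s per-process mmap limit is exhausted rather than actual RAM. Worth checking:

```shell
# InfluxDB mmaps each TSM file, so a large database can hit the
# per-process memory-map limit. Check the current value
# (the Linux default is often 65530):
cat /proc/sys/vm/max_map_count

# To raise it (requires root), one could run:
#   sysctl -w vm.max_map_count=262144
# and persist it by adding "vm.max_map_count=262144" to /etc/sysctl.conf
```

If the error disappears after raising the limit, that would point at map exhaustion rather than the 32-bit address-space issue.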
There is also a HUGE volume of messages in the journal – and journalctl reports suppressing ~30,000 others. Is there a way to turn down the volume of messages generated? I don’t want to fill my boot volume with logging. The messages look like this:
Oct 30 13:08:46 hc1 influxd: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:395 +0x2a4
Oct 30 13:08:46 hc1 influxd: created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Open
Oct 30 13:08:46 hc1 influxd: #011/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/file_store.go:399 +0x3f4
Oct 30 13:08:46 hc1 influxd: goroutine 61010 [chan send, 1 minutes]:
Oct 30 13:08:46 hc1 influxd: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*FileStore).Open.func1(0x10c51810, 0x1d56fa80, 0x26b6, 0x…
Repeated over and over and over, thousands per second.
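The “suppressing ~30,000 others” part looks like journald’s own rate limiting kicking in, so the knobs for message volume and disk usage would be in journald.conf rather than InfluxDB. A sketch of what I mean (key names are from the systemd docs; on Ubuntu 16.04 / systemd 229 the first one is spelled RateLimitInterval, while newer releases call it RateLimitIntervalSec):

```ini
# /etc/systemd/journald.conf
[Journal]
# Allow at most 1000 messages per service per 30s window; the rest
# are dropped with a "Suppressed N messages" notice:
RateLimitInterval=30s
RateLimitBurst=1000
# Cap the journal's total disk usage so it can't fill the boot volume:
SystemMaxUse=200M
```

After editing, `systemctl restart systemd-journald` applies the changes. That only limits the damage, though; the real fix is stopping influxd from panicking in the first place.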
Another update: my system has crashed (as in shut down unexpectedly) twice since I got InfluxDB up and running again, and both times it was while opening a Chronograf dashboard (and therefore running queries against InfluxDB). Linux doesn’t usually go down without a fight, so this is somewhat surprising.
And now Chronograf blanks most of its graphs to black, so I can only see 1-2 of the 5 at a time.