I have a small home monitoring setup using InfluxDB, Grafana, Telegraf and various little scripts etc. Despite running on a humble Raspberry Pi (3+), it works (or worked) really well…
Recently, I foolishly increased the retention period from 15 days to 365 days. Initially, everything was fine, but now (a few months later), InfluxDB won’t start properly.
When I run influxd manually to see more output, it spends roughly 5-10 minutes trying to start, then crashes with a fatal out-of-memory error.
I’d like some advice on how to get up and running again, please. I can’t change the retention period back or delete data with queries, of course (as InfluxDB won’t start), so I suspect I’ll need to delete the database at the filesystem level and start again from scratch?
I notice that the /var/lib/influxdb/data and /var/lib/influxdb/wal folders contain my databases.
Is it safe to delete the contents of these folders in order to start again from scratch? I could then just create the databases and start logging data again, I assume…
One thing I’ve noticed on embedded systems is that the shard duration seems to matter: with a large shard duration, a Raspberry Pi or similar device seems to run into trouble during compaction and shuts down. Then, whenever you restart it, it dies trying to read the WAL files and resume the compaction.
I think that if you change the retention policy within Chronograf, it automatically sets a corresponding shard duration, and that shard duration grows with the retention time.
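To illustrate how the shard duration can grow with the retention time, here is a rough Python sketch of the default mapping as I understand it from the InfluxDB 1.x documentation (used when no shard duration is given explicitly). The function name is made up for illustration, "6 months" is approximated as 180 days, and whether Chronograf applies exactly this rule is an assumption on my part.

```python
from datetime import timedelta


def default_shard_group_duration(retention: timedelta) -> timedelta:
    """Approximate the InfluxDB 1.x default shard group duration.

    Assumed thresholds (hypothetical helper, not part of any InfluxDB API):
      retention under 2 days       -> 1 hour
      2 days up to about 6 months  -> 1 day
      longer, or 0 (keep forever)  -> 7 days
    """
    if retention == timedelta(0) or retention > timedelta(days=180):
        return timedelta(days=7)
    if retention >= timedelta(days=2):
        return timedelta(days=1)
    return timedelta(hours=1)


# Compare the old 15-day retention with the new 365-day retention.
for days in (15, 365):
    print(days, "days ->", default_shard_group_duration(timedelta(days=days)))
```

Under these assumptions, a 15-day retention gets 1-day shards while a 365-day retention gets 7-day shards, so each shard holds roughly seven times as much data.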
I would recommend setting the retention policy manually, with an explicit shard duration, from within influx; e.g. retention = 1 year, shard duration = 1 day. That way it compacts smaller shards more frequently, and the Pi doesn’t run out of memory.
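As a sketch of the suggestion above, this small Python helper builds the InfluxQL (1.x) statement you could paste into the influx shell once the server is running again. The database name "home" is a placeholder, "autogen" is assumed to be the policy in use (it is the 1.x default name), and the helper itself is just for illustration.

```python
def alter_retention_policy(database: str, policy: str,
                           duration: str, shard_duration: str) -> str:
    """Build an InfluxQL ALTER RETENTION POLICY statement (InfluxDB 1.x).

    Hypothetical helper: it only formats the statement and does not
    talk to the server.
    """
    return (
        f'ALTER RETENTION POLICY "{policy}" ON "{database}" '
        f"DURATION {duration} SHARD DURATION {shard_duration}"
    )


# Retention of one year (365d) with one-day shards, as suggested above.
statement = alter_retention_policy("home", "autogen", "365d", "1d")
print(statement)
```

With a one-day shard duration, a year of retention is split into about 365 small shard groups instead of roughly 52 week-long ones.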