A few months ago I installed InfluxDB (the current version at the time was 2.6.1) on Debian Bullseye from the InfluxData package repository. I use it to store IoT measurements from Home Assistant. Everything had been running smoothly since then.
A few days ago, out of the blue, InfluxDB completely stopped working and the Debian server nearly hung. The server is monitored by Zabbix, and although the Zabbix client could not connect to the main Zabbix server for much of that time, I could see that hard disk utilization was constantly >= 100%.
I managed to stop the InfluxDB service and check the logs, where I found a lot of errors like these:
Jul 02 15:30:14 server influxd-systemd-start.sh[464495]: ts=2023-07-02T13:30:14.114304Z lvl=error msg="Unable to write gathered points" log_id=0imh_UQW000 service=scraper scraper-name="new target" error=timeout
Jul 02 15:30:24 server influxd-systemd-start.sh[464495]: ts=2023-07-02T13:30:24.634666Z lvl=error msg="Unable to write gathered points" log_id=0imh_UQW000 service=scraper scraper-name="new target" error=timeout
Jul 02 15:30:34 server influxd-systemd-start.sh[464495]: ts=2023-07-02T13:30:34.256583Z lvl=error msg="Unable to write gathered points" log_id=0imh_UQW000 service=scraper scraper-name="new target" error=timeout
The server is an old one with 2 GB of RAM and a 250 GB hard disk, but it runs only Home Assistant (in Docker) and InfluxDB. It had been running for months without a glitch. At the moment I do not have physical access; I can only connect over SSH or, in an emergency, power-cycle it remotely.
The problem arises as soon as I start InfluxDB (with systemctl start influxdb). Something is using 100% of the hard disk, but I cannot figure out what it is; all I can see with iotop is that the influxd process is responsible.
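As a cross-check of the iotop observation, here is a minimal sketch of how the disk traffic of a single process could be measured by sampling /proc/<pid>/io twice. It assumes a Linux kernel that exposes that file and enough privileges to read it (usually root for a process owned by another user). The function names parse_proc_io, io_delta and read_proc_io are hypothetical helpers written for this illustration, not part of InfluxDB or any existing tool.

```python
import sys
import time
from pathlib import Path


def parse_proc_io(text: str) -> dict[str, int]:
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            counters[key.strip()] = int(value)
    return counters


def io_delta(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return bytes read from / written to storage between two samples."""
    return {
        key: after[key] - before[key]
        for key in ("read_bytes", "write_bytes")
        if key in before and key in after
    }


def read_proc_io(pid: int) -> dict[str, int]:
    """Read the current I/O counters of one process (assumes Linux /proc)."""
    return parse_proc_io(Path(f"/proc/{pid}/io").read_text())


if __name__ == "__main__":
    # Usage: python3 proc_io.py <pid of influxd> [interval in seconds]
    pid = int(sys.argv[1])
    interval = float(sys.argv[2]) if len(sys.argv) > 2 else 5.0
    first = read_proc_io(pid)
    time.sleep(interval)
    second = read_proc_io(pid)
    for name, delta in io_delta(first, second).items():
        print(f"{name}: {delta / interval:.0f} bytes/s")
```

Run against the influxd PID, this would show whether the load is mostly reads or mostly writes, which iotop's totals alone do not always make obvious.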
After reading the documentation I tried to reduce the storage load by setting these options in config.toml:
storage-cache-max-memory-size = 268435456
storage-compact-throughput-burst = 8388608
storage-wal-fsync-delay = 10
storage-wal-max-concurrent-writes = 1
storage-wal-max-write-delay = 0
storage-write-timeout = 120
storage-max-concurrent-compactions = 1
storage-series-file-max-concurrent-snapshot-compactions = 1
but the server keeps hanging as soon as I start the influxd service.
Is there anything I can try to solve this problem without losing the data already collected? As a last resort I could remove everything related to InfluxDB and start again from scratch, but I would like to avoid that if the existing data can be saved.
Thanks in advance for your help!