Influxd error: "Unable to write gathered points" ... error=timeout

A few months ago I installed InfluxDB (the current version at the time was 2.6.1) on Debian Bullseye from the InfluxData package repository. I use it to store IoT measurements from Home Assistant. Everything had been running smoothly since then.

A few days ago, out of the blue, InfluxDB completely stopped working and the Debian server nearly hung. The server is monitored by Zabbix, and although the Zabbix client could not connect to the main Zabbix server for much of that time, I was able to see that hard disk utilization was constantly at or above 100%.

I was able to stop the InfluxDB service and check the logs, where I found a lot of errors like these:

Jul 02 15:30:14 server influxd-systemd-start.sh[464495]: ts=2023-07-02T13:30:14.114304Z lvl=error msg="Unable to write gathered points" log_id=0imh_UQW000 service=scraper scraper-name="new target" error=timeout

Jul 02 15:30:24 server influxd-systemd-start.sh[464495]: ts=2023-07-02T13:30:24.634666Z lvl=error msg="Unable to write gathered points" log_id=0imh_UQW000 service=scraper scraper-name="new target" error=timeout

Jul 02 15:30:34 server influxd-systemd-start.sh[464495]: ts=2023-07-02T13:30:34.256583Z lvl=error msg="Unable to write gathered points" log_id=0imh_UQW000 service=scraper scraper-name="new target" error=timeout
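To see whether every error comes from the same scraper and has the same cause, the journal lines above can be tallied with a few lines of Python. This is only a rough sketch: it assumes each line has the shape shown above (a journald prefix ending in "]: " followed by key=value pairs), and the helper names parse_logfmt and count_scraper_errors are made up for illustration, not part of any InfluxDB tooling.

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Split one journal line into a dict of its key=value fields."""
    # Drop the journald prefix ("... influxd-systemd-start.sh[PID]: ").
    _, _, payload = line.partition("]: ")
    fields = {}
    # shlex keeps quoted values such as msg="..." together as one token.
    for token in shlex.split(payload):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def count_scraper_errors(lines):
    """Count error-level scraper entries per (scraper-name, error) pair."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("lvl") == "error" and fields.get("service") == "scraper":
            counts[(fields.get("scraper-name"), fields.get("error"))] += 1
    return counts
```

Feeding it the saved output of journalctl for the influxdb unit (one line per list element) gives a count per scraper name and error type, which for the lines above would all fall under the scraper called "new target" with error "timeout".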

The server is an old one, with 2GB of RAM and a 250GB hard disk, but it runs only Home Assistant in Docker and InfluxDB. It had been running for months without a glitch. At the moment I do not have physical access; I can only SSH in or power-cycle it remotely in case of emergency.

The problem arises as soon as I launch InfluxDB (with systemctl start influxdb). Something is using 100% of the hard disk, but I cannot figure out what it is doing (beyond seeing in iotop that it’s the influxd process).

After reading the documentation I tried to reduce the storage load by setting these options in config.toml:

storage-cache-max-memory-size = 268435456
storage-compact-throughput-burst = 8388608
storage-wal-fsync-delay = 10
storage-wal-max-concurrent-writes = 1
storage-wal-max-write-delay = 0
storage-write-timeout = 120
storage-max-concurrent-compactions = 1
storage-series-file-max-concurrent-snapshot-compactions = 1

but the server keeps hanging as soon as I start the influxd service.
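For readability, the two byte-valued settings in the list above (storage-cache-max-memory-size and storage-compact-throughput-burst) can be converted to MiB. The snippet below is plain arithmetic on the values as posted, not an InfluxDB tool, and the function name to_mib is just an illustrative choice.

```python
MIB = 1024 * 1024  # bytes per mebibyte

# Values copied from the config.toml lines above.
settings = {
    "storage-cache-max-memory-size": 268435456,
    "storage-compact-throughput-burst": 8388608,
}


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / MIB


for name, value in settings.items():
    print(f"{name}: {value} bytes = {to_mib(value):g} MiB")
```

So the cache is capped at 256 MiB and the compaction burst at 8 MiB, on a machine with 2GB of RAM in total.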

Is there anything I can try to solve this problem without losing the data already collected? As a last resort I could remove everything related to InfluxDB and start again from scratch, but I’d rather not lose that data.

Thanks in advance for your help!

Any hint or help on this? I still cannot start the influxd service without hanging the server.
Thanks

Did you find a solution in the end?

I think the problem is due more to RAM usage than to disk usage. I’m running InfluxDB in a Docker container, and right now I have 1.9GB of data but the container uses 13GB of RAM. As the data grows, so does the RAM usage.