Huge memory usage despite low traffic

Hello,
I’ve read a number of posts about high memory usage and tried some tweaks, but I’m still stunned by what my tiny instance can do :slight_smile:

I run a small InfluxDB 1.7.x instance in my home setup, collecting data from sensors. My traffic is at most 2 requests per minute. It ran fine for a couple of months and I did not monitor the instance carefully. The hardware is a SOHO HP server with 8 GB RAM and a 1.4 GHz AMD CPU, running Influx in a container along with a couple of other services.

A couple of days ago I noticed my services were not responding. The server had almost ground to a halt, and Influx had eaten all the available memory. I tried to put a limit on it (2 GB RAM), but the situation reappeared. I managed to capture docker stats:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
568b18738b8e        influxdb            8.44%               1.758GiB / 2GiB       87.88%              6.6MB / 7.06MB      125GB / 0B          13
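
For reference, this is roughly how I apply the memory limit (the container name, volume path and image tag are just my setup, so adjust as needed):

# docker run -d --name influxdb --memory=2g --memory-swap=2g \
    -v /srv/influxdb:/var/lib/influxdb -p 8086:8086 influxdb:1.7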

The machine was heavily overloaded:

# uptime
02:31:49 up 5 days,  3:12,  1 user,  load average: 918.23, 918.27, 918.12

The Influx process was in the D state, so it couldn’t be killed. I resorted to restarting the whole machine.
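
For anyone hitting the same thing, a quick way to list uninterruptible (D-state) processes, assuming a standard procps ps:

# ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'

Anything stuck in D is waiting on I/O inside the kernel and ignores SIGKILL, which is why a reboot was the only way out for me.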

And I wonder how it managed to use so much memory and so many gigabytes of I/O while my data set is orders of magnitude smaller.

I upgraded to 1.8.x, and it looks better:

CONTAINER ID        NAME                CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
568b18738b8e        influxdb            0.31%               176.4MiB / 2GiB       8.61%               45kB / 28.9kB       61.9MB / 0B         10

but I think I had to reboot once again.

Is there any _internal metric worth watching? HeapInUse, for example?
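
For example, something along these lines is what I have in mind (assuming the default _internal monitoring database is enabled and the container is still named influxdb):

# docker exec influxdb influx -database _internal -execute 'SELECT last(HeapInUse) FROM runtime WHERE time > now() - 1h GROUP BY time(10m)'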

Hello @Tomex,
Can you upgrade to OSS 2.1 and the official Docker image? I believe your memory issues will be resolved. If not, can you let me know?

I plan to stick with 1.x because of the new query language introduced in 2.x, which is not fully supported by Grafana.

But, to the point, the culprit was… a failing hard drive in my server. In the logs I saw some low-level I/O errors, which led to stuck I/O operations, which in turn left Influx waiting forever on disk writes.
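
In case it helps anyone, these are the kinds of checks that pointed me to the disk (the device name is just an example, and smartctl needs the smartmontools package):

# dmesg | grep -iE 'i/o error|blk_update_request|ata[0-9]'
# smartctl -H -A /dev/sda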

I am not a sysadmin by any means, so I am especially proud of myself for debugging the problem.

I also tweaked some memory usage settings, but I think the disk was the main reason.
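
For completeness, the tweaks I mean are the cache-related settings, which (as far as I understand) the official 1.x image also accepts as environment variables; the values below are just what I experimented with, not a recommendation:

# docker run -d --name influxdb --memory=2g \
    -e INFLUXDB_DATA_CACHE_MAX_MEMORY_SIZE=512m \
    -e INFLUXDB_DATA_CACHE_SNAPSHOT_MEMORY_SIZE=32m \
    -v /srv/influxdb:/var/lib/influxdb -p 8086:8086 influxdb:1.8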