Hi,
I’m running InfluxDB 2.6 in a container with a memory limit.
A few times now, one of the users has run some kind of query (I don’t know the precise conditions), after which InfluxDB gets stuck in a loop: it uses all the available memory and reads heavily from disk (memory is fully used and the container’s BLOCK I/O counter is very high).
When I look at the InfluxDB container logs, I see the following pattern:
{"log":"ts=2023-06-21T08:08:33.912120Z lvl=info msg=\"loading changes (start)\" log_id=0iZFof5l000 service=storage-engine engine=tsm1 op_name=\"field indices\" op_event=start\n","stream":"stdout","time":"2023-06-21T08:08:33.912170363Z"}
{"log":"ts=2023-06-21T08:08:33.912150Z lvl=info msg=\"loading changes (end)\" log_id=0iZFof5l000 service=storage-engine engine=tsm1 op_name=\"field indices\" op_event=end op_elapsed=0.033ms\n","stream":"stdout","time":"2023-06-21T08:08:33.912221505Z"}
{"log":"ts=2023-06-21T08:08:33.913646Z lvl=info msg=\"Opened file\" log_id=0iZFof5l000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/844ea0a92a6d5f0e/autogen/256/000000021-000000002.tsm id=0 duration=5.007ms\n","stream":"stdout","time":"2023-06-21T08:08:33.913729618Z"}
{"log":"ts=2023-06-21T08:08:33.913812Z lvl=info msg=\"Opened shard\" log_id=0iZFof5l000 service=storage-engine service=store op_name=tsdb_open index_version=tsi1 path=/var/lib/influxdb2/engine/data/844ea0a92a6d5f0e/autogen/256 duration=26.220ms\n","stream":"stdout","time":"2023-06-21T08:08:33.913877392Z"}
{"log":"ts=2023-06-21T08:08:33.916032Z lvl=info msg=\"Opened file\" log_id=0iZFof5l000 service=storage-engine engine=tsm1 service=filestore path=/var/lib/influxdb2/engine/data/8963475e496bb947/autogen/259/000000012-000000002.tsm id=0 duration=3.714ms\n","stream":"stdout","time":"2023-06-21T08:08:33.916094505Z"}
{"log":"ts=2023-06-21T08:08:33.917749Z lvl=info msg=\"Opened shard\" log_id=0iZFof5l000 service=storage-engine service=store op_name=tsdb_open index_version=tsi1 path=/var/lib/influxdb2/engine/data/8963475e496bb947/autogen/259 duration=16.911ms\n","stream":"stdout","time":"2023-06-21T08:08:33.917818234Z"}
{"log":"ts=2023-06-21T08:08:33.920227Z lvl=info msg=\"index opened with 8 partitions\" log_id=0iZFof5l000 service=storage-engine index=tsi\n","stream":"stdout","time":"2023-06-21T08:08:33.920276544Z"}
This pattern continues even after restarting the container or the machine.
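To quantify the pattern (for example, how often each shard gets re-opened), a minimal Python sketch along these lines could tally the log lines. It assumes the file uses Docker's json-file format shown above, with one JSON object per line whose "log" field holds a logfmt-style message. The function names parse_line and summarize are illustrative and are not part of InfluxDB or Docker.

```python
import json
import shlex
from collections import Counter


def parse_line(raw):
    """Parse one Docker json-file log line into a dict of logfmt fields."""
    record = json.loads(raw)
    fields = {}
    # shlex.split keeps quoted values such as msg="Opened shard" together.
    # Assumption: messages contain no unbalanced quotes, which would raise ValueError.
    for token in shlex.split(record["log"]):
        key, sep, value = token.partition("=")
        if sep:
            # Repeated keys (service= appears twice in some lines) keep the last value.
            fields[key] = value
    return fields


def summarize(lines):
    """Count messages overall and 'Opened shard' events per shard path."""
    messages = Counter()
    shard_opens = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        fields = parse_line(raw)
        msg = fields.get("msg", "")
        messages[msg] += 1
        if msg == "Opened shard":
            shard_opens[fields.get("path", "")] += 1
    return messages, shard_opens
```

Calling summarize(open(path)) on the container's log file (the path depends on the Docker setup) returns two counters. A shard path with a count that keeps growing would suggest the same shards are being opened again and again.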
The only way I’ve found to get out of this loop is to stop the container, raise or remove the memory limit, and let it run for a while. It then settles back down to its usual, lower memory usage.
What is happening?
How can this be avoided?
How can I know what query caused it?
Thanks in advance,
David