Hi,
I am running InfluxDB 1.8.10 on my Raspberry Pi 3 and collecting various statistics (CPU, RAM, disk usage) from 4 devices every 30 seconds with Telegraf.
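For reference, the Telegraf configuration is basically the stock one; the relevant parts look roughly like this (the output URL and database name are just placeholders):

```
[agent]
  interval = "30s"
  flush_interval = "30s"

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]   # the InfluxDB instance on the Pi
  database = "telegraf"

[[inputs.cpu]]
[[inputs.mem]]
[[inputs.disk]]
```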
It worked fine for a long time, but lately I have been running into a problem with TSM compaction.
Fairly regularly (roughly once a month) I see errors in the logs, and when I debug InfluxDB I find this:
```
2023-01-23T15:22:47.627162Z info Error replacing new TSM files {"log_id": "0fZoHS~0000", "engine": "tsm1", "tsm1_strategy": "full", "tsm1_optimize": false, "trace_id": "0fZoQ6kG000", "op_name": "tsm1_compact_group", "db_shard_id": 297, "error": "cannot allocate memory"}
```
If I manually remove shard 297, everything works fine again until the next month.
If I don't, the Pi keeps reading from and writing to the DB (located in a folder symlinked to my NAS, where the actual database files live) at around 20 Mbit/s, wasting bandwidth and wearing out the disk.
I have many shards, but it looks like InfluxDB only works on shard 297 and keeps looping over it.
That shard's folder is almost 1 GB, and the Pi has only 1 GB of RAM.
Is there a way to split it into smaller shards? Would that make any difference?
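For example, I was thinking of something along these lines to make the shards smaller (assuming the default autogen retention policy on a database called telegraf; I'm not sure whether it would affect the already-existing shards or only new ones):

```
ALTER RETENTION POLICY "autogen" ON "telegraf" SHARD DURATION 1d
```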
I moved from the in-memory index to the TSI index, rebuilt the DB and restarted InfluxDB, but the errors keep appearing.
I read that adding tsm-use-seek = true to the [data] section would help, but it doesn't seem to have changed anything.
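This is the relevant part of my config (the path on my system is the default /etc/influxdb/influxdb.conf; everything else is left at the defaults):

```
[data]
  index-version = "tsi1"
  # option I found suggested online, apparently with no effect
  tsm-use-seek = true
```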
Is there any workaround? I don't understand why this never happened before, since I haven't changed anything.
I checked the TSM files and they all appear healthy.
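To check them I ran something like this (the path is just an example, mine points at the NAS-backed data directory):

```
influx_inspect verify -dir /var/lib/influxdb
```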
Thanks for helping