Hello,
I have a Telegraf database with daily shards. Usually the data stored for one day is about 10-30 GB, but from time to time a daily shard starts consuming an enormous amount of disk space and grows up to 4 TB for a single day.
Any clues on how to figure out the root cause?
Thanks!
the only thing that comes to mind is a failing compaction…
Data in InfluxDB gets compacted several times by a process called compaction; if this process fails for whatever reason, you will end up with huge shards, as the data stays uncompressed.
If that’s the case, you should find something in the InfluxDB log.
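As a quick sanity check (just a sketch, the paths below are from a typical Linux install, adjust the data directory and shard id to your setup): a fully compacted shard normally contains a handful of large .tsm files, while a shard whose compactions never complete tends to accumulate many small ones. You can find the shard id with SHOW SHARDS and then look at the files on disk:

influx -execute "SHOW SHARDS"
ls -lh /var/lib/influxdb/data/&lt;db&gt;/&lt;rp&gt;/&lt;shard_id&gt;    # many small *.tsm files suggest stalled compactions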
Thank you for the reply, that may be it. Usually the issue occurs once per month, lasts 2-5 days, and then the shards go back to normal again :-(. I tried to find anything related in the logs but couldn’t.
Compactions (even successful ones) are logged.
Is there any trace of compactions in your log? If not, you may want to check your logging settings.
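For what it’s worth, compaction messages are written at the info level, so the logger has to stay at info (or debug) for them to show up. In InfluxDB 1.x the knob lives in the [logging] section of influxdb.conf; a minimal sketch:

[logging]
  level = "info"   # "warn" or "error" would hide the compaction messages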
Thank you, it is quite hard to find anything with the default log level. I’ll lower it to warning and check, thanks!
Unfortunately I was unable to find anything about compaction in the logs at all. I can see lots of odd lines like:
232 232
92 92
or TLS handshake errors, but nothing else.
That’s unexpected; in my log I have lots of compaction-related lines like the ones below:
ts=2023-01-31T05:08:36.677080Z lvl=info msg="TSM compaction (start)" log_id=0fiYzmF0000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0fiZTzmG000 op_name=tsm1_compact_group op_event=start
ts=2023-01-31T05:08:36.677080Z lvl=info msg="Beginning compaction" log_id=0fiYzmF0000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0fiZTzmG000 op_name=tsm1_compact_group tsm1_files_n=7
ts=2023-01-31T05:08:36.678080Z lvl=info msg="Compacting file" log_id=0fiYzmF0000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0fiZTzmG000 op_name=tsm1_compact_group tsm1_index=0 tsm1_file=C:\\QuantumMonitor\\influxdb\\configuration\\quantummonitor\\data\\contship\\standard\\535805\\000000008-000000002.tsm
...
ts=2023-01-31T05:08:38.081695Z lvl=info msg="Finished compacting files" log_id=0fiYzmF0000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0fiZTzmG000 op_name=tsm1_compact_group tsm1_files_n=1
ts=2023-01-31T05:08:38.081695Z lvl=info msg="TSM compaction (end)" log_id=0fiYzmF0000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0fiZTzmG000 op_name=tsm1_compact_group op_event=end op_elapsed=1404.546ms
My log level is set to info, but I’ve just remembered that InfluxDB logs to stdout, which must be redirected to a file in order to actually keep the log.
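If your service wrapper doesn’t capture stdout for you, redirecting it manually and grepping for compactions is usually enough; a rough example (paths are placeholders, and on Linux with systemd the output normally ends up in journald instead):

influxd -config /etc/influxdb/influxdb.conf > /var/log/influxdb/influxd.log 2>&1
grep -i "compact" /var/log/influxdb/influxd.log    # should list the TSM compaction start/end lines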
If the problem is resource related, you may try setting max-concurrent-compactions to a lower number (by default it is half of the machine’s cores).
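For example, on an 8-core machine the default allows 4 concurrent compactions; capping it is done in the [data] section of influxdb.conf, roughly like this (the value 2 is only an illustration):

[data]
  # 0 (the default) means "use half of the available cores"
  max-concurrent-compactions = 2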