Stale storage_compactions_active in metrics?

Hello all, I’m currently running v2.7.5 on Linux and I see something strange in the metrics I get via http://localhost:8086/metrics

The active compactions are reported by storage_compactions_active, which is described as:

HELP storage_compactions_active Gauge of compactions (by level) currently running

The problem I have is that compactions are almost never removed from the metrics: even when the logs say a compaction has completed, it is still included in the metrics output. Is this expected?

Many thanks!

Example:

In the metrics:

storage_compactions_active{bucket="c53ca17bba029486",engine="tsm1",id="633",level="1",path="/data/influxdb/engine/data/c53ca17bba029486/autogen/633",walPath="/data/influxdb/engine/wal/c53ca17bba029486/autogen/633"} 0

In the logs:

Mar 21 14:39:26 TSM compaction (start)" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group op_event=start
Mar 21 14:39:26 Beginning compaction" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_files_n=8
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000009-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=1 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000010-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=2 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000011-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=3 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000012-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=4 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000013-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=5 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000014-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=6 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000015-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=7 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000016-000000001.tsm
Mar 21 14:39:36 Compacted file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000016-000000002.tsm.tmp
Mar 21 14:39:36 Finished compacting files" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_files_n=1
Mar 21 14:39:36 TSM compaction (end)" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group op_event=end op_elapsed=10695.737ms

It is now 15:04 local time, so the compaction above finished about 25 minutes ago, yet the series is still listed in the metrics.

Does the “0” at the end of the metric play a role? I see it switching to “1” from time to time…

Hello @Roberto_Divia,
I believe a value of “0” implies there are no active compactions for that specific label set, while a “1” would indicate an ongoing compaction.

I think what you’re seeing could be due to:

  • Metric scraping timing: if a compaction starts and finishes between two scrapes, you may never see the gauge at “1” for it (see the sketch just below for a way to watch the endpoint more often than your scraper does).
  • Persistent “0” state: if the metric consistently shows “0” even when the log entries suggest a compaction is running, the metric update may be lagging or not accurately reflecting the real-time state because of processing delays or scrape-interval misalignment (the sketch at the end of this reply shows why a series can keep being exported at “0”).
  • Logs and metrics synchronization issues: the logs can report a compaction as completed just after a metric scrape, which makes the metric update look delayed.
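
For the scrape-timing point, one way to rule it in or out is to poll the raw /metrics endpoint yourself at a short interval and print only that series, so a short-lived 0 → 1 → 0 transition becomes visible even if your scraper never catches it. A rough throwaway sketch in Go (the one-second interval is arbitrary, nothing here is InfluxDB-specific):

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "strings"
    "time"
)

func main() {
    const url = "http://localhost:8086/metrics" // the endpoint you are already scraping
    for {
        resp, err := http.Get(url)
        if err != nil {
            fmt.Println("poll failed:", err)
            time.Sleep(time.Second)
            continue
        }
        sc := bufio.NewScanner(resp.Body)
        sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // metric lines can be long
        for sc.Scan() {
            line := sc.Text()
            if strings.HasPrefix(line, "storage_compactions_active{") {
                fmt.Println(time.Now().Format("15:04:05"), line)
            }
        }
        resp.Body.Close()
        time.Sleep(time.Second) // much shorter than a typical 10-15 s scrape interval
    }
}

If the gauge does show “1” while a compaction is logged as running, the metric itself is fine and the scrape interval is simply hiding the transition.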

I’m not sure though. I’m definitely not an expert on this. Is it causing any performance issues?
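
On the persistent “0” series: I haven’t checked how InfluxDB registers this gauge internally, but if it uses the usual Go Prometheus client, a gauge child for a given label set keeps being exported at its last value (0 after the decrement) once it has been created, until the code explicitly deletes it. A minimal sketch of that behaviour, with made-up label values and a hypothetical compact function (this is not InfluxDB’s actual code):

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Illustrative gauge modelled on storage_compactions_active; the label names
// are a guess, not InfluxDB's real registration code.
var compactionsActive = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "storage_compactions_active",
        Help: "Gauge of compactions (by level) currently running",
    },
    []string{"bucket", "level", "path"},
)

func compact(bucket, level, path string) {
    g := compactionsActive.WithLabelValues(bucket, level, path)
    g.Inc()       // the series appears with value 1 while the compaction runs
    defer g.Dec() // back to 0 on completion, but the series itself is NOT removed

    // ... compaction work would happen here ...

    // Only an explicit delete would drop the series from /metrics again:
    // compactionsActive.DeleteLabelValues(bucket, level, path)
}

func main() {
    prometheus.MustRegister(compactionsActive)
    compact("some-bucket", "1", "/some/shard/path") // placeholder values
    http.Handle("/metrics", promhttp.Handler())
    _ = http.ListenAndServe(":2112", nil)
}

If that is how it works here, a series sitting at “0” would just mean “a compaction ran for this shard at some point and none is running right now”, rather than a stale or stuck entry.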

Hello @Anaisdg,
No, no performance issues. It’s just personal curiosity: I’m trying to understand why some compactions remain in the metrics (in state “0”) basically forever, while other compactions (the majority of them) are removed from the metrics.