Stale storage_compactions_active in metrics?

Hello all, I’m currently running v2.7.5 on Linux and I see something strange in the metrics I get via http://localhost:8086/metrics

The active compactions are reported by storage_compactions_active, which is described as:

HELP storage_compactions_active Gauge of compactions (by level) currently running

The problem I have is that compactions are almost never removed from the metrics: even when the logs say a compaction has completed, it is still included in the metrics output. Is this expected?

Many thanks!

Example:

In the metrics:

storage_compactions_active{bucket="c53ca17bba029486",engine="tsm1",id="633",level="1",path="/data/influxdb/engine/data/c53ca17bba029486/autogen/633",walPath="/data/influxdb/engine/wal/c53ca17bba029486/autogen/633"} 0

In the logs:

Mar 21 14:39:26 TSM compaction (start)" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group op_event=start
Mar 21 14:39:26 Beginning compaction" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_files_n=8
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000009-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=1 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000010-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=2 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000011-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=3 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000012-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=4 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000013-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=5 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000014-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=6 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000015-000000001.tsm
Mar 21 14:39:26 Compacting file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=7 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000016-000000001.tsm
Mar 21 14:39:36 Compacted file" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_index=0 tsm1_file=/data/influxdb/engine/data/c53ca17bba029486/autogen/633/000000016-000000002.tsm.tmp
Mar 21 14:39:36 Finished compacting files" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group tsm1_files_n=1
Mar 21 14:39:36 TSM compaction (end)" log_id=0o429iv0000 service=storage-engine engine=tsm1 tsm1_level=1 tsm1_strategy=level op_name=tsm1_compact_group op_event=end op_elapsed=10695.737ms

It is now 15:04 local time, so the compaction above finished about 25 minutes ago, yet the series is still listed in the metrics.

Does the “0” at the end of the metric play a role? I see it switching to “1” from time to time…

Hello @Roberto_Divia,
I believe a value of “0” implies there are no active compactions for that specific label set, while a “1” would indicate an ongoing compaction.

I think what you’re seeing could be due to:

  • Metric scraping timing: if a compaction starts and finishes between two scrapes, you may never see the gauge at “1” for it (see the sketch just below for a way to watch the endpoint more often than your scraper does).
  • Persistent “0” state: if the metric consistently shows “0” even when the log entries suggest a compaction is running, the metric update may be lagging or not accurately reflecting the real-time state because of processing delays or scrape-interval misalignment (the sketch at the end of this reply shows why a series can keep being exported at “0”).
  • Logs and metrics synchronization issues: the logs can report a compaction as completed just after a metric scrape, which makes the metric update look delayed.
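
For the scrape-timing point, one way to rule it in or out is to poll the raw /metrics endpoint yourself at a short interval and print only that series, so a short-lived 0 → 1 → 0 transition becomes visible even if your scraper never catches it. A rough throwaway sketch in Go (the one-second interval is arbitrary, nothing here is InfluxDB-specific):

package main

import (
    "bufio"
    "fmt"
    "net/http"
    "strings"
    "time"
)

func main() {
    const url = "http://localhost:8086/metrics" // the endpoint you are already scraping
    for {
        resp, err := http.Get(url)
        if err != nil {
            fmt.Println("poll failed:", err)
            time.Sleep(time.Second)
            continue
        }
        sc := bufio.NewScanner(resp.Body)
        sc.Buffer(make([]byte, 0, 64*1024), 1024*1024) // metric lines can be long
        for sc.Scan() {
            line := sc.Text()
            if strings.HasPrefix(line, "storage_compactions_active{") {
                fmt.Println(time.Now().Format("15:04:05"), line)
            }
        }
        resp.Body.Close()
        time.Sleep(time.Second) // much shorter than a typical 10-15 s scrape interval
    }
}

If the gauge does show “1” while a compaction is logged as running, the metric itself is fine and the scrape interval is simply hiding the transition.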

I’m not sure though. I’m definitely not an expert on this. Is it causing any performance issues?
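
On the persistent “0” series: I haven’t checked how InfluxDB registers this gauge internally, but if it uses the usual Go Prometheus client, a gauge child for a given label set keeps being exported at its last value (0 after the decrement) once it has been created, until the code explicitly deletes it. A minimal sketch of that behaviour, with made-up label values and a hypothetical compact function (this is not InfluxDB’s actual code):

package main

import (
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// Illustrative gauge modelled on storage_compactions_active; the label names
// are a guess, not InfluxDB's real registration code.
var compactionsActive = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "storage_compactions_active",
        Help: "Gauge of compactions (by level) currently running",
    },
    []string{"bucket", "level", "path"},
)

func compact(bucket, level, path string) {
    g := compactionsActive.WithLabelValues(bucket, level, path)
    g.Inc()       // the series appears with value 1 while the compaction runs
    defer g.Dec() // back to 0 on completion, but the series itself is NOT removed

    // ... compaction work would happen here ...

    // Only an explicit delete would drop the series from /metrics again:
    // compactionsActive.DeleteLabelValues(bucket, level, path)
}

func main() {
    prometheus.MustRegister(compactionsActive)
    compact("some-bucket", "1", "/some/shard/path") // placeholder values
    http.Handle("/metrics", promhttp.Handler())
    _ = http.ListenAndServe(":2112", nil)
}

If that is how it works here, a series sitting at “0” would just mean “a compaction ran for this shard at some point and none is running right now”, rather than a stale or stuck entry.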

Hello @Anaisdg,
No, no performance issues. It’s just personal curiosity: I’m trying to understand why some compactions remain in the metrics (in state “0”) basically forever, while other compactions (the majority of them) are removed from the metrics.