Abnormal disk usage

For the past two Mondays, my database has been filling up disk space and no maintenance seems to be running.

Normally, the database is compacted/maintained automatically so that disk usage stays low.

But for the past two Mondays, it has stopped doing so, as this graph shows.

Here are the specifications of my machine:

InfluxDB version 2.7.4, running on a virtual machine
The buckets:

  • Defaultbucket with 7-day retention
  • Bucket1 with infinite retention (about 700,000 data points)
  • Bucket2 with 90-day retention (about 150,000 data points)
  • BucketTest with 35-day retention (about 170,000 data points)

Today, the database is around 8.6 GB.
The machine is at 13 GB used out of 15 GB total.

InfluxDB is fed by Node-RED, with data generally arriving every 10 minutes.

For the past few days/weeks, I have been writing data with tags. This is probably the only major change I’ve made to the data.

Why do automatic maintenance operations seem to have stopped?
Is this normal?
Does the space used by the database seem normal?

What can I do to shrink the database the way it used to shrink automatically?

Thank you in advance for your help

EDIT:

In the logs, the only thing that catches my eye is this:

août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586624Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586637Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586643Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586654Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586665Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586672Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586682Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586689Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586701Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586708Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:12:53 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:12:53.586718Z lvl=info msg="Compacting file" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_group ts>
août 12 06:13:47 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:13:47.079500Z lvl=warn msg="Error compacting TSM files" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compa>
août 12 06:13:48 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:13:48.081866Z lvl=info msg="TSM compaction (end)" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_gro>
août 12 06:13:48 GIN influxd-systemd-start.sh[637]: ts=2024-08-12T04:13:48.600890Z lvl=info msg="TSM compaction (start)" log_id=0pzHmWbG000 service=storage-engine engine=tsm1 tsm1_strategy=full tsm1_optimize=false op_name=tsm1_compact_g>

Hello @Yohan
Have you introduced any new tags?
This could also help:

I wish you could use this:

But you might find some useful Flux queries in there that could help you narrow in on what’s happening, because no, that’s not normal.

For example, the cardinality() function.
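
For instance, a quick series cardinality check in Flux could look something like this (the bucket name and time range below are just placeholders):

```flux
// Sketch: report the series cardinality of one bucket over the last 30 days.
// "Bucket1" and the 30-day window are placeholders; adjust them to your setup.
import "influxdata/influxdb"

influxdb.cardinality(bucket: "Bucket1", start: -30d)
```

If that number has jumped since the tags were introduced, runaway series cardinality is a likely suspect.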

Otherwise I might create an issue on GitHub. There could also be corruption in one of the TSM files, which could cause the compaction process to fail. You may need to manually delete any corrupted TSM files, but this should be done cautiously: before proceeding, make sure you have a backup of your data. Run influxd inspect to check the health of your TSM files and see if there are any specific issues that need to be addressed. You might also want to check your InfluxDB config, in particular settings related to compaction such as compact-full-write-cold-duration.
Configure InfluxDB OSS | InfluxDB OSS v1 Documentation.
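
Roughly, on a 2.x install that check could look like the sketch below (the engine path is an assumption for a default Linux package install, and flag names can vary between versions, so confirm with --help):

```bash
# Sketch for InfluxDB 2.x; the engine path below assumes a default Linux
# package install. Confirm the exact flags with `influxd inspect --help`.
influxd inspect verify-tsm --engine-path /var/lib/influxdb/engine

# On 2.x, the compaction setting mentioned above is exposed as
# storage-compact-full-write-cold-duration (default 4h) in the server config.
```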

Hello @Anaisdg, thank you for your reply.

Before I start reading, I’d like to answer your question.

Yes, I’ve assigned tags to fields that didn’t previously have tags. In fact, for that field I now see the two defined tag values plus a third, empty value that corresponds to all the previously untagged data.
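
For reference, I can list those tag values with a schema query along these lines (the bucket and tag names here are placeholders rather than my real ones):

```flux
// Sketch: list the stored values of one tag key in one bucket.
// "Bucket1" and "myTag" are placeholder names.
import "influxdata/influxdb/schema"

schema.tagValues(bucket: "Bucket1", tag: "myTag")
```

That is how the two defined values plus the empty one show up.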

Yesterday, I cleaned things up manually using the API.
I started by deleting a few values I didn’t need, and after a restart, nothing had changed.

But then I remembered that one field in particular was a problem.
When I queried that field over a large range (such as “this year”), I got an error mentioning something like a “reference pointer”, but I didn’t save the exact message.
That is not the usual behavior: selecting a range that starts before the first stored value normally produces no error message.

So I deleted the oldest values, from the beginning of the year through June. After a restart, things seemed to be going a bit better.
The machine is at 50% disk usage and the database occupies 3.3 GB.
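
For the record, the manual deletes went through the /api/v2/delete endpoint, roughly like the sketch below (org, bucket, token, and predicate are placeholders, and the dates simply mirror the range I described):

```bash
# Sketch of the delete call; org, bucket, token, and predicate are placeholders.
curl --request POST "http://localhost:8086/api/v2/delete?org=my-org&bucket=Bucket1" \
  --header "Authorization: Token MY_TOKEN" \
  --header "Content-Type: application/json" \
  --data '{
    "start": "2024-01-01T00:00:00Z",
    "stop": "2024-06-30T23:59:59Z",
    "predicate": "_measurement=\"my_measurement\""
  }'
```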

This seemed to be the problem, but I don’t understand why the last compaction maintenance ran on July 29, when this “corrupted” data had existed for longer.

Last night, no compaction errors seem to have occurred.