Diskspace increase when inserting same data

kevin1 · February 26, 2025, 3:57pm

Hello,

I use the Java API (WriteApiBlocking) to insert data into InfluxDB 2.
I inserted the same data with the same timestamps twice through the API and ran into the following problem:

Running du --max-depth=1 /var/lib/influxdb/engine/data, I noticed that the size of my bucket folder had doubled.

Do you have any explanation for this behaviour?

Thanks
Kevin

Anaisdg · February 26, 2025, 7:20pm

Hello @kevin1,
Welcome!
Hmm, that is confusing, because for points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point.

This approach is part of InfluxDB’s design principles, as mentioned in the InfluxDB design principles documentation:

To simplify conflict resolution and increase write performance, InfluxDB assumes data sent multiple times is duplicate data. Identical points aren’t stored twice. If a new field value is submitted for a point, InfluxDB updates the point with the most recent field value.

But perhaps there is some delay with sharding and compaction? Does the size keep increasing indefinitely? Thanks
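
To make the rule above concrete, here is a minimal sketch in plain Java that models it: a point is identified by measurement name, tag set, and timestamp, and a second write to the same identity merges field sets, with the new value winning for matching field keys. The class DuplicatePointModel and its methods are hypothetical names made up for this illustration. It is not the InfluxDB client API, and it only models the logical result a query would see, not how the storage engine lays data out on disk before compaction.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the documented duplicate-point rule (not InfluxDB code).
public class DuplicatePointModel {
    // Point identity (measurement + tag set + timestamp) -> field set
    private final Map<String, Map<String, Object>> points = new LinkedHashMap<>();

    // Build the identity of a point; tags are sorted so their order does not matter.
    public static String key(String measurement, Map<String, String> tags, long timestampNs) {
        return measurement + "," + new TreeMap<>(tags) + " " + timestampNs;
    }

    // Write a point: same identity means the field sets are merged,
    // and for matching field keys the new value replaces the old one.
    public void write(String measurement, Map<String, String> tags,
                      long timestampNs, Map<String, Object> fields) {
        points.computeIfAbsent(key(measurement, tags, timestampNs),
                               k -> new LinkedHashMap<>())
              .putAll(fields);
    }

    // Number of distinct logical points currently held.
    public int pointCount() {
        return points.size();
    }

    // Field set of one point, or null if that point was never written.
    public Map<String, Object> fields(String measurement, Map<String, String> tags,
                                      long timestampNs) {
        return points.get(key(measurement, tags, timestampNs));
    }

    public static void main(String[] args) {
        DuplicatePointModel model = new DuplicatePointModel();
        Map<String, String> tags = Map.of("host", "vps1");

        // The same point written twice is still one logical point.
        model.write("cpu", tags, 1000L, Map.of("usage", 0.5));
        model.write("cpu", tags, 1000L, Map.of("usage", 0.5));

        // A later write with a changed value and an extra field is merged in.
        model.write("cpu", tags, 1000L, Map.of("usage", 0.7, "idle", 0.3));

        System.out.println(model.pointCount());
        System.out.println(model.fields("cpu", tags, 1000L));
    }
}
```

In this model the point count stays at one however many times the same point is written, which matches what a query returns. It says nothing about the folder size on disk in the meantime, which is the part that depends on compaction.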

kevin1 · February 27, 2025, 7:37am

Hello @Anaisdg,

When I logged into my VPS this morning, the database size had dropped to what I expected. The data was compacted during the night: I see log entries with msg="Compaction progress". For now I am testing InfluxDB on a test server, but in production I will insert approximately 4 GB of data per day (sometimes duplicated), and I'd like to compact it more regularly.

Since I'm just starting with InfluxDB, I think I'm missing some configuration to compact data more often.
Is there any documentation or example on configuring InfluxDB for data compaction?

Additionally, for compaction to take place (msg="Compaction progress"), do I have to stop inserting data? I can't find the answer to this question in the InfluxDB 2 documentation.

Thank you in advance.
Have a nice day.
