I use the Java API (WriteApiBlocking) to insert data into InfluxDB 2.
I inserted the same data with the same timestamps twice using the API and encountered the following problem:
Running du --max-depth=1 /var/lib/influxdb/engine/data on the command line, I noticed that the size of my bucket folder had doubled.
Do you have any explanation for this behaviour?
Hello @kevin1,
Welcome!
Hmm, that is confusing, because for points that have the same measurement name, tag set, and timestamp, InfluxDB creates a union of the old and new field sets. For any matching field keys, InfluxDB uses the field value of the new point.
To simplify conflict resolution and increase write performance, InfluxDB assumes data sent multiple times is duplicate data. Identical points aren’t stored twice. If a new field value is submitted for a point, InfluxDB updates the point with the most recent field value.
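To make that rule concrete, here is a minimal sketch in plain Java that only simulates the documented merge behaviour in memory. It does not use the InfluxDB client and says nothing about how the storage engine lays data out on disk; the class `DuplicatePointSketch` and its methods are hypothetical names made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the duplicate-point rule described above.
// This is an illustration only, not InfluxDB's actual implementation.
public class DuplicatePointSketch {
    // One entry per unique (measurement, tag set, timestamp).
    private final Map<String, Map<String, Object>> points = new HashMap<>();

    // Build an identity key; tags are sorted so that tag order does not matter.
    public static String key(String measurement, Map<String, String> tags, long timestamp) {
        return measurement + "," + new TreeMap<>(tags) + " " + timestamp;
    }

    // Writing to an existing key merges field sets; new values win on matching field keys.
    public void write(String measurement, Map<String, String> tags,
                      Map<String, Object> fields, long timestamp) {
        points.computeIfAbsent(key(measurement, tags, timestamp), k -> new HashMap<>())
              .putAll(fields);
    }

    public int pointCount() {
        return points.size();
    }

    public Map<String, Object> fieldsOf(String measurement, Map<String, String> tags, long timestamp) {
        return points.get(key(measurement, tags, timestamp));
    }

    public static void main(String[] args) {
        DuplicatePointSketch db = new DuplicatePointSketch();
        Map<String, String> tags = Map.of("host", "vps1");
        long ts = 1_700_000_000_000_000_000L;

        db.write("weather", tags, Map.of("temp", 21.5), ts);
        db.write("weather", tags, Map.of("temp", 21.5), ts);                 // exact duplicate
        db.write("weather", tags, Map.of("temp", 22.0, "humidity", 40), ts); // update plus new field

        System.out.println(db.pointCount());                              // 1
        System.out.println(db.fieldsOf("weather", tags, ts).get("temp")); // 22.0
    }
}
```

So logically there is still a single point after the repeated writes. As far as I understand, that is a statement about what queries return, not a guarantee that disk usage stays flat right after the second write.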
But perhaps there is some delay with sharding and compaction? Do you see that increase indefinitely? Thanks
When I logged into my VPS this morning, the database size had dropped to what I expected. The data was compressed during the night, since I see the following log: msg=“Compaction progress”. For now I am testing InfluxDB on a test server, but in production I will insert approximately 4 GB of data per day (sometimes duplicated), and I’d like to compress this data more regularly.
Since I’m just starting to use InfluxDB, I think I’m missing some configuration to compact data more often.
Do you have any documentation or example on configuring InfluxDB for data compaction?
Additionally, for compaction to take place (msg=“Compaction progress”), do I have to stop inserting data? I can’t find the answer to this question in the InfluxDB 2 documentation.