Backup only specific parts of the collected data

TomsCodingCode · November 3, 2022, 4:21pm

Hi there,
in my application I want to back up certain parts of the collected data. To do so, I was thinking about using the |> to() function to copy the data to a temporary bucket and then making a backup of the temporary bucket.
However, the to() function returns all data that was written, which creates a huge network load and makes the query take a lot longer, even though I don’t need the returned data.

If I do this in the UI, the query itself takes 7 seconds (as indicated by the UI), but I can’t use the UI for almost a minute.
If I do it in my application, the resulting backup has the same size whether I kill the query process after 30 seconds or after 60.

I am using the influx cli.

Is there a way to keep the to() operator from returning the data, or to know when the copying process has finished?
Is there a more elegant way to create backups of specific parts of the data?

Jay_Clifford · November 4, 2022, 1:00pm

Hi @TomsCodingCode,
Could you provide the Flux code you are using with the to() function? Approximately how much data are you moving to the new bucket? Also, what are the specifications of the device you are running InfluxDB on?

TomsCodingCode · November 4, 2022, 5:43pm

Thank you for the quick reply!

My code is fairly simple:

from(bucket: "myBigBucket")
    |> range(start: 2022-11-02T00:00:00Z, stop: 2022-11-02T17:00:00Z)
    |> filter(fn: (r) => r._measurement =~ /some Regex/)
    |> to(bucket: "mySmallBucket")

“mySmallBucket” is empty at this point and its schema type is set to implicit.

The regex is expected to match a large number of measurements, roughly 10k out of a total of about 35k.

I am expecting about 200k data points, and the resulting backup is about 5 GB.
(The points are not evenly distributed between the measurements.)
Each measurement has 1-3 fields and one tag.

I was testing on a Dell laptop with a 12th-gen i5 and an SSD. The InfluxDB instance was running on localhost.

I was expecting this to take a while and I’m fine with that.
What seemed strange to me was that the UI reports a very fast execution time (~10s), yet the UI was frozen for about a minute and the CLI didn’t complete within several minutes.
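
One possible workaround for the returned data, offered as a minimal sketch rather than a confirmed fix: to() passes its input through to the next step, so adding an aggregate such as count() after it should shrink what is sent back to the client to one row per series, while the write to the bucket still happens. The bucket names, time range, and regex placeholder are taken from the query above. Whether this removes the delay seen in the UI and CLI is an assumption and has not been measured here.

```flux
from(bucket: "myBigBucket")
    |> range(start: 2022-11-02T00:00:00Z, stop: 2022-11-02T17:00:00Z)
    |> filter(fn: (r) => r._measurement =~ /some Regex/)
    // Write the matching points to the temporary bucket.
    |> to(bucket: "mySmallBucket")
    // to() forwards its input; count() collapses each table to a single row,
    // so only a small summary is returned instead of every written point.
    |> count()
```

Because the query only returns once the pipeline has finished, the arrival of the count() result could also serve as a signal that the copy is done.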
