Hi there,
In my application I want to back up certain parts of the collected data. To do so, I was thinking about using the |> to() function to copy the data to a temporary bucket and then making a backup of the temporary bucket.
However, the to() function returns all the data that was written, which creates a huge network load and makes the query take a lot longer, even though I don’t need the data.
If I do this in the UI, the query itself takes 7 seconds (as indicated by the UI), but I can’t use the UI for almost a minute.
If I do it in my application, the resulting backup has the same size whether I kill the query process after 30 seconds or after 60.
I am using the influx cli.
Is there a way to keep to() from returning the data, or to know when the copying process has finished?
Is there a more elegant way to create backups of specific parts of the data?
Hi @TomsCodingCode,
Could you provide the Flux code you are using with the to() function? Approximately how much data are you moving to the new bucket? Also, what are the specifications of the device you are running InfluxDB on?
Thank you for the quick reply!
My code is fairly simple:
from(bucket: "myBigBucket")
|> range(start: 2022-11-02T00:00:00Z, stop: 2022-11-02T17:00:00Z)
|> filter(fn: (r) => r._measurement =~ /some Regex/)
|> to(bucket: "mySmallBucket")
“mySmallBucket” is empty at this point and its schema type is set to implicit.
The regex is expected to match a large number of measurements: roughly 10k out of a total of about 35k.
I am expecting about 200k data points, and the resulting backup is about 5 GB.
(The points are not evenly distributed between the measurements.)
Each measurement has 1-3 fields and one tag.
I was testing on a Dell laptop with a 12th-gen i5 and an SSD. The InfluxDB instance was running on localhost.
I was expecting this to take a while and I’m fine with that.
What seemed strange to me is that the UI reports a very fast execution time (~10s), yet the UI was frozen for about a minute and the CLI didn’t complete within multiple minutes.
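For reference, here is a minimal sketch of one possible workaround. It assumes that to() passes its input through unchanged and that an aggregate placed after it shrinks what is sent back to the client. It has not been verified against this dataset, and /some Regex/ is the same placeholder as above:

```flux
from(bucket: "myBigBucket")
    |> range(start: 2022-11-02T00:00:00Z, stop: 2022-11-02T17:00:00Z)
    |> filter(fn: (r) => r._measurement =~ /some Regex/)
    // Write the filtered rows to the temporary bucket
    |> to(bucket: "mySmallBucket")
    // Assumption: aggregating after to() collapses each series to one row,
    // so only per-series counts travel back to the client, not every point
    |> count()
```

If this behaves as assumed, the result is one row per series rather than every written point, and the arrival of the counts would indicate that the query, including the write, has completed.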