Move the data from one bucket to another bucket

I am using the InfluxDB Docker container, version 2.0.9.
Every 5 minutes we receive data from a Kafka broker, which Telegraf reads and writes into InfluxDB.

In InfluxDB we usually operate only on the last 30 days of data, but all of our data goes into the same bucket.
That bucket now holds more than a year of data, so querying it puts extra load on the CPU and queries take longer.

We are thinking of moving the data into separate buckets on a monthly basis, i.e. creating one bucket per month and year, such as:
bucket_name_07_2022
bucket_name_08_2022
bucket_name_09_2022
bucket_name_10_2022
where each bucket holds the data for its month.
The main bucket, bucket_name, will keep only the current data.
I am able to write the data into the different buckets using the InfluxDB CLI.
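
For reference, the per-month copy I run looks roughly like this (the bucket names and time range are just examples, and the command assumes an active influx CLI config for org and token):

influx query 'from(bucket: "bucket_name")
    |> range(start: 2022-07-01T00:00:00Z, stop: 2022-08-01T00:00:00Z)
    |> to(bucket: "bucket_name_07_2022")'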

I don’t want to keep more than one month of data in the main bucket, so after copying the data from the main bucket to the monthly buckets I want to delete it from the main bucket.
I know data can be deleted as described in Delete data | InfluxDB OSS 2.5 Documentation,
but is there a Flux function that can move data across buckets rather than just copy it?
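
For reference, the delete I have in mind, following the docs above, would be something like this (org and token coming from the active CLI config; the time range is just an illustration):

influx delete --bucket bucket_name \
  --start 2022-07-01T00:00:00Z \
  --stop 2022-08-01T00:00:00Z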

Thanks

@Anaisdg could you please help?

Hi @Ravikant_Gautam,
Sadly there isn’t; this is usually the job of the bucket’s retention policy. You could call the API from within your Flux query, but I believe writing a script to run against the CLI would do a better job.
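
For example, a small wrapper script along these lines could copy each month into its own bucket and then clear it from the main one. This is only a sketch: the bucket names and month list are placeholders, it assumes GNU date for the month arithmetic, and it relies on an active influx CLI config for org and token.

#!/usr/bin/env bash
set -euo pipefail

for m in 2022-07 2022-08 2022-09 2022-10; do
  start="${m}-01T00:00:00Z"
  # naive month increment; assumes GNU date
  stop="$(date -u -d "${m}-01 +1 month" +%Y-%m-01T00:00:00Z)"
  month="${m#*-}"
  year="${m%-*}"

  # copy the month into its own bucket...
  influx query "from(bucket: \"bucket_name\")
    |> range(start: ${start}, stop: ${stop})
    |> to(bucket: \"bucket_name_${month}_${year}\")"

  # ...then remove it from the main bucket
  influx delete --bucket bucket_name --start "${start}" --stop "${stop}"
done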

@Jay_Clifford @Anaisdg
I want to migrate the data as-is to a different bucket based on time. I tried moving the data from the primary bucket to a different bucket using two approaches:

  • Using the InfluxDB CLI: it takes too much time to migrate the data.

  • Using the InfluxDB UI: it is fast, but it is a manual process. I have to pass one-day ranges one by one, and if I give a large time range the CPU utilization becomes too high.

Is there any way to move the data without much human intervention, for example using the Python client? I am not able to figure out how to do it.

I don’t want to modify the data at all, just a simple migration from one bucket to another.

The write_api examples show how to write new data points, but I want to write data that is already present in my bucket.

Can you please help with it?

Thanks


Hello, @Ravikant_Gautam

I’m facing a similar problem, where I want to move data from one bucket to another in an efficient way.

Alternatively, you could use the influxdb_client, with query_api() to pull the data and write_api to write it into your new bucket.

This has similar problems to using the UI: depending on how fast and how big your data ingest is, the RAM and CPU usage become too high, and in my case this eventually crashed the database.
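
A minimal sketch of that approach with the Python client, chunking the time range so only a small slice is held in memory at once (the URL, token, org, bucket names, and the one-day chunk size are all placeholders):

from datetime import datetime, timedelta, timezone

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# placeholder connection details
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
query_api = client.query_api()
write_api = client.write_api(write_options=SYNCHRONOUS)

SOURCE = "bucket_name"
TARGET = "bucket_name_07_2022"


def rfc3339(dt):
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


# walk through July 2022 one day at a time so only a small slice is in memory
start = datetime(2022, 7, 1, tzinfo=timezone.utc)
stop = datetime(2022, 8, 1, tzinfo=timezone.utc)
step = timedelta(days=1)

while start < stop:
    chunk_stop = min(start + step, stop)
    tables = query_api.query(
        f'from(bucket: "{SOURCE}") '
        f'|> range(start: {rfc3339(start)}, stop: {rfc3339(chunk_stop)})'
    )

    # rebuild each returned record as a Point so it can be written unmodified
    points = []
    for table in tables:
        for record in table.records:
            p = (
                Point(record.get_measurement())
                .field(record.get_field(), record.get_value())
                .time(record.get_time())
            )
            # copy tag columns (everything that is not a Flux system column)
            for key, value in record.values.items():
                if not key.startswith("_") and key not in ("result", "table"):
                    p = p.tag(key, value)
            points.append(p)

    if points:
        write_api.write(bucket=TARGET, record=points)
    start = chunk_stop

client.close()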

As an alternative, I tried creating a new bucket and manually copying the data from the original bucket in the engine/data and engine/wal folders. That didn’t work; the database didn’t recognize that the new bucket had the data. Maybe someone from the dev team can comment on that? @Anaisdg

Anyway, I’m just sharing my experience. I hope it sheds some light on the problem.

Regards

Hi @giuliano.lm @Ravikant_Gautam,
Have you considered running a downsampling task to move the data to a new bucket at a specific interval?

Have a look at this example:

import "influxdata/influxdb/tasks"
import "types"

// omit this line if adding task via the UI
option task = {name: "Downsample raw data", every: 10m}

// query everything written since the task last ran successfully
data = () => from(bucket: "example-bucket")
    |> range(start: tasks.lastSuccess(orTime: -task.every))

// average numeric fields over each window
numeric = data()
    |> filter(fn: (r) => types.isType(v: r._value, type: "float") or types.isType(v: r._value, type: "int") or types.isType(v: r._value, type: "uint"))
    |> aggregateWindow(every: task.every, fn: mean)

// keep the last value of string and boolean fields in each window
nonNumeric = data()
    |> filter(fn: (r) => types.isType(v: r._value, type: "string") or types.isType(v: r._value, type: "bool"))
    |> aggregateWindow(every: task.every, fn: last)

// recombine and write the downsampled data to the destination bucket
union(tables: [numeric, nonNumeric])
    |> to(bucket: "example-downsampled-bucket")
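
If the goal is to move the raw points unmodified rather than downsample them, the same task pattern should work with the aggregation dropped, i.e. range() piped straight into to() (the bucket names here are examples):

import "influxdata/influxdb/tasks"

// omit this line if adding task via the UI
option task = {name: "Move raw data", every: 10m}

from(bucket: "example-bucket")
    |> range(start: tasks.lastSuccess(orTime: -task.every))
    |> to(bucket: "example-archive-bucket")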

Hello, @Jay_Clifford

I ended up using the “copy” Flux query and splitting the data into monthly chunks. It’s still running, but it hasn’t compromised my RAM usage.

Your suggestion seems like a more efficient approach.

I will try it out! Many thanks for the response.

Thanks, @Jay_Clifford. I will work on it and confirm.