Move the data from one bucket to another bucket

I am using the InfluxDB Docker container, version 2.0.9.
Every 5 minutes we receive data from a Kafka broker, which Telegraf reads and writes into InfluxDB.

In InfluxDB we usually operate only on the last 30 days of data, but all of our data goes into the same bucket.
That bucket now holds more than a year of data, so querying it puts extra load on the CPU and queries take longer.

We are thinking of moving the data into separate buckets on a monthly basis, i.e. creating one bucket per month and year, such as:
bucket_name_07_2022
bucket_name_08_2022
bucket_name_09_2022
bucket_name_10_2022
where each bucket holds the data for its month.
The main bucket, bucket_name, will keep only the current data.
I am able to write the data into the different buckets using the InfluxDB CLI.
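
For reference, the per-month copy I run looks roughly like this (the bucket names and time range are just examples, and the command assumes an active influx CLI config for org and token):

influx query 'from(bucket: "bucket_name")
    |> range(start: 2022-07-01T00:00:00Z, stop: 2022-08-01T00:00:00Z)
    |> to(bucket: "bucket_name_07_2022")'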

I don’t want to keep more than one month of data in the main bucket, so after copying the data from the main bucket to the monthly buckets I want to delete it from the main bucket.
I know data can be deleted as described in Delete data | InfluxDB OSS 2.5 Documentation,
but is there a Flux function that can move data across buckets rather than just copy it?
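
For reference, the delete I have in mind, following the docs above, would be something like this (org and token coming from the active CLI config; the time range is just an illustration):

influx delete --bucket bucket_name \
  --start 2022-07-01T00:00:00Z \
  --stop 2022-08-01T00:00:00Z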

Thanks

@Anaisdg could you please help?

Hi @Ravikant_Gautam,
Sadly there isn’t; this is usually the job of the bucket’s retention policy. You could call the API from within your Flux query, but I believe writing a script to run against the CLI would do a better job.
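
For example, a small wrapper script along these lines could copy each month into its own bucket and then clear it from the main one. This is only a sketch: the bucket names and month list are placeholders, it assumes GNU date for the month arithmetic, and it relies on an active influx CLI config for org and token.

#!/usr/bin/env bash
set -euo pipefail

for m in 2022-07 2022-08 2022-09 2022-10; do
  start="${m}-01T00:00:00Z"
  # naive month increment; assumes GNU date
  stop="$(date -u -d "${m}-01 +1 month" +%Y-%m-01T00:00:00Z)"
  month="${m#*-}"
  year="${m%-*}"

  # copy the month into its own bucket...
  influx query "from(bucket: \"bucket_name\")
    |> range(start: ${start}, stop: ${stop})
    |> to(bucket: \"bucket_name_${month}_${year}\")"

  # ...then remove it from the main bucket
  influx delete --bucket bucket_name --start "${start}" --stop "${stop}"
done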

@Jay_Clifford @Anaisdg
I want to migrate the data as-is to a different bucket based on time. I tried moving the data from the primary bucket to a different bucket using two approaches:

  • Using the InfluxDB CLI: it takes too much time to migrate the data.

  • Using the InfluxDB UI: it is fast, but it is a manual process. I have to pass one-day ranges one by one, and if I give a large time range the CPU utilization becomes too high.

Is there any way to move the data without much human intervention, for example using the Python client? I am not able to figure out how to do it.

I don’t want to modify the data at all, just a simple migration from one bucket to another.

The write_api examples show how to write new data points, but I want to write data that is already present in my bucket.

Can you please help with it?

Thanks


Hello, @Ravikant_Gautam

I’m facing a similar problem, where I want to move data from one bucket to another in an efficient way.

Alternatively, you could use the influxdb_client, with query_api() to pull the data and write_api to write it into your new bucket.

This has similar problems to using the UI: depending on how fast and how big your data ingest is, the RAM and CPU usage become too high, and in my case this eventually crashed the database.
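
A minimal sketch of that approach with the Python client, chunking the time range so only a small slice is held in memory at once (the URL, token, org, bucket names, and the one-day chunk size are all placeholders):

from datetime import datetime, timedelta, timezone

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

# placeholder connection details
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
query_api = client.query_api()
write_api = client.write_api(write_options=SYNCHRONOUS)

SOURCE = "bucket_name"
TARGET = "bucket_name_07_2022"


def rfc3339(dt):
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


# walk through July 2022 one day at a time so only a small slice is in memory
start = datetime(2022, 7, 1, tzinfo=timezone.utc)
stop = datetime(2022, 8, 1, tzinfo=timezone.utc)
step = timedelta(days=1)

while start < stop:
    chunk_stop = min(start + step, stop)
    tables = query_api.query(
        f'from(bucket: "{SOURCE}") '
        f'|> range(start: {rfc3339(start)}, stop: {rfc3339(chunk_stop)})'
    )

    # rebuild each returned record as a Point so it can be written unmodified
    points = []
    for table in tables:
        for record in table.records:
            p = (
                Point(record.get_measurement())
                .field(record.get_field(), record.get_value())
                .time(record.get_time())
            )
            # copy tag columns (everything that is not a Flux system column)
            for key, value in record.values.items():
                if not key.startswith("_") and key not in ("result", "table"):
                    p = p.tag(key, value)
            points.append(p)

    if points:
        write_api.write(bucket=TARGET, record=points)
    start = chunk_stop

client.close()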

As an alternative, I tried creating a new bucket and manually copying the data from the original bucket in the engine/data and engine/wal folders. That didn’t work; the database didn’t recognize that the new bucket had the data. Maybe someone from the dev team can comment on that? @Anaisdg

Anyway, I’m just sharing my experience. I hope it sheds some light on the problem.

Regards

Hi @giuliano.lm @Ravikant_Gautam,
Have you considered running a downsampling task to move the data to a new bucket at a specific interval?

Have a look at this example:

import "influxdata/influxdb/tasks"
import "types"

// omit this line if adding task via the UI
option task = {name: "Downsample raw data", every: 10m}

// query everything written since the task last ran successfully
data = () => from(bucket: "example-bucket")
    |> range(start: tasks.lastSuccess(orTime: -task.every))

// average numeric fields over each window
numeric = data()
    |> filter(fn: (r) => types.isType(v: r._value, type: "float") or types.isType(v: r._value, type: "int") or types.isType(v: r._value, type: "uint"))
    |> aggregateWindow(every: task.every, fn: mean)

// keep the last value of string and boolean fields in each window
nonNumeric = data()
    |> filter(fn: (r) => types.isType(v: r._value, type: "string") or types.isType(v: r._value, type: "bool"))
    |> aggregateWindow(every: task.every, fn: last)

// recombine and write the downsampled data to the destination bucket
union(tables: [numeric, nonNumeric])
    |> to(bucket: "example-downsampled-bucket")
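
If the goal is to move the raw points unmodified rather than downsample them, the same task pattern should work with the aggregation dropped, i.e. range() piped straight into to() (the bucket names here are examples):

import "influxdata/influxdb/tasks"

// omit this line if adding task via the UI
option task = {name: "Move raw data", every: 10m}

from(bucket: "example-bucket")
    |> range(start: tasks.lastSuccess(orTime: -task.every))
    |> to(bucket: "example-archive-bucket")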

Hello, @Jay_Clifford

I ended up using the “copy” Flux query and splitting the data into monthly chunks. It’s still running, but it hasn’t compromised my RAM usage.

Your suggestion seems like a more efficient approach.

I will try it out! Many thanks for the response.

Thanks, @Jay_Clifford. I will work on it and confirm.