Automatic InfluxDB Bucket Name Generation in Telegraf

Hi All,
I am using InfluxDB 2.0.9 OSS.
Right now I am putting all my data in one bucket, but over time this bucket has accumulated so much data that querying it has become too slow.

Now I want to put my data into different buckets based on the month and year.
For example, I want to name the buckets like this:

012023
022023
032023
042023
.
.
.
122023
012024

My current output configuration is:

[[outputs.influxdb_v2]]
 ## Test bucket
 urls = ["http://influxdb:8086"]
 tagexclude = ["data_type","device_type","host","operator"]
 namedrop = ["exe_*"]
 fielddrop = ["time","date"]
 token = "${TOKEN}"
 organization = "operator"
 bucket = "bucket_name"
 flush_interval = "10s"
 metric_buffer_limit = 120000
 [outputs.influxdb_v2.tagpass]
  data_type = ["production"]

In this output plugin, I want Telegraf to generate the bucket name automatically, taking the month and year from the underlying system.

Is there an existing plugin we can use to achieve this, or any other approach we can follow to segregate our data by month and year?

Thanks

@Jay_Clifford @Anaisdg could you please help me with this?
Thanks

If you want to automate it client side, use cron, assuming you are also batch creating these buckets on the influxd server via the CLI…
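As a rough illustration of the batch-creation idea, here is a minimal Python sketch that builds the MMYYYY bucket names from the question and prints one `influx bucket create` command per month. The helper names `monthly_bucket_names` and `create_commands` are hypothetical, and the org name `operator` is only taken from the config posted above. It assumes the `influx` CLI is already configured with a host and token, so review the printed commands before running them.

```python
from datetime import date


def monthly_bucket_names(year):
    """Return the twelve MMYYYY bucket names for the given year."""
    return [f"{month:02d}{year}" for month in range(1, 13)]


def create_commands(year, org="operator"):
    """Build one `influx bucket create` command per monthly bucket.

    The org default is an assumption taken from the posted Telegraf config.
    """
    return [
        f"influx bucket create --name {name} --org {org}"
        for name in monthly_bucket_names(year)
    ]


if __name__ == "__main__":
    # Print the commands for the current year so they can be reviewed
    # and then run by hand or from a cron job.
    for command in create_commands(date.today().year):
        print(command)
```

Only printing the commands keeps the script side-effect free, and a yearly cron entry could run it ahead of January.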

I’m not convinced by the approach though. Server-side scraping or database pruning is likely a more robust and adaptable approach for most. Your method pretty much takes the time series out of the database and gets you lots of databases named after time.

Most of the time we query and report on one month of data. So keeping data older than one month in the same bucket is going to increase query time and CPU consumption.
That is why I was thinking of an approach where we keep the data month-wise in monthly buckets.

Hi @Ravikant_Gautam,
So you could create a task that extracts the current month and uses it to pick the destination bucket:

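import "date"
import "influxdata/influxdb/tasks"

// Placeholder task settings; adjust the name and interval to your setup.
option task = {name: "route_by_month", every: 1h}

// Destination bucket name: the current month number as a string ("1" to "12").
// The bucket must already exist; to() does not create it.
bucketName = string(v: date.month(t: now()))

rawdata = () => from(bucket: "plantbuddy")
    |> range(start: tasks.lastSuccess(orTime: -task.every))
    |> filter(fn: (r) => r._measurement == "sensor_data")

rawdata() |> to(bucket: bucketName)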

Keep in mind that there is a soft limit on the number of buckets, and having many buckets can also impact performance, so I would not create a bucket for each month of the year.

Thanks @Jay_Clifford. I have achieved the same in the Telegraf configuration, using the time column coming in the data.
I have pasted my telegraf.conf snippet below so it can help other people.

[[processors.starlark]]
source = '''
load('time.star', 'time')

def apply(metric):
    # metric.time is in nanoseconds; from_timestamp expects seconds.
    new_date = time.from_timestamp(int(metric.time / 1e9))
    # str(new_date) starts with the date, e.g. "2023-01-15 ..."; keep that part.
    date_part = str(new_date).split()[0]
    year = date_part.split("-")[0]
    month = date_part.split("-")[1]
    # Bucket name in YYYYMM form, e.g. "202301".
    var_bucket_name = year + month
    # print(var_bucket_name)  # uncomment to debug
    metric.tags["bucketname"] = var_bucket_name
    return metric
'''
[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086"]
  token = "token"
  organization = "org"
  bucket_tag = "bucketname"
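To sanity-check the bucket names outside Telegraf, the same timestamp-to-name logic can be sketched in plain Python. This is not the Starlark processor itself, and the function name `bucket_name_from_ns` is hypothetical. It assumes the timestamp is interpreted in UTC; if the Starlark version uses the host's time zone, the two could differ near month boundaries. Like the snippet above, it produces YYYYMM names rather than the MMYYYY form from the original question.

```python
from datetime import datetime, timezone


def bucket_name_from_ns(timestamp_ns):
    """Return the YYYYMM bucket name for a nanosecond Unix timestamp (UTC)."""
    # Telegraf metric timestamps are in nanoseconds; convert to whole seconds.
    seconds = timestamp_ns // 1_000_000_000
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y%m")


# Example: 2023-01-15 00:00:00 UTC expressed in nanoseconds.
print(bucket_name_from_ns(1673740800 * 1_000_000_000))
```

Each name returned here must match a bucket that already exists in InfluxDB, otherwise the write for that metric is expected to fail.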

Awesome @Ravikant_Gautam,
I will get this added to the Telegraf-Community-Configs repo if you don’t mind.

Sure. Go ahead. I don’t mind.
Please share the URL after updating :smile:


Here you go! I still need to add your contribution to the README, but you are credited in the config: Telegraf-Community-Configs/telegraf-combine-array-to-string.conf at master · InfluxCommunity/Telegraf-Community-Configs · GitHub