Delete data without losing information

s118 · January 7, 2023, 2:45am

Hello. I have a database of energy consumed and energy produced in my house. These data are obtained from sensors that take the values and are entered into influxdb2 every 10 seconds. Everything works correctly, but I think that with the passage of time, the size occupied by the database will be very large.
I am thinking of reducing the size of the database as time goes on. For example, once the data is 6 months old, I could delete the energy data collected every 10 seconds, and just keep the hourly average. When the data is one year old, eliminate the hourly average and keep the daily average.
Would this be possible? How could it be done? Thanks.

h5py · January 7, 2023, 10:27am

Maybe this will help you:

Downsampling

s118 · January 7, 2023, 11:57am

I think I have understood it: by means of a query I move the data to another bucket, in which I save the average of the original data every hour. This is done by a task. So far it’s easy.
Some questions: every time the query is executed, do I bring all the existing data from the source bucket to the destination bucket, or only a part of it? How often do you need to run the task? Does the data in the destination bucket retain the original date, or the date it is entered into this bucket?

Anaisdg · January 9, 2023, 8:08pm

Hello @s118,

every time the query is executed, do I bring all the existing data from the source bucket to the destination bucket, or only a part of it?
You can bring all of the source data into a destination bucket, but if you apply any other function to it you’ll only be bringing the end result to your destination bucket.
How often do you need to run the task?
This depends on what type of downsampling you’re trying to do and your ingest rate. In general I’d ask myself, “what’s the lowest resolution data that I’m happy with?” to determine what period to use for the aggregateWindow() function. Then I’d ask “how quickly do I want that downsampled data to exist?”. For example if you
Does the data in the destination bucket retain the original date, or the date it is entered into this bucket?
You can create any sort of task that you want with Flux. So you can retain the original date, enter a new date, create a future date… it’s up to you! If you’re using the UI to create a task then the original date will be preserved. I think reading this might be helpful:
Tasks | Time to Awesome
You might also enjoy InfluxDB University:
InfluxDB Essentials Tutorial Course | InfluxDB University
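
On the timestamp question, here is a minimal sketch of how the time written to the destination could be controlled when aggregateWindow() is used. As far as I know, aggregateWindow() stamps each result with the end of its window by default (timeSrc: "_stop"). The bucket and measurement names below are placeholders, not taken from your setup.

```flux
// Placeholder names: "example-bucket" and "example-measurement" are assumptions.
from(bucket: "example-bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "example-measurement")
    // Default is timeSrc: "_stop", so the window end becomes _time.
    // timeSrc: "_start" uses the window start instead.
    |> aggregateWindow(every: 10m, fn: mean, timeSrc: "_start", createEmpty: false)
```

Either way, the aggregated point carries a time derived from its window, not the time at which the task ran.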

Let me know if that helps or if you want more detail in any area.

s118 · January 10, 2023, 8:55am

I think I have understood everything perfectly. If I don’t transform the data, I move ALL the data to the new bucket with its original date. For example, this query applies no transformation other than averaging the data every 10 minutes, so in total you would get 6 energy values per hour, each with its original date:

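// Run once per hour, 5 seconds after the scheduled time
option task = {name: "DOWNSAMPLING", every: 1h, offset: 5s}

// Read the last hour of raw data from the source bucket
data =
    from(bucket: "energy")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "ENERGY")

// Average in 10-minute windows and write the result to the destination bucket
data
    |> aggregateWindow(every: 10m, fn: mean, createEmpty: false)
    |> to(bucket: "energy_down", org: "house")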
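
Following the same pattern, the daily stage from my original question could be sketched as a second task. This is only a sketch under assumptions: the destination bucket "energy_daily" is a hypothetical name, and deleting the old high-resolution points would be left to the retention period configured on each bucket, which I have not set up here.

```flux
// Hypothetical daily task; "energy_daily" is an assumed bucket name.
option task = {name: "DOWNSAMPLING_DAILY", every: 1d, offset: 5m}

from(bucket: "energy")
    // Read exactly one task interval of raw data (the last day)
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "ENERGY")
    // One mean per day; windows follow UTC days unless a location is set
    |> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
    |> to(bucket: "energy_daily", org: "house")
```

Reading from the raw bucket rather than from the hourly one keeps the daily mean weighted by the original samples, at the cost of scanning one day of 10-second data per run.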
