Automatic aggregate monthly

hanhoe · December 6, 2023, 1:12pm

Hello!

I write the data from my powermeter in the influxDB and display with Grafana.
But the function “aggregateWindow(every: 1mo, fn: last, createEmpty: true)” takes a loooong time. Because there are many data.

Is it possible that InfluxDB create automatic from a query a new measurement with the already aggregated datas?

Then I just need to read 12 values (Jan - Dec) and must not calculate at any query.

Thank you!

hanhoe · December 6, 2023, 2:38pm

seems this one is what I search?:

scott · December 6, 2023, 3:40pm

@hanhoe You should create an InfluxDB task that runs monthly and aggregates the data from the last month. The task can then store the data in a aggregated bucket that you can query from. Something like this:

option task = {
  name: "Aggregate data monthly",
  every: 1mo
}

from(bucket: "example-bucket")
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "example-measurement")
    |> aggregateWindow(every: task.every, fn: last, createEmpty: true)
    |> to(bucket: "example-aggregate-bucket")

So this task would run every month, query and aggregate data from the previous month, and then store it in the other aggregate bucket. Then, to query a year’s worth of just the monthly aggregates, you’d just do:

from(bucket: "example-aggregate-bucket")
    |> range(start: -1y)
    |> filter(fn: (r) => r._measurement == "example-measurement")

hanhoe · December 6, 2023, 4:08pm

cool … yes, something like this I`m thinking.

There is maybe a second option:
At least, I just need just 12 single values:
01.01.2023, value
01.02.2023, value
01.03.2023, value
…

So, always the first value for each month.
Then just the difference
Maybe you have for this a solution?

I can add 12 single query, but then I need to put this at end together?

scott · December 6, 2023, 4:22pm

Doing that will likely take just as long as your first attempt. It still has to query all the data from the last year to generate the aggregate values for each month.

hanhoe · December 6, 2023, 4:33pm

I thought something like that:

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

seven = from(bucket: "house")
  |> range(start: date.add(d: 6mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 6mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

eight = from(bucket: "house")
  |> range(start: date.add(d: 7mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 7mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

nine = from(bucket: "house")
  |> range(start: date.add(d: 8mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 8mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

sum = union(tables: [seven, eight, nine])
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

sum
//  |> difference(nonNegative: false)
  |> yield()

as for example for 3 month
only the difference is not working

hanhoe · December 6, 2023, 5:06pm

I read again your post and maybe there is mistake by my side:

The values are continues. So I need only the difference between 1. January and 1. February
Not need to calc a average

hanhoe · December 7, 2023, 1:40pm

Hello scott!

When I use:
|> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
he need around 1s

When I use:
|> aggregateWindow(every: 1d, fn: last, createEmpty: false)
he need around 10s

Did you know why last has this delay?
With mean he need to read all values and calc the average
With last or first he “just” need read the last/first value

Why is mean so fast?

scott · December 7, 2023, 3:53pm

I don’t know the exact implementation details of these two functions, but they are inherently different. mean() is an aggregate function that returns the average of all values in a specific column in each input table. last() is a selector function function that returns the last row in each input table. If I had to guess, I would say that last() may take a little longer is it has to ensure sort order in its logic, but that’s only a guess.

Topic		Replies	Views
aggregateWindow for incomplete timeframes InfluxDB 2 influxdb	2	569	April 19, 2021
Calculate and display monthly Watt production InfluxDB 2 influxdb , grafana	7	1029	December 19, 2022
Monthly aggregation not working as expected InfluxDB 2 flux	3	637	January 10, 2022
Group data hourly, daily, monthly etc. how? InfluxDB 2	12	4475	January 23, 2021
Aggregates in InfluxDB iot , aggregate	6	798	September 14, 2021

Automatic aggregate monthly

Related topics