Automatic aggregate monthly

Hello!

I write the data from my powermeter in the influxDB and display with Grafana.
But the function “aggregateWindow(every: 1mo, fn: last, createEmpty: true)” takes a loooong time. Because there are many data.

Is it possible that InfluxDB create automatic from a query a new measurement with the already aggregated datas?

Then I just need to read 12 values (Jan - Dec) and must not calculate at any query.

Thank you!

seems this one is what I search?:

@hanhoe You should create an InfluxDB task that runs monthly and aggregates the data from the last month. The task can then store the data in a aggregated bucket that you can query from. Something like this:

option task = {
  name: "Aggregate data monthly",
  every: 1mo
}

from(bucket: "example-bucket")
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "example-measurement")
    |> aggregateWindow(every: task.every, fn: last, createEmpty: true)
    |> to(bucket: "example-aggregate-bucket")

So this task would run every month, query and aggregate data from the previous month, and then store it in the other aggregate bucket. Then, to query a year’s worth of just the monthly aggregates, you’d just do:

from(bucket: "example-aggregate-bucket")
    |> range(start: -1y)
    |> filter(fn: (r) => r._measurement == "example-measurement")

cool … yes, something like this I`m thinking.

There is maybe a second option:
At least, I just need just 12 single values:
01.01.2023, value
01.02.2023, value
01.03.2023, value

So, always the first value for each month.
Then just the difference
Maybe you have for this a solution?

I can add 12 single query, but then I need to put this at end together?

Doing that will likely take just as long as your first attempt. It still has to query all the data from the last year to generate the aggregate values for each month.

I thought something like that:

import "timezone"
import "date"
option location = timezone.location(name: "Europe/Vienna")
startDate = date.truncate(t: today(), unit: 1y)

seven = from(bucket: "house")
  |> range(start: date.add(d: 6mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 6mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

eight = from(bucket: "house")
  |> range(start: date.add(d: 7mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 7mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

nine = from(bucket: "house")
  |> range(start: date.add(d: 8mo, to: startDate), stop: date.add(d: 1d, to: date.add(d: 8mo, to: startDate)))
  |> filter(fn: (r) => r["_measurement"] == "PV")
  |> filter(fn: (r) => r["Powermeter"] == "Hauptzaehler")
  |> filter(fn: (r) => r["_field"] == "Total Geliefert" or r["_field"] == "Totaler Bezug")
  |> filter(fn: (r) => r["_value"] > 0)
  |> first()
  |> timeShift(duration: -1s, columns: ["_time"])

sum = union(tables: [seven, eight, nine])
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

sum
//  |> difference(nonNegative: false)
  |> yield()


as for example for 3 month
only the difference is not working

I read again your post and maybe there is mistake by my side:

The values are continues. So I need only the difference between 1. January and 1. February
Not need to calc a average

Hello scott!

When I use:
|> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
he need around 1s

When I use:
|> aggregateWindow(every: 1d, fn: last, createEmpty: false)
he need around 10s

Did you know why last has this delay?
With mean he need to read all values and calc the average
With last or first he “just” need read the last/first value

Why is mean so fast?

I don’t know the exact implementation details of these two functions, but they are inherently different. mean() is an aggregate function that returns the average of all values in a specific column in each input table. last() is a selector function function that returns the last row in each input table. If I had to guess, I would say that last() may take a little longer is it has to ensure sort order in its logic, but that’s only a guess.