Out of memory with aggregateWindow and spread function

Hi,

I have created a task that runs after midnight and calculates the previous day's power consumption for several power meters => spread function. No problem so far…

But now I want to calculate this daily spread value initially for the last ~2 years with the following code:

from(bucket: "power_full")
    |> range(start: 2021-01-01T00:00:00Z, stop: today())
    |> filter(fn: (r) => r["_measurement"] == "Energy")
    |> aggregateWindow(every: 24h, fn: spread, createEmpty: false)
    |> to(bucket: "power_daily")

With this code I always run out of memory. If I change the function to mean, it finishes in about 2 minutes. I think this is because the spread function is not supported as a pushdown? I already tried the min and max functions => also out of memory.

Time is not the problem here, because it is only an initial run to prepare the bucket. Or is it possible to calculate it month by month, or something like that, in a loop?

Hi,
I tried to calculate it month by month with the following code, but the task still runs out of memory :cry:
Is there something like a dispose that could be called between the aggregateFunctions calls?

import "timezone"
import "strings"
import "date"

// "every" is ~100 years (876456h), so in practice this rebuild task runs only once
option task = {name: "power_daily_rebuild", every: 876456h0m0s, offset: 20m}

option location = timezone.location(name: "Europe/Berlin")

targetBucket = "power_daily"
sourceBucket = "power_full"

aggregateFunctions = (month) => {
    startTime = date.truncate(t: month, unit: 1mo)
    endTime = date.add(d: 1mo, to: startTime)
    data =
        from(bucket: "Unimoc")
            |> range(start: startTime, stop: endTime)
            |> filter(fn: (r) => r["_measurement"] == "Energy")

    //    data
    //        |> aggregateWindow(every: 24h, fn: mean, createEmpty: false)
    //        |> set(key: "fn", value: "mean")
    //        |> to(bucket: targetBucket)
    //    data
    //        |> aggregateWindow(every: 24h, fn: min, createEmpty: false)
    //        |> set(key: "fn", value: "min")
    //        |> to(bucket: targetBucket)
    //    data
    //        |> aggregateWindow(every: 24h, fn: max, createEmpty: false)
    //        |> set(key: "fn", value: "max")
    //        |> to(bucket: targetBucket)
    data
        |> aggregateWindow(every: 24h, fn: spread, createEmpty: false)
        |> set(key: "fn", value: "spread")
        |> to(bucket: targetBucket)

    return 0
}

aggregateYear = (year) => {
    aggregateFunctions(month: date.add(d: year, to: 2020-01-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-02-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-03-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-04-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-05-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-06-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-07-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-08-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-09-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-10-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-11-01T00:00:00Z))
    aggregateFunctions(month: date.add(d: year, to: 2020-12-01T00:00:00Z))

    return 0
}

aggregateYear(year: 0y)
aggregateYear(year: 1y)
aggregateYear(year: 2y)
aggregateYear(year: 3y)

Hello @Gonzo4,
Unfortunately, Flux is notorious for running out of memory. You might be better off using the Python client library and doing the analysis that way. These types of issues with Flux are largely why the team rewrote the storage engine in v3. You can learn more about the performance benefits here if you're curious:
InfluxDB 3.0 is up to 45x Faster for Recent Data Compared to InfluxDB Open Source | InfluxData.

Hi,
yes, in the meantime I have written a simple Python script that issues one request per month.
=> Rebuild is done in a few minutes without memory issues :grinning:
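
For reference, here is a simplified sketch of that approach (assuming the official influxdb-client package; the url, token, and org values are placeholders, and the bucket names are the ones from above):

from datetime import datetime, timezone

from influxdb_client import InfluxDBClient

# Yield first-of-month timestamps from `first` until just past `stop`.
def month_starts(first, stop):
    y, m = first.year, first.month
    while True:
        t = datetime(y, m, 1, tzinfo=timezone.utc)
        yield t
        if t >= stop:
            return
        m += 1
        if m > 12:
            y, m = y + 1, 1

# Same Flux as before, but limited to one month per request. The trailing
# to() writes the result server-side, so the client never holds the data.
FLUX = """
from(bucket: "power_full")
    |> range(start: {start}, stop: {stop})
    |> filter(fn: (r) => r["_measurement"] == "Energy")
    |> aggregateWindow(every: 24h, fn: spread, createEmpty: false)
    |> set(key: "fn", value: "spread")
    |> to(bucket: "power_daily")
"""

client = InfluxDBClient(url="http://localhost:8086", token="MY_TOKEN", org="my-org")
query_api = client.query_api()

bounds = list(month_starts(datetime(2021, 1, 1, tzinfo=timezone.utc), datetime.now(timezone.utc)))
fmt = "%Y-%m-%dT%H:%M:%SZ"
for start, stop in zip(bounds, bounds[1:]):
    print(f"aggregating {start:%Y-%m} ...")
    query_api.query(FLUX.format(start=start.strftime(fmt), stop=stop.strftime(fmt)))

client.close()

Because each query ends in to(), the aggregation and the write happen entirely server-side; the Python side only drives the loop, so no single query ever has to process more than one month of raw data.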

Just out of interest: why was the pushdown not implemented for the spread/min/max functions? OK, with 3.0 coming up this is no longer relevant, but anyhow :yum: