It seems that using the max(), min(), and mean() functions is much faster than using reduce() to calculate the max, min, and mean. I wonder if the performance difference is because reduce() rewrites the result at each iteration, causing it to be slower. Could you explain why there is such a difference?
@han Great question! And yes, your assumption is one of the reasons the reduce() method is so much slower. The other reason is where the computation takes place. Flux and InfluxDB work together to speed up queries by letting Flux “push down” certain operations to the storage tier (closer to where the data lives), where they run much faster. Operations that can’t be pushed down require all of the data they need to be loaded into the Flux memory space and operated on there. Loading that data into memory has some inherent latency, and on top of that, operations performed in Flux memory are slower than the same operations performed by the storage engine.
So structuring the min/max/mean query as shown below leverages pushdowns to calculate the min, max, and mean at the storage tier, and only those aggregated results (not the raw data) get loaded into Flux memory and union()’d together.
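Here’s a minimal sketch of that structure (the bucket name, time range, and filter predicate are placeholders you’d swap for your own):

```
// data() is a function (a "thunk") rather than a plain variable, so each call
// re-expands into a from() |> range() |> filter() chain that the planner can
// push down to the storage tier.
data = () => from(bucket: "example-bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "example-measurement" and r._field == "example-field")

// Each aggregate runs as its own pushdown in storage, so only one aggregated
// row per series comes back into Flux memory.
minStream = data() |> min()
maxStream = data() |> max()
meanStream = data() |> mean()

// Combine the three aggregated streams.
union(tables: [minStream, maxStream, meanStream])
```

If you need to tell the three results apart in the output, you could optionally tag each stream before the union, for example with `|> set(key: "agg", value: "min")`.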
Also note that I structure the data() “variable” as a function, also known as a “thunk,” to keep the pushdown chain intact across an identifier declaration (more info here).
reduce() can’t be pushed down to storage, so all the queried data has to be loaded into memory and iterated on there. There’s a compounding effect here:
- You have to load all the raw, unaggregated data into memory to aggregate it with reduce().
- Because the data is unaggregated, there are far more rows to iterate over.
- It takes longer to iterate over each row in memory than it does to aggregate values in storage.
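For contrast, here’s a hypothetical reduce()-based version of the same calculation (I don’t know your exact query, so the bucket and filter are placeholders). Nothing in this pipeline can be pushed down, so every raw row is pulled into Flux memory and the accumulator record is rebuilt on every iteration:

```
from(bucket: "example-bucket")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "example-measurement" and r._field == "example-field")
    // reduce() runs in the Flux memory space, one row at a time,
    // rewriting the accumulator record on each iteration.
    |> reduce(
        identity: {count: 0.0, sum: 0.0, min: 0.0, max: 0.0},
        fn: (r, accumulator) => ({
            count: accumulator.count + 1.0,
            sum: accumulator.sum + r._value,
            min: if accumulator.count == 0.0 then r._value
                 else if r._value < accumulator.min then r._value
                 else accumulator.min,
            max: if accumulator.count == 0.0 then r._value
                 else if r._value > accumulator.max then r._value
                 else accumulator.max
        })
    )
    // The mean still has to be derived afterward from the running sum and count.
    |> map(fn: (r) => ({r with mean: r.sum / r.count}))
```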