Pushdown combinations: aggregateWindow with last

Hi Team, @Anaisdg, @scott

Is using aggregateWindow with last now part of the pushdown combinations?

An article from MAY22 mentions that:

Previously the aggregateWindow() function was written in Flux. Now it’s implemented in native Go for specific functions. This change means that the aggregateWindow() function will be even more performant if you’re using it to calculate the following aggregations:

  • mean
  • sum
  • count
  • (min and max are expected to land shortly)

@ajetsharwin Yes, I believe it should be. Although aggregateWindow() isn’t listed in the supported pushdown table, the primitives that power aggregateWindow() are. See "Optimize Flux queries" in the InfluxDB OSS 2.7 documentation.


Thank you for letting me know, Scott.

@scott @Anaisdg I don’t think this is correct. Having min, max, first, or last as the aggregation function in aggregateWindow is orders of magnitude slower than sum, mean, or count. I’ve tested this on a current InfluxDB Docker image (v2.7.10). The difference also shows up in profiling (mean first, then max):

| table | _measurement | Count | DurationSum | Label | MaxDuration | MeanDuration | MinDuration | Type |
|---|---|---|---|---|---|---|---|---|
| 1 | profiler/operator | 1 | 112187540 | merged_ReadRange9_filter2_filter3 | 112187540 | 112187540 | 112187540 | *influxdb.readFilterSource |
| 1 | profiler/operator | 7 | 102720207 | aggregateWindow8 | 93667782 | 14674315.285714285 | 1443 | *universe.aggregateWindowTransformation |

vs.

| table | _measurement | Count | DurationSum | Label | MaxDuration | MeanDuration | MinDuration | Type |
|---|---|---|---|---|---|---|---|---|
| 1 | profiler/operator | 1 | 3831573990 | merged_ReadRange8_filter2_filter3 | 3831573990 | 3831573990 | 3831573990 | *influxdb.readFilterSource |
| 1 | profiler/operator | 7 | 3891713582 | window4 | 3590121411 | 555959083.1428572 | 2685 | *universe.fixedWindowTransformation |
| 1 | profiler/operator | 107 | 4352358 | max5 | 1097780 | 40676.2429906542 | 892 | *execute.rowSelectorTransformation |
| 1 | profiler/operator | 107 | 292743 | duplicate6 | 30237 | 2735.9158878504672 | 1132 | *universe.schemaMutationTransformation |
| 1 | profiler/operator | 107 | 555263 | window7 | 66777 | 5189.373831775701 | 551 | *universe.fixedWindowTransformation |

The performance of those functions might be affected by the amount of data being queried.
But it could still be a pushdown and not be as performant as some of the others. Executing fast scans across datasets to get things like min and max is a long-known pain point of v2, though, and it is one contributing factor to the rewrite in v3, which is meant to address problematic queries exactly like this.

I would also add that min, max, first, and last are all selector functions that have to evaluate rows against all other rows in the table(s) they operate on, so they are more compute-heavy. sum, mean, and count are all aggregate functions, where the computation is simpler.
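
For anyone reading the second profile above: the operator labels (window, max, duplicate, window) line up with what aggregateWindow() does when it is not handled natively. The query below is a minimal sketch of that manual equivalent, not the exact library source. The bucket, measurement, field, time range, and window size are hypothetical placeholders, not values from this thread.

```flux
// Hypothetical names throughout: replace bucket, measurement, and field with your own.
from(bucket: "example-bucket")
    |> range(start: -30d)
    |> filter(fn: (r) => r._measurement == "example-measurement")
    |> filter(fn: (r) => r._field == "example-field")
    // Split each series into 1h windows (one table per window).
    |> window(every: 1h)
    // Apply the selector to every window table.
    |> max()
    // Use the window's stop time as the timestamp of the result row.
    |> duplicate(column: "_stop", as: "_time")
    // Merge the window tables back into one table per series.
    |> window(every: inf)
```

Written this way, each step is a separate operator, which is roughly what the max profile shows, whereas the mean profile shows a single aggregateWindow operator.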

@neban Thanks for the profiler output. Could you also provide the queries you’re running to get these profiles? Also, what version of InfluxDB are you using?
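
For reference, operator profiles like the ones above can be produced by enabling the Flux profiler at the top of a query. This is only a sketch under assumptions: the bucket, measurement, field, time range, and window size are hypothetical and are not the queries used in this thread.

```flux
import "profiler"

// Adds profiler tables (profiler/query and profiler/operator) to the query result.
option profiler.enabledProfilers = ["query", "operator"]

// Hypothetical names: replace bucket, measurement, and field with your own.
from(bucket: "example-bucket")
    |> range(start: -30d)
    |> filter(fn: (r) => r._measurement == "example-measurement")
    |> filter(fn: (r) => r._field == "example-field")
    // Run once with fn: mean and once with fn: max, then compare the
    // DurationSum values of the operators in the two profiler outputs.
    |> aggregateWindow(every: 1h, fn: mean)
```

Running the two variants as separate queries keeps the profiler output of one from being mixed with the other.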