Query with lots of fields takes orders of magnitude longer in Flux than InfluxQL

I’ve been running this query on InfluxDB v1 and InfluxDB v2 (via the InfluxQL compatibility endpoint), and it takes a second or two.

SELECT
  mean("signal") * mean("fft_0001"),
  mean("signal") * mean("fft_0002"),
  mean("signal") * mean("fft_0003"),
  mean("signal") * mean("fft_0004"),
  ....
  mean("signal") * mean("fft_0510"),
  mean("signal") * mean("fft_0511"),
  mean("signal") * mean("fft_0512")
FROM "data"
WHERE "device" = 'XY' AND $timeFilter
GROUP BY time($__interval) fill(null)

But a similar query in Flux takes orders of magnitude longer and simply times out if I increase the time range. Is there a way to improve this query and bring it on par with the InfluxQL one?

import "strings"
filtered = from(bucket: v.defaultBucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["device"] == "XY")

signal = filtered
  |> filter(fn: (r) => r._field == "signal")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)

fft = filtered
  |> filter(fn: (r) => strings.hasPrefix(v: r._field, prefix: "fft_"))
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)

join(tables:{signal:signal,fft:fft}, on:["_time"])
  |> map(fn: (r) =>({ _value: float(v:r._value_signal) * float(v:r._value_fft),  _time: r._time }))
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")

I’ve tried it on a beefier machine, and it’s still the same.

I also tried naming all the fields instead of using prefix: "fft_", and that takes even longer.

Hello @tintin,
I’m not sure. I’m sorry you’re having a bad experience. I’m sharing your question with the Flux team directly and I hope someone can provide some insight. Thank you.

Hi, I think the reason might be that your query isn’t utilizing the pushdown optimizations.

import "experimental"

// Use a function wrapper so we don't have to duplicate this and it will be registered
// in the runtime as two separate from calls instead of one. A potential future improvement
// is for the flux planner to recognize this pattern and do it automatically.
select = () => from(bucket: v.defaultBucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["device"] == "XY")

signal = select()
  |> filter(fn: (r) => r._field == "signal")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)

fft = select()
  // regexes are specially understood while function calls are not, possible future optimization
  |> filter(fn: (r) => r._field =~ /^fft_/)
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)

join(tables:{signal:signal,fft:fft}, on:["_time"])
  |> map(fn: (r) =>({ _value: float(v:r._value_signal) * float(v:r._value_fft),  _time: r._time }))
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")

I think this query should perform better, as it forces the pushdowns to be used. Please give it a try, and I can tweak it further or open a performance issue if one of the other methods is what’s causing the problem. I also suspect that the pivot at the end doesn’t do anything, since the preceding map keeps only _time and _value and drops the _field column the pivot keys on, so you can probably safely remove it. If you can give me an idea of the output you expect, I can help you tweak that part to produce it.
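If you do want the pivot to produce one column per fft field, a sketch of one way to do it (assuming join suffixes the colliding columns as _field_signal / _field_fft, which is how join handles name collisions on non-join columns) is to carry the fft field name through the map:

```flux
join(tables: {signal: signal, fft: fft}, on: ["_time"])
  // keep the fft field name so the pivot has a columnKey to work with;
  // _field_fft is the suffixed copy of the fft stream's _field column
  |> map(fn: (r) => ({
      _time: r._time,
      _field: r._field_fft,
      _value: float(v: r._value_signal) * float(v: r._value_fft),
    }))
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
```

With _field preserved, the pivot turns each fft_xxxx product into its own column per timestamp instead of being a no-op.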


Hi @jonathan ,
I tried it and unfortunately there was no noticeable improvement.

The data I have is 1 Hz (one row per second).

The following two queries each run in less than a second, even when the range is as long as a month:

select = () => from(bucket: v.defaultBucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["device"] == "XY")

signal = select()
  |> filter(fn: (r) => r._field == "signal")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield()

select = () => from(bucket: v.defaultBucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "data" and r["device"] == "XY")

fft = select()
  |> filter(fn: (r) => r._field =~ /^fft_/)
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield()

As soon as I introduce join, the time shoots up.

join(tables:{signal:signal,fft:fft}, on:["_time"])
  |> yield()
| Range | Time |
| --- | --- |
| 5 minutes | 4 seconds |
| 15 minutes | 8 seconds |
| 1 hour | Times out (~60 seconds) |

InfluxQL times for comparison

| Range | Time |
| --- | --- |
| 5 minutes | Subsecond |
| 15 minutes | 1 second |
| 1 hour | 2 seconds |
| 1 month | 8 seconds |

I think this might unfortunately be a problem with the speed of join(). join() is probably the right function for what you want, since you’re joining one column against many others, but the amount of data seems to be what’s causing the problem.
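One join-free alternative worth trying (a sketch, not tested against your data): since signal and the fft_* fields live in the same measurement and series, you can aggregate them all in a single pushdown-eligible pass and pivot the fields into columns, then compute the products with a map. The two product columns in the map are illustrative only; Flux has no dynamic column arithmetic, so each fft_xxxx product would still have to be spelled out.

```flux
from(bucket: v.defaultBucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "data" and r.device == "XY")
  |> filter(fn: (r) => r._field == "signal" or r._field =~ /^fft_/)
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  // one row per window, with signal and each fft_* field as a column
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  // each product must be written out explicitly; two shown as an example
  |> map(fn: (r) => ({
      _time: r._time,
      fft_0001: r.signal * r.fft_0001,
      fft_0002: r.signal * r.fft_0002,
    }))
```

This avoids join entirely, at the cost of an explicit column list in the final map.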

If it’s possible, do you think you can share any of the data you are using? It would allow me to run the queries myself and analyze where we can improve the performance.

Are you doing this work on Cloud 2.0? If so, I may have another idea for a query that might improve performance, but it requires a code change.

I am using OSS v2. I will try to get you the data on Monday.

Here is some data: https://we.tl/t-OipqqiF3rz

Tags: device and channel.
Fields: signal and fft_xxxx