Impact of contains() on performance

MzazM · November 19, 2020, 6:14pm

I think there is something wrong with the function contains().
Query 1 does not use contains() and takes 0.15s.

    from(bucket: "short")
        |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
        |> filter(fn: (r) => r["_measurement"] == "Phasor")
        |> filter(fn: (r) => r["UID"] =~ /RepDev0002/)
        |> filter(fn: (r) => r._field == "mag")
        |> aggregateWindow(every: 20ms, fn: mean, createEmpty: true, timeSrc: "_start")
        |> group()

Query 2 provides identical output but takes 4.2s. I only replaced the regex filter with contains().

    MeasUID = ["RepDev0002-IA1", "RepDev0002-IB1", "RepDev0002-IC1",
        "RepDev0002-IA2", "RepDev0002-IB2", "RepDev0002-IC2",
        "RepDev0002-VA1", "RepDev0002-VB1", "RepDev0002-VC1"]

    from(bucket: "short")
        |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
        |> filter(fn: (r) => r["_measurement"] == "Phasor")
        |> filter(fn: (r) => contains(value: r.UID, set: MeasUID))
        |> filter(fn: (r) => r._field == "mag")
        |> aggregateWindow(every: 20ms, fn: mean, createEmpty: true, timeSrc: "_start")
        |> group()

Obviously contains() has the advantage of letting the user take the array from another query or a template variable, so it is more flexible. But I am wondering whether a roughly 30x longer execution time (4.2s versus 0.15s) is justifiable.
Am I using it incorrectly, or is there another way to filter dynamically (without a hard-coded regex as in query 1)?
Thanks!

Anaisdg · November 24, 2020, 8:24pm

Hello @MzazM,
Welcome! Thank you for sharing your question and your patience. I don’t know the answer. I’m passing this along to the Flux team. Interested to see what they say.

MzazM · November 26, 2020, 9:28am

Hello @Anaisdg, thank you for answering.
In the #flux channel here Nathaniel Cook mentioned that currently the implementation of contains() prevents the filter from being evaluated on the storage, leading to the poor performance.
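
For context, a filter that the storage layer can evaluate is typically a plain comparison on a tag or field. A minimal sketch, assuming the set of UIDs is already known when the query is written (only three of the nine UIDs are shown for brevity), replaces contains() with chained equality checks. Whether this is pushed down on a given InfluxDB version is worth confirming by timing it:

```flux
from(bucket: "short")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "Phasor")
    // Static alternative to contains(): explicit equality checks joined with "or".
    |> filter(
        fn: (r) =>
            r["UID"] == "RepDev0002-IA1" or r["UID"] == "RepDev0002-IB1" or r["UID"]
                ==
                "RepDev0002-IC1",
    )
    |> filter(fn: (r) => r._field == "mag")
```

This does not help when the array is only known at run time, which is the case raised in this thread.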

Following on from that, I would like to know:

- Is there an ETA for improving contains()?
- Is there a workaround to filter a table according to the content of a dynamic array? Note that I do not know the content of MeasUID in advance, as it comes as output from sql.from. Basically, is there a way to expand the array into something like this:

      |> filter(fn: (r) => r["UID"] =~ ${MeasUID})

  or

      |> filter(fn: (r) => r["UID"] =~ /RepDev0002-VA1|RepDev0002-VA2|...|RepDev0002-VAN/)

  (N is not known, and other types of UIDs could be in the array.)
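
One way to sketch the expansion asked about above is to join the array into an alternation and compile it with the standard `strings` and `regexp` packages. This is an illustration, not a confirmed fix: the literal MeasUID array below is a stand-in for the output of sql.from, `uidPattern` is a name introduced here, and the thread does not establish whether a regex compiled at run time is pushed down to storage, so it should be timed against the contains() version.

```flux
import "regexp"
import "strings"

// Assumption: stands in for an array obtained at run time (e.g. from sql.from).
MeasUID = ["RepDev0002-IA1", "RepDev0002-IB1", "RepDev0002-IC1"]

// Builds the pattern ^(RepDev0002-IA1|RepDev0002-IB1|RepDev0002-IC1)$ once,
// outside the filter function.
uidPattern = regexp.compile(v: "^(" + strings.joinStr(arr: MeasUID, v: "|") + ")$")

from(bucket: "short")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "Phasor")
    |> filter(fn: (r) => r["UID"] =~ uidPattern)
    |> filter(fn: (r) => r._field == "mag")
```

If the UIDs could contain regex metacharacters such as `.` or `(`, each element would need escaping first, for example with `regexp.quoteMeta`.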

MzazM · December 6, 2020, 2:48pm

Hello @Anaisdg, any feedback on this from flux team?

Anaisdg · December 7, 2020, 4:45pm

Hello @MzazM,
They would have answered here first. Let me see if I can go bug someone. Thanks for your patience.

MzazM · January 22, 2021, 5:07pm

Hello, we keep having issues with the poor performance of contains().
Do you know if there is a possibility that it will ever be a pushdown function like filter and others?

Anaisdg · January 22, 2021, 5:50pm

Hello @MzazM,
I haven’t received any feedback. I’ve included your question in an issue again (#3445). I’m sorry I don’t have any information. All the pushdown patterns that I’m aware of are listed in this blog:

Thank you.

MzazM · October 15, 2021, 9:04pm

Hi, I finally found a workaround online that can replace contains() when using Grafana. Here it is: Grafana + InfluxDB Flux - query for displaying multi-select variable inputs - InfluxDB - Grafana Labs Community Forums
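
For readers who cannot follow the link, a sketch of what such a Grafana-side workaround can look like is below. It assumes a multi-select dashboard variable named MeasUID (hypothetical here) and relies on Grafana's `${variable:regex}` format option, which Grafana expands into an escaped alternation such as `(RepDev0002-IA1|RepDev0002-IB1)` before the query is sent. The linked thread may differ in detail.

```flux
from(bucket: "short")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "Phasor")
    // Grafana substitutes ${MeasUID:regex} before the query reaches InfluxDB,
    // so Flux only ever sees a literal regex, as in query 1.
    |> filter(fn: (r) => r["UID"] =~ /^${MeasUID:regex}$/)
    |> filter(fn: (r) => r._field == "mag")
    |> aggregateWindow(every: 20ms, fn: mean, createEmpty: true, timeSrc: "_start")
    |> group()
```

Because the substitution happens in Grafana, this only works inside a Grafana dashboard and not when the array comes from another Flux or SQL query.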
