Impact of contains() on performance

I think there is something wrong with the function contains().
Query 1 does not use contains() and takes 0.15s.

    from(bucket: "short")
    |> range(start: v.timeRangeStart,stop: v.timeRangeStop)
    |> filter(fn: (r)=> r["_measurement"] == "Phasor")
    |> filter(fn: (r) => r["UID"] =~ /RepDev0002/)  
    |> filter(fn: (r)=> r._field == "mag")
    |> aggregateWindow(every: 20ms, fn: mean, createEmpty: true, timeSrc: "_start")
    |> group()

Query 2, provides identical output but takes 4.2s . I just replaced the filter with regex with contains()

MeasUID = ["RepDev0002-IA1","RepDev0002-IB1","RepDev0002-IC1","RepDev0002-IA2","RepDev0002-IB2","RepDev0002-IC2","RepDev0002-VA1","RepDev0002-VB1","RepDev0002-VC1"]
from(bucket: "short")
|> range(start: v.timeRangeStart,stop: v.timeRangeStop)
|> filter(fn: (r)=> r["_measurement"] == "Phasor")
|> filter(fn: (r)=> contains(value: r.UID,set: MeasUID))
|> filter(fn: (r)=> r._field == "mag")
|> aggregateWindow(every: 20ms, fn: mean, createEmpty: true, timeSrc: "_start")
|> group()

Obviously the contains() has the advantage of allowing the user to have an array coming from another query or a template variable, so it is more flexible. But I am wondering if a 30x execution time is justifiable.
Maybe I am using it wrongly or there is another way to make a dynamic filtering (without using regex as in query 1)?
Thanks!

1 Like

Hello @MzazM,
Welcome! Thank you for sharing your question and your patience. I don’t know the answer. I’m passing this along to the Flux team. Interested to see what they say.

Hello @Anaisdg, thank you for answering.
In the #flux channel here Nathaniel Cook mentioned that currently the implementation of contains() prevents the filter from being evaluated on the storage, leading to the poor performance.

Following on that, I would like to know:

  • is there an ETA for improving contains()
  • is there workaround to filter a table according to the content of a dynamic array? Note that I do not know the content of MeasUID as it comes as output from sql.from. Basically, is there a way to expand the array to become something like this:?
    |> filter(fn: (r) => r["UID"] =~ ${MeasUID} or
    |> filter(fn: (r) => r["UID"] =~ /RepDev0002-VA1|RepDev0002-VA2|...|RepDev0002-VAN/ (N is not known and other types of UIDs could be in the array).
1 Like

Hello @Anaisdg, any feedback on this from flux team?

Hello @MzazM,
They would have answered here first. Let me see if I can go bug someone :stuck_out_tongue: Thanks for your patience.

Hello, we keep having issues with the poor performance of contains(),
do you you know if there is a possibility that it will ever be a push-down function like filter and others?

Hello @MzazM,
I haven’t received any feedback. I’ve included your question in an issue again (#3445). I’m sorry I don’t have any information. All the pushdown patterns that I’m aware of are listed in this blog:

Thank you.

Hi finally, I found online a workaround that can be used to replace contains when using grafana. Here it is: Grafana + InfluxDB Flux - query for displaying multi-select variable inputs - InfluxDB - Grafana Labs Community Forums

1 Like