Performance of contains() is very poor compared to equivalent alternatives. The same goes for regexp.compile()

I have many queries that I use in Grafana where I need to match based on a list of measurement names. In order to do this, I use a query like this:

measurement_names = [
  "some_measurement_name"
, ...
]

from(bucket: "flywheel")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => contains(value: r._measurement, set: measurement_names))
  |> group(columns: ["_measurement"])
  |> aggregateWindow(every: 1h, fn: sum)

However, the performance is terrible. To demonstrate, this query takes many seconds to run, and times out if run over more than a couple of days:

measurement_names = [
  "some_measurement_name" 
]

from(bucket: "flywheel")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => contains(value: r._measurement, set: measurement_names))
  |> group(columns: ["_measurement"])
  |> aggregateWindow(every: 1h, fn: sum)

Whereas this runs in milliseconds. The only difference is that it does not use contains():

from(bucket: "flywheel")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "some_measurement_name")
  |> group(columns: ["_measurement"])
  |> aggregateWindow(every: 1h, fn: sum)

Likewise, this query takes a long time:

import "regexp"

// regexp.compile() takes the bare pattern; the /.../ delimiters belong only to regex literals
measurement_names = "some_measurement_name"

from(bucket: "flywheel")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] =~ regexp.compile(v: measurement_names))
  |> group(columns: ["_measurement"])
  |> aggregateWindow(every: 1h, fn: sum)

And this (seemingly identical) query takes a short time. The only difference is that it uses a regex literal instead of regexp.compile():

from(bucket: "flywheel")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] =~ /some_measurement_name/)
  |> group(columns: ["_measurement"])
  |> aggregateWindow(every: 1h, fn: sum)

The reason I include the regexp.compile() approach is that it could technically also be used to provide a list, or to match on multiple options. I am trying to use variables for the sake of code clarity and portability. It’s a much hackier solution that I don’t prefer, though, especially if it isn’t performant.

Is there a performant way to do this that isn’t hacky? The first method, with an array of values, would be ideal if not for the terrible performance.
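
For reference, a minimal sketch of one variation worth measuring: build the pattern from the list and call regexp.compile() once, outside the filter predicate, instead of on every row. The bucket and measurement names are placeholders, and whether this variant is actually faster is an assumption that I have not verified.

```flux
import "regexp"
import "strings"

// Hypothetical measurement names, for illustration only.
measurement_names = ["my_measurement.1", "my_measurement.2", "my_measurement.3"]

// Join the names into one anchored alternation and compile it a single time.
// Caveat: "." is not escaped here, so it matches any character.
measurement_re = regexp.compile(v: "^(" + strings.joinStr(arr: measurement_names, v: "|") + ")$")

from(bucket: "some_bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  // The predicate only references the precompiled value.
  |> filter(fn: (r) => r._measurement =~ measurement_re)
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
```

If exact matching on names with dots matters, each name could be passed through regexp.quoteMeta() before joining.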


@Anaisdg would you happen to have any insight into this?

Ultimately, what I’d like is to be able to specify a list of measurements and for it to work at the same speed (in the same way?) as if I had named them “directly”.

I.e.

measurement_names = [
   "my_measurement.1"
,  "my_measurement.2"
,  "my_measurement.3"
]

from(bucket: "some_bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => contains(value: r._measurement, set: measurement_names))
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")

to execute the same as

from(bucket: "some_bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "clean_history.archive" 
    or r["_measurement"] == "my_measurement.1"
    or r["_measurement"] == "my_measurement.2"
    or r["_measurement"] == "my_measurement.3"
    )
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")

Is there any way?
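
Since the regex literal ran quickly in my tests above, one interim option is to fold the list into a single anchored literal. A minimal sketch, using the three placeholder measurement names from my example. It gives up the variable, so it does not solve the portability problem:

```flux
from(bucket: "some_bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  // ^ and $ anchor the match to the whole name; \. matches a literal dot.
  |> filter(fn: (r) => r._measurement =~ /^(my_measurement\.1|my_measurement\.2|my_measurement\.3)$/)
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")
```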


@alexitheodore,
First thing I’m noticing is that the group is redundant. By default it will be grouped by measurement name.
Does the number of items in your list vary? You could reference the items directly.

  |> filter(fn: (r) => r["_measurement"] == "clean_history.archive" 
    or r["_measurement"] == measurement_names[0]
    or r["_measurement"] == measurement_names[1]
    or r["_measurement"] == measurement_names[2]
    )

Other than that, I’m not aware of an alternative. @scott, am I missing something?

No, this has been a long-standing problem in Flux. The poor performance of contains is a known issue. There is a link in that issue to a Grafana thread that may help to solve your issue @alexitheodore:
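
One workaround often used in Grafana dashboards (an assumption on my part, not necessarily what the linked thread proposes) is to let Grafana expand a multi-value dashboard variable into a regex literal before the query reaches Flux. A minimal sketch, assuming a dashboard variable hypothetically named measurement_names; the :regex format option makes Grafana substitute the selected values as an escaped alternation:

```flux
from(bucket: "some_bucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  // Grafana replaces ${measurement_names:regex} with something like
  // (my_measurement\.1|my_measurement\.2) before sending the query,
  // so Flux only ever sees a plain regex literal.
  |> filter(fn: (r) => r._measurement =~ /^${measurement_names:regex}$/)
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> yield(name: "mean")
```

This only works inside Grafana, since the substitution happens there rather than in Flux itself.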