Long filter clause

danielgonnet · September 26, 2022, 12:28pm

Hi everyone,

My team and I are building a solution for our customers that include the ability for them to create data exports for the data that our solution gathers from heating controllers.

Some of this customers have hundreds (or thousands) of controllers and they want to export a (somewhat big) subset of them, let’s say hundreds.

We currently generate a flux query that contains a long list of tags (hundreds of possible values) as a filter. Something along this lines:

data = from(bucket: "prod_readouts/high")
    |> range(start: time(v: "2021-05-01T22:00:00.000Z"), stop: time(v: "2021-05-07T21:59:59.999Z"))
    |> filter(fn: (r) => r._measurement == "{measurement_for_customer}")
    |> filter(fn: (r) => r.deviceId == "{device_1}" or r.deviceId == "{device_2}" or ....
... //more flux

The structure of our data is a measurement per customer and a single tag per device (deviceId).

When we try to execute such query, we get an error from the server (we get the same error in our application as we are getting in the Data Explorer UI):

compilation failed: error @4:26-4:854: Program is nested too deep error @4:858-4:887: Program is nested too deep error @4:891-4:901: Program is nested too deep error @4:905-4:913: Program is nested too deep error @4:917-4:918: Program is nested too deep error @4:919-4:927: Program is nested too deep

We found that if we create a query filtering, let’s say 100 devices, the query works and returns the data we need. When the number of devices is increased, the error comes.

Do we have any other alternative than executing multiple queries and “stitch” the results back for every kind of query that can potentially have a long list of devices included?

Thanks in advance

scott · September 26, 2022, 4:47pm

@danielgonnet You have a few options here, but there are downsides to each.

Option 1: Use contains to filter

The first solution is to use contains() to check for the existence of a value in an array. You could pass the list of device IDs as an array and check for the existence of those IDs in a filter:

deviceIDs = ["device_1", "device_2", "device_3", ... ]

data = from(bucket: "prod_readouts/high")
    |> range(start: time(v: "2021-05-01T22:00:00.000Z"), stop: time(v: "2021-05-07T21:59:59.999Z"))
    |> filter(fn: (r) => r._measurement == "{measurement_for_customer}")
    |> filter(fn: (r) => contains(set: deviceIDs, value: r.deviceId))
    // ... more flux

The downside of this is that there are known performance issues with contains():

Option 2: Programmatically build streams and union them together

The 2nd option is only theoretical, so I’m not 100% sure it would work. It also depends on how you’re building your Flux queries. But, you could create multiple streams where each stream represents a filter of 10 (just throwing a number out there) device IDs. So you’d have to programatically build the Flux code for each stream, but if you’re able to do that, you could then union all the streams together:

baseData = () => from(bucket: "prod_readouts/high")
    |> range(start: time(v: "2021-05-01T22:00:00.000Z"), stop: time(v: "2021-05-07T21:59:59.999Z"))

stream0 = baseData() |> filter(fn: (r) => r.deviceId == "{device_1}" or r.deviceId == "{device_2}" or ....)
stream1 = baseData() |> filter(fn: (r) => r.deviceId == "{device_11}" or r.deviceId == "{device_12}" or ....) 
stream2 = baseData() |> filter(fn: (r) => r.deviceId == "{device_21}" or r.deviceId == "{device_22}" or ....) 
stream3 = baseData() |> filter(fn: (r) => r.deviceId == "{device_31}" or r.deviceId == "{device_32}" or ....) 
// ...

union(tables: [stream0, stream1, stream2, stream3, ...]

Downsides here are that there is additional programatic burden in building the Flux code AND I’m not 100% sure it would work. I think it would, but I haven’t tested it.

danielgonnet · September 27, 2022, 5:37am

Uhm, I’ll give a shot to option 2. I’ll let you know how it goes. Did not think of union multiple streams, but if it works (and I cannot think why not from the top of my head) it will also solve my headache for another query which results are being streamed and parallel execution of multiple queries is not an option.

I have had performance problems with contains in the past so I think in this case it will hurt (query performance) as well as help (remove limit of long filter).

I’ll report how it works that second option.
Thanks a lot.

danielgonnet · September 27, 2022, 2:16pm

I have been doing some tests on Option 2 and it works.
It is a bit of a hassle to generate the query and process the response fields with the new name.
And we are “union”-ing more tables than before, but I don’t expect this to be a performance problem.

But it works as far as I can see in my tests. Thanks a lot for the input.

scott · September 27, 2022, 2:58pm

@danielgonnet Yeah, I figured it would be a hassle, but I’m glad it’s working. The Flux team is working through some pretty major performance enhancements now and I know the contains() issue is on their radar, but I don’t know where it falls on the priority list. It would definitely be the more graceful solution, but the longer the array you’re evaluating is, the worse the performance gets, so this will definitely hurt more than it helps in your case.

I’ve been trying to think through ways to do the option 2 approach in raw Flux, but I keep hitting some road blocks when I test. I’ll keep you posted.

danielgonnet · September 30, 2022, 9:26am

We have deployed to production Solution 2 and works like a charm. We’ll keep it until we hear improvements in contains() that would dramatically decrease the complexity logic in the generation of the flux scripts

Topic		Replies	Views
Large Filter Clause Performance InfluxDB 2 influxdb , schema , query , flux , performance	6	698	June 20, 2023
Convert query to FLUX Fluxlang	2	715	May 10, 2022
Nested query runs out of memory Fluxlang query , performance	0	510	April 22, 2022
Problems migrating InfluxQL query to Flux Fluxlang influxdb , influxql , query , flux , performance	1	960	July 13, 2021
Flux query: Problem with array filter inside map InfluxDB 2 query	1	1082	April 26, 2023

Long filter clause

Option 1: Use contains to filter

Option 2: Programmatically build streams and union them together

Related topics