My team and I are building a solution for our customers that include the ability for them to create data exports for the data that our solution gathers from heating controllers.
Some of this customers have hundreds (or thousands) of controllers and they want to export a (somewhat big) subset of them, let’s say hundreds.
We currently generate a flux query that contains a long list of tags (hundreds of possible values) as a filter. Something along this lines:
The structure of our data is a measurement per customer and a single tag per device (deviceId).
When we try to execute such query, we get an error from the server (we get the same error in our application as we are getting in the Data Explorer UI):
compilation failed: error @4:26-4:854: Program is nested too deep error @4:858-4:887: Program is nested too deep error @4:891-4:901: Program is nested too deep error @4:905-4:913: Program is nested too deep error @4:917-4:918: Program is nested too deep error @4:919-4:927: Program is nested too deep
We found that if we create a query filtering, let’s say 100 devices, the query works and returns the data we need. When the number of devices is increased, the error comes.
Do we have any other alternative than executing multiple queries and “stitch” the results back for every kind of query that can potentially have a long list of devices included?
@danielgonnet You have a few options here, but there are downsides to each.
Option 1: Use contains to filter
The first solution is to use contains() to check for the existence of a value in an array. You could pass the list of device IDs as an array and check for the existence of those IDs in a filter:
Option 2: Programmatically build streams and union them together
The 2nd option is only theoretical, so I’m not 100% sure it would work. It also depends on how you’re building your Flux queries. But, you could create multiple streams where each stream represents a filter of 10 (just throwing a number out there) device IDs. So you’d have to programatically build the Flux code for each stream, but if you’re able to do that, you could then union all the streams together:
Downsides here are that there is additional programatic burden in building the Flux code AND I’m not 100% sure it would work. I think it would, but I haven’t tested it.
Uhm, I’ll give a shot to option 2. I’ll let you know how it goes. Did not think of union multiple streams, but if it works (and I cannot think why not from the top of my head) it will also solve my headache for another query which results are being streamed and parallel execution of multiple queries is not an option.
I have had performance problems with contains in the past so I think in this case it will hurt (query performance) as well as help (remove limit of long filter).
I’ll report how it works that second option.
Thanks a lot.
I have been doing some tests on Option 2 and it works.
It is a bit of a hassle to generate the query and process the response fields with the new name.
And we are “union”-ing more tables than before, but I don’t expect this to be a performance problem.
But it works as far as I can see in my tests. Thanks a lot for the input.
@danielgonnet Yeah, I figured it would be a hassle, but I’m glad it’s working. The Flux team is working through some pretty major performance enhancements now and I know the contains() issue is on their radar, but I don’t know where it falls on the priority list. It would definitely be the more graceful solution, but the longer the array you’re evaluating is, the worse the performance gets, so this will definitely hurt more than it helps in your case.
I’ve been trying to think through ways to do the option 2 approach in raw Flux, but I keep hitting some road blocks when I test. I’ll keep you posted.
We have deployed to production Solution 2 and works like a charm. We’ll keep it until we hear improvements in contains() that would dramatically decrease the complexity logic in the generation of the flux scripts