Query 12,000 series at once

ticchioned · April 21, 2022, 2:37pm

I have a database with around 12,000 series in it. Each series has a Value field where that series’ value is stored. I would like to be able to query any number of those 12,000 series values at once. I know I can limit my Flux filter to just a field filter, which would return every series’ Value field, but I need to be able to do something like that dynamically for any number of series, not necessarily all 12,000. Adding 12,000 measurements (there are no tag keys/values in this schema) to a filter is not a great option either; I’m sure a query like that would not be performant. From the research I’ve done, it seems this is not something InfluxDB is “built” for as of now, and cross-series queries seem to be difficult. Currently I am querying each series independently, but querying 12,000 series individually inevitably takes a long time. Are there any ways around an issue like this? Do I have to rethink my schema to pull this off? Or is there a way to filter on a large list of measurements in one query without losing performance?

mhall119 · April 21, 2022, 3:05pm

Hi @ticchioned,

I’m a little confused by what you’re describing. Can you show me an example of the query you have right now for an individual “series”?

In InfluxDB, a “series” is defined as a unique combination of measurement name and tag values, and Flux is built specifically to make working with many series at once easy. I’m confident there’s a good way to accomplish what you want; I just need to better understand what that is.

ticchioned · April 21, 2022, 3:16pm

Sure. I’d love to give a little more information about my schema as well, since I feel that will help you better understand what I’m trying to accomplish. My schema is set up as follows: 12,000 measurements, each with 6 fields. Each series consists only of a measurement, with no tags in this particular case. I would like to be able to get the last “Value” for any number of series in this database in one query. Currently I am querying the series individually like this:

from(bucket: "myBucket")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "myMeasurement" and r._field == "Value")
|> last()

I would like to be able to do something like this:

from(bucket: "myBucket")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "myMeasurement" 
     or r._measurement == "myMeasurement1" 
     or r._measurement == "myMeasurement2") // ... 12,000 measurements here
|> filter(fn: (r) => r._field == "Value")
|> last()

ticchioned · April 21, 2022, 3:33pm

@mhall119 Also, I have been using this query occasionally as well. I would prefer not to rely on it because I only want to fetch the data I need. I won’t always need all 12,000 series; sometimes I’ll need only 1 to be returned, which is why I would like to filter out the ones I do not need.

from(bucket: "myBucket")
|> range(start: 0)
|> filter(fn: (r) => r._field == "Value")
|> last()

mhall119 · April 21, 2022, 3:46pm

So if I understand correctly, you have 12,000 different values for _measurement, but each one has a single field Value?

If that’s the case, I’d recommend changing your schema to have one _measurement name, and then 12,000 different _field names.

ticchioned · April 21, 2022, 4:04pm

@mhall119 If I made that change, what would the query I’m trying to achieve look like? Changing the schema doesn’t get around the issue of filtering through 12,000 items, does it? Unless InfluxDB is able to filter through 12,000 fields more efficiently than 12,000 measurements?

from(bucket: "myBucket")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "Value")
|> filter(fn: (r) => r._field == "field1"
     or r._field == "field2"
     or r._field == "field3") // ... up to 12,000 fields here
|> last()

mhall119 · April 21, 2022, 6:44pm

If you don’t filter on _field at all, you get all of them. You can also use a regular expression if you just want fields matching a specific pattern:

|> filter(fn: (r) => r._field =~ /field[1-9]/)
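For context, here is a rough sketch of how that regular-expression filter might sit in a complete query. It assumes the single measurement is named "Value" and the fields are named field1, field2, and so on, as in the example query above; those names are placeholders, not anything from a real schema. The pattern is anchored here because an unanchored /field[1-9]/ would also match names such as field10.

```flux
from(bucket: "myBucket")
    |> range(start: 0)
    // Assumed single measurement name, taken from the example query in this thread
    |> filter(fn: (r) => r._measurement == "Value")
    // Anchored pattern: matches field1 through field9 only
    |> filter(fn: (r) => r._field =~ /^field[1-9]$/)
    |> last()
```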

Or, if you want to be even more dynamic about it, you can use the contains function to pass in a variable list of field names to filter on:

|> filter(fn: (r) => contains(value: r._field, set: ["field1","field2","field3"]))
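As a minimal sketch of the dynamic version, the list can be held in a variable and passed to contains(). The variable name wanted and the field names are hypothetical, and the measurement name "Value" is assumed from the example query earlier in the thread.

```flux
// Hypothetical list of field names; build this per request (1 name or thousands)
wanted = ["field1", "field2", "field3"]

from(bucket: "myBucket")
    |> range(start: 0)
    // Assumed single measurement name from the earlier example
    |> filter(fn: (r) => r._measurement == "Value")
    // Keep only rows whose field name is in the list
    |> filter(fn: (r) => contains(value: r._field, set: wanted))
    |> last()
```

Only the wanted list changes between requests. How this performs with thousands of names in the set is something to measure on your own data; this sketch does not guarantee it.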

mhall119 · April 21, 2022, 6:45pm

Because of the way InfluxDB stores data, it’s more efficient to filter through many fields on a single measurement than through many measurements with a single field each.

Think of a measurement more like a table in a traditional relational database. Give this blog post a read: it’s about pivoting data, but the beginning explains InfluxDB’s data structure, which will help make sense of why this approach is going to be more performant for you.

mhall119 · April 21, 2022, 6:48pm

Also, this blog post shows how to convert from our old SQL-like query language into Flux; it really demonstrates how _measurement is treated like a table.

mhall119 · April 21, 2022, 7:20pm

One last resource: there’s a whole chapter on schema design in our online book.
