I have a database with around 12,000 series in it. Each series has a Value field where that specific series value is stored. I would like to be able to query any amount of those 12,000 series values at once. I know I can limit my flux filter to just a field filter, which would return every series’ value field however I need to be able to do something like that dynamically for any amount of series, not necessarily all 12,000. Also, adding 12,000 measurements (there are no tag key/values in this schema) to a filter is not a great option either… I’m sure a query like that would not be performant. From the research I’ve done it seems like this is not really a feature that influx is “built” for as of now and cross-series queries seem to be difficult. As of now I am querying each series independently however when you have to query 12,000 series individually it will inevitably take a long time. Are there any ways around an issue like this? Do I have to rethink my schema to be able to pull this off? Or is there a way to filter a large list of measurements in one query without losing performance?
Hi @ticchioned,
I’m a little confused by what you’re explaining, can you show me an example of the query you have right now for an individual “series”?
In InfluxDB, a “series” is defined as a unique combination of measurement name and tag values, and Flux is built specifically to make working with many series at once easy, so I’m confident there’s a good way to accomplish what you want, I just need to better understand what you want
Sure. I’d love to give a little more information about my schema as well since I feel that will help you better understand what I’m trying to accomplish. My schema is setup as the following. 12,000 measurements and each measurement has 6 fields. Each series only contains a measurement, no tags in this particular case. I would like to be able to get the last “Value” for any amount of series in this database in one query. Currently I am querying the series individually like this:
from(bucket: "myBucket")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "myMeasurement" and r._field == "Value")
|> last()
I would like to be able to do something like this:
from(bucket: "myBucket")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "myMeasurement"
or r._measurement == "myMeasurement1"
or r._measurement == "myMeasurement2") // ... 12,000 measurements here
|> filter(fn: (r) => r._field == "Value")
|> last()
@mhall119 Also, I have been using this query occasionally as well. The reason I do not want to utilize this query is because I would like to be able to only get the data I need. I wont always need all 12,000 series, sometimes I’ll only need 1 to be returned which is why I would like to filter out the ones I do not need.
from(bucket: "myBucket")
|> range(start: 0)
|> filter(fn: (r) => r._field == "Value")
|> last()
So if I understand correctly, you have 12,000 different values for _measurement
, but each one has a single field Value
?
If that’s the case, I’d recommend changing your schema to have one _measurement
name, and then 12,000 different _field
names.
@mhall119 If I made that change what would the query I’m trying to achieve look like? Changing the schema doesn’t get around the issue of filtering through 12,000 items, does it? Unless influx is able to filter through 12,000 fields and not measurements?
from(bucket: "myBucket")
|> range(start: 0)
|> filter(fn: (r) => r._measurement == "Value")
|> filter(fn: (r) => r._field == "field1"
or r._field == "field2"
or r._field == "field3") // ... up to 12,000 fields here
|> last()
If you don’t filter on _field
at all, you get all of them. You can also use a regular expression if you just want fields matching a specific pattern:
|> filter(fn: (r) => r._field =~ /field[1-9]/)
Or, if you want to be even more dynamic about it, you can use the contains function to pass in a variable list of field names to filter on:
|> filter(fn: (r) => contains(value: r._field, set: ["field1","field2","field3"]))
Because of the way InfluxDB stores data, it’s more efficient filtering through a lot of fields on a single measurement than a lot of measurements with a single field.
Think of a measurement more like a table in a traditional relational database. Give this blog post a read, it’s about pivoting data, but the start of it explains InfluxDB’s data structure, which will help make sense of why this approach is going to be more performant for you.
Also this blog post which shows how to convert from our old SQL-like query language into Flux, it really demonstrates how _measurement
is treated like a table.
One last resource, there’s a whole chapter on schema design in our online book