Question about query over multiple measurements


I’m wondering how to optimally issue queries for groups of measurements. My situation is like this:

  • there are many (up to hundreds of thousands) individual measurements, perhaps storing lots of data
  • occasionally I want to group some (‘some’ might mean 10k) of them
  • and do some queries (sum of data, average, etc.)
  • one measurement might be in 0, 1 or more groups at any given time
  • when a group gets defined, I would like to see the aggregated data from before the group’s creation time (ideally the aggregated history would be visible immediately, but it is OK to wait a bit while the history gets aggregated for a new group)

Groups can be added and removed dynamically over the lifespan of a measurement, so tagging isn’t really viable here, I’m afraid. Unless I can add/remove tags on the fly?
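To make the membership requirement concrete, here is a minimal sketch of keeping group membership outside the database, so that groups can change without touching stored points. This is an assumption about one possible design, not something Influx provides; the class name `GroupRegistry` and the measurement and group names are hypothetical.

```python
from collections import defaultdict


class GroupRegistry:
    """In-memory many-to-many mapping between groups and measurement names."""

    def __init__(self):
        # group name -> set of measurement names
        self._members = defaultdict(set)

    def add(self, group, measurement):
        self._members[group].add(measurement)

    def remove(self, group, measurement):
        # Removing from an unknown group is a no-op.
        if group in self._members:
            self._members[group].discard(measurement)

    def drop_group(self, group):
        self._members.pop(group, None)

    def members(self, group):
        """Measurements currently in the group, sorted for stable output."""
        return sorted(self._members.get(group, ()))

    def groups_of(self, measurement):
        """Groups the measurement belongs to (possibly none)."""
        return sorted(g for g, ms in self._members.items() if measurement in ms)


reg = GroupRegistry()
reg.add("rack_a", "cpu_1")
reg.add("rack_a", "cpu_2")
reg.add("hot", "cpu_2")  # cpu_2 is in two groups at once
```

Because membership lives apart from the stored data, a newly defined group can be applied to history from before its creation time; the cost is that every query has to be built from the current member list.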

There are a few obvious solutions, like querying the measurements one by one and then aggregating the results myself, or building a giant where id = 1 or id = 2 or id = 3 or ... kind of monster. I don’t think either will be performant, however.
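A middle ground between those two options might be to batch the measurement names into a handful of queries. InfluxQL accepts a regular expression in the FROM clause, so the sketch below builds one query string per chunk of names. The function name `build_group_queries`, the field name `value`, and the chunk size are all hypothetical, and the escaping is a best guess rather than a verified rule. Note also that such a query still returns one result per measurement, so the final combination across measurements has to happen on the client.

```python
import re


def build_group_queries(measurements, field="value", agg="sum", chunk_size=500):
    """Return InfluxQL query strings, one per chunk of measurement names.

    Each query matches its chunk with an anchored regex in the FROM clause.
    """
    names = sorted(measurements)
    queries = []
    for i in range(0, len(names), chunk_size):
        chunk = names[i:i + chunk_size]
        # Escape regex metacharacters, then the "/" that delimits
        # a regex literal in InfluxQL (assumed escaping, see lead-in).
        alternation = "|".join(re.escape(n).replace("/", r"\/") for n in chunk)
        queries.append(f'SELECT {agg}("{field}") FROM /^({alternation})$/')
    return queries


queries = build_group_queries(["cpu_2", "cpu_1", "cpu_3"], chunk_size=2)
for q in queries:
    print(q)
```

Chunking keeps each query string bounded in size; the right chunk size would have to be found by measuring against a real server.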

How should things like that be modelled? So far I’m using only Influx; maybe Kapacitor would help me here?

I would probably opt for Kapacitor in this scenario. Like you said, the queries get a bit long-winded when you’re doing these kinds of actions.

I can point you to a couple guides:

This one discusses when to choose Kapacitor vs InfluxQL Continuous Queries
This is the documentation for turning your query into a Kapacitor TICKscript.

Thanks, that’s pretty much what I suspected. Any tips on how to improve performance when doing maths (e.g. average) on thousands of measurements?
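One general point that applies whichever tool does the work: a group average should be combined from per-measurement sums and counts, not from per-measurement averages, because an average of averages is skewed when measurements hold different numbers of points. The sketch below assumes each measurement has already been reduced to a (sum, count) pair, for example by a periodic rollup; the function name `combine_partials` and the sample numbers are hypothetical.

```python
def combine_partials(partials):
    """Combine per-measurement (sum, count) pairs into a group total.

    Returns (total_sum, total_count, mean); mean is None for an empty group.
    """
    total = 0.0
    count = 0
    for partial_sum, partial_count in partials:
        total += partial_sum
        count += partial_count
    mean = total / count if count else None
    return total, count, mean


# Two measurements: 2 points summing to 10.0, and 8 points summing to 30.0.
partials = [(10.0, 2), (30.0, 8)]
total, count, mean = combine_partials(partials)

# Averaging the two per-measurement means gives a different, skewed result.
naive = sum(s / n for s, n in partials) / len(partials)
```

Here the combined mean is 4.0, while the average of the two per-measurement means is 4.375. Since sums and counts are cheap to store and add, the same pairs can be reused for any group a measurement later joins.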