Large query performance

Ok, here is the problem we have. We are streaming data into InfluxDB, sampled at a 1s interval.
From time to time we need to query data over a large time range; 5 years is a good example. We do not need 1s precision, but we would like this query to be fast.

Example:
SELECT mean(value) AS value FROM "values" WHERE ( id=$F_0 ) AND time >= '2016-07-22T00:00:00-07:00' AND time < '2021-07-21T00:00:00-07:00' GROUP BY time(86400s), id fill(none);
A query like this takes > 20s.

We tried downsampling the data using continuous queries, but it's a major challenge to keep the downsampled data synchronized with the actual data. Our ingestion is not constant: we can have outages and then catch up, which leaves gaps in the downsampled data.

Is there a performant way to fill gaps in data produced by continuous queries?
Is there a performant way to query data over large periods of time at, for example, a 24h interval?

About filling gaps, it depends…
You can always fill gaps manually by running the CQ's query yourself for the "missing range".
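A minimal sketch of such a manual backfill, assuming the downsampled data lives in a measurement called "values_24h" (that name, and the gap's time range, are just placeholders for your setup):

```sql
-- Re-run the aggregation by hand over the known gap and write the
-- results into the downsampled measurement (overwrites matching points).
SELECT mean(value) AS value
INTO "values_24h"
FROM "values"
WHERE time >= '2021-07-01T00:00:00Z' AND time < '2021-07-04T00:00:00Z'
GROUP BY time(24h), id fill(none)
```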
But if you know how long it takes to recover from an outage, you can simply have the CQ reprocess a whole trailing window every time it runs. As an example, you could configure (sketched below):

  • AggrInterval - downsample data to 24h intervals
  • Frequency - run every 24h
  • TimeWindow - reprocess 3 whole days (72h) of data, overwriting previous results
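A sketch of that CQ using the RESAMPLE clause; the database and measurement names ("mydb", "values_24h") are assumptions:

```sql
-- EVERY 24h = Frequency, FOR 72h = TimeWindow, GROUP BY time(24h) = AggrInterval.
-- Reprocessing the last 72h each run papers over ingestion outages shorter
-- than the FOR window, since late-arriving points get re-aggregated.
CREATE CONTINUOUS QUERY "cq_values_24h" ON "mydb"
RESAMPLE EVERY 24h FOR 72h
BEGIN
  SELECT mean(value) AS value
  INTO "values_24h"
  FROM "values"
  GROUP BY time(24h), id
END
```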

See the CQ advanced syntax docs for details.

About performance: this case is the perfect example of where downsampling pays off; with this much raw data I doubt there is much else you can do to improve query performance. (I assume you are already selecting as little data as possible in the query.) The only other options I see are increasing resources, or maybe splitting the work into more than one query (one per year?), which should help with wall-clock time but will use more resources.
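Once the downsampled series exists, the 5-year query can hit it directly. A sketch, again assuming the measurement is called "values_24h": it scans roughly 1,800 daily points per id instead of ~160 million raw 1s points:

```sql
-- Same shape as the original query, but against the pre-aggregated data.
-- mean() over daily means is a close approximation here, since the 1s
-- buckets are equal-sized apart from gaps.
SELECT mean(value) AS value
FROM "values_24h"
WHERE ( id=$F_0 ) AND time >= '2016-07-22T00:00:00-07:00' AND time < '2021-07-21T00:00:00-07:00'
GROUP BY time(86400s), id fill(none);
```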
