We have recently started using InfluxDB Cloud via Azure, to store data pertaining to ships. Each ship has its own bucket, and within each bucket we have two measurements, with around 25-28 fields for one of those measurements and a handful for the other measurement. We are not using any tags, and we store a single value per field per second.
We have been loading live data into InfluxDB for around the past month, and have been very happy with the query performance which has been more or less instantaneous for the kinds of queries we’ve been running. However, we recently imported a load of historical data for four ships, and trying to query this historical data results in very uneven performance, and we’re not sure why.
Here is a sample query we would typically run:
from(bucket: "bucketname")
|> range(start: 1677196800, stop: 1679702400)
|> filter(fn: (r) => r["_measurement"] == "measurement1")
|> window(every: 1d)
|> mean()
We don’t filter the results by field, as we typically want values returned for all fields. All that changes is the date range we run the query for. In the example above, that’s from 24 Feb 2023 to 25 Mar 2023: this period includes the live data that we initially started recording, and the results are returned instantaneously. However, if we change the date range to 1667260800 - 1669939200 (1 Nov 2022 to 2 Dec 2022) then the query takes much, much longer: well over a minute for that particular example, after which I just cancelled the query.
The performance is sometimes better if we filter by field, like so:
|> filter(fn: (r) => r["_measurement"] == "measurement1" and r["_field"] == "field1")
but even then it is not consistent: some fields return instantly, whereas others take ages. There doesn’t seem to be any particular pattern to which queries will run quickly, and which will be time-consuming. Again, all that’s changing is the date range we use: the total amount of time covered will be roughly the same (say a month’s-worth), as will the aggregation window.
Can anyone provide any insight into why the queries run like this, and what we can do to resolve the issue?