Large variance in Flux query duration across different date ranges of similar length

We have recently started using InfluxDB Cloud via Azure to store data pertaining to ships. Each ship has its own bucket, and within each bucket we have two measurements: one with around 25-28 fields and the other with just a handful. We are not using any tags, and we store a single value per field per second.
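
For illustration, a single point for the larger measurement would look roughly like this in line protocol (field names and values are just placeholders; there is no tag set, and the timestamp is in nanoseconds):

measurement1 field1=12.3,field2=45.6,field3=78.9 1677196800000000000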

We have been loading live data into InfluxDB for around the past month, and have been very happy with the query performance, which has been more or less instantaneous for the kinds of queries we’ve been running. However, we recently imported a large amount of historical data for four ships, and querying this historical data gives very uneven performance; we’re not sure why.

Here is a sample query we would typically run:

from(bucket: "bucketname")
  |> range(start: 1677196800, stop: 1679702400)
  |> filter(fn: (r) => r["_measurement"] == "measurement1")
  |> window(every: 1d)
  |> mean()

We don’t filter the results by field, as we typically want values returned for all fields. All that changes is the date range we run the query for. In the example above, that’s from 24 Feb 2023 to 25 Mar 2023: this period includes the live data that we initially started recording, and the results are returned almost instantly. However, if we change the date range to 1667260800 - 1669939200 (1 Nov 2022 to 2 Dec 2022), the query takes much, much longer: in that particular case I gave up and cancelled it after well over a minute.

The performance is sometimes better if we filter by field, like so:

  |> filter(fn: (r) => r["_measurement"] == "measurement1" and r["_field"] == "field1")

but even then it is not consistent: some fields return instantly, whereas others take ages. There doesn’t seem to be any particular pattern to which queries run quickly and which take a long time. Again, all that changes is the date range: the total amount of time covered is roughly the same (say a month’s worth), as is the aggregation window.
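
For reference, the full field-filtered version is just the original query with the extra predicate (using “field1” as an example field name), e.g. against the slower date range:

from(bucket: "bucketname")
  |> range(start: 1667260800, stop: 1669939200)
  |> filter(fn: (r) => r["_measurement"] == "measurement1" and r["_field"] == "field1")
  |> window(every: 1d)
  |> mean()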

Can anyone provide any insight into why the queries run like this, and what we can do to resolve the issue?

Hello @StevenRands,
It’s possible that some of the older data is stored in cold shards rather than hot shards, and that could account for the delay.
Luckily, this problem is going to be addressed by the new version of the storage engine in InfluxDB Cloud (InfluxDB Cloud powered by IOx); let me know if you want more info on that, or if you plan to stick to OSS.
In the meantime, you might be interested in the Flux profiler.
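
To use it, you import the profiler package and choose which profilers to enable via an option; the profiling results then come back as extra tables alongside your normal query results. Applied to your query, it would look something like this:

import "profiler"

option profiler.enabledProfilers = ["query", "operator"]

from(bucket: "bucketname")
  |> range(start: 1677196800, stop: 1679702400)
  |> filter(fn: (r) => r["_measurement"] == "measurement1")
  |> window(every: 1d)
  |> mean()

The "query" profiler reports overall compile and execution times, and the "operator" profiler breaks the time down per operation, which should help show where the slow date ranges are spending their time.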

Here’s a post on it as well:

Thanks for the links @Anaisdg.

The query situation has improved a fair bit since my original post, without us changing anything about the types of queries we are running. Whether this is due to data being moved between cold and hot shards, I couldn’t say.

I also wanted to clarify that we are accessing InfluxDB via Azure (https://westeurope-1.azure.cloud2.influxdata.com specifically), and on the web interface it says “InfluxDB Cloud powered by TSM”: how does TSM relate to IOx and OSS?