Influx queries intermittently slow

Hi

I have an intermittent problem where queries that normally return in under 1 s suddenly take 30+ s to return.

The odd thing is that if I run the exact same queries a few minutes later, they return in under 1 s again.

I have a theory about what is happening, but I am looking for some expertise on it.

What I have observed:

  • the slowness always seems to happen within an hour after the end time of the previous shard.
    e.g. if the shard end time is 2018-12-26T00:00:00Z, then every slow query has occurred between 2018-12-26T00:00:00Z and 2018-12-26T01:00:00Z
  • the shard time range is 24 hours
  • the queries all look for the most recent value in a time range
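
One way to check the first observation against a log of slow queries is sketched below. It is only a rough sketch: it assumes shard boundaries are aligned to multiples of the shard duration counted from the Unix epoch (which matches the 00:00:00Z boundary in the example above), and the function names are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

SHARD_DURATION = timedelta(hours=24)  # shard time range from the observations
SLOW_WINDOW = timedelta(hours=1)      # slow queries were seen within this gap

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_since_shard_start(ts, shard_duration=SHARD_DURATION):
    """How far `ts` is past the start of its shard.

    Assumption: shard boundaries fall on multiples of `shard_duration`
    counted from the Unix epoch.
    """
    return (ts - EPOCH) % shard_duration


def near_shard_boundary(ts, window=SLOW_WINDOW, shard_duration=SHARD_DURATION):
    """True if `ts` falls within `window` after a shard boundary."""
    return time_since_shard_start(ts, shard_duration) < window


# Hypothetical timestamps of slow queries, for illustration only.
slow_queries = [
    datetime(2018, 12, 26, 0, 30, tzinfo=timezone.utc),
    datetime(2018, 12, 26, 12, 0, tzinfo=timezone.utc),
]
for ts in slow_queries:
    print(ts.isoformat(), near_shard_boundary(ts))
```

If every real slow-query timestamp comes back True, the slowness is tied to the shard rollover rather than to the queries themselves.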

My Theory:

  • there are insufficient data points in the new shard to complete the query, so Influx has to reload the previous shard to service it, which delays that query.

Could this be possible? If so, would reducing the time range of the shards to, say, 1 hour allow it to reload data more incrementally as necessary, and avoid the huge lag of reloading an entire previous (24-hour) shard's data?

For anyone else who might have this problem in the future…

It was indeed due to shards, but not in the way I suspected.

The slowness always occurs when Influx expires a shard: it takes 10-15 minutes to delete the expired shard.
Not sure why; it looks like it's rebuilding indexes or something.

I did not solve the slowness, but I was able to change the Influx config so that the shard deletion occurs outside of production hours, by changing the timing of the retention policy checks.
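
For anyone wanting to sanity-check a similar change, here is a rough sketch of how to see where the retention checks would land. It rests on assumptions not stated above: that the relevant setting is `check-interval` in the `[retention]` section of the InfluxDB 1.x config, that checks fire at fixed intervals counted from when the service starts, and that production hours are 08:00 to 18:00 UTC. The function names and times are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def check_times(service_start, check_interval, count):
    """Times of the next `count` retention checks.

    Assumption: checks fire every `check_interval`, counted from
    `service_start`.
    """
    return [service_start + check_interval * i for i in range(1, count + 1)]


def in_production_hours(ts, start_hour=8, end_hour=18):
    """True if `ts` falls inside the (hypothetical) production window."""
    return start_hour <= ts.hour < end_hour


# Hypothetical: service restarted at 02:00 UTC with a 24-hour check interval.
start = datetime(2018, 12, 26, 2, 0, tzinfo=timezone.utc)
checks = check_times(start, timedelta(hours=24), 3)
for ts in checks:
    print(ts.isoformat(), "production hours" if in_production_hours(ts) else "off hours")
```

With a 24-hour interval every check lands at the same time of day, so the slow shard delete stays outside the production window as long as the service start time does.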