Influx queries intermittently slow




I have an intermittent problem where queries that normally return in under 1s suddenly take 30+s to return.

The odd thing is that if I run the exact queries a few minutes later, they will return in less than 1s again.

I have a theory as to what I think is happening, but I am looking for some expertise on it.

What I have observed:

  • the slowness always seems to happen within an hour of the end_time of the previous shard.
    e.g. if the shard end_time is 2018-12-26T00:00:00Z, then slow queries have only occurred between 2018-12-26T00:00:00Z and 2018-12-26T01:00:00Z
  • the shard time range is 24 hours
  • the queries all look for the most recent value in a time range
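
To check the first observation against a log of slow queries, something like the following might help. It is only a rough sketch: the function names are made up for illustration, and it assumes shard boundaries fall on whole multiples of the shard duration counted from the Unix epoch (so 24-hour shards end at midnight UTC, which matches the example above).

```python
from datetime import datetime, timedelta, timezone

# Assumption: shard boundaries are aligned to multiples of the shard
# duration measured from the Unix epoch (midnight UTC for 24h shards).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SHARD_DURATION = timedelta(hours=24)
WINDOW = timedelta(hours=1)


def previous_shard_end(ts, shard_duration=SHARD_DURATION):
    """Return the most recent shard boundary at or before ts."""
    whole_shards = (ts - EPOCH) // shard_duration  # integer count of shards
    return EPOCH + whole_shards * shard_duration


def in_post_rollover_window(ts, shard_duration=SHARD_DURATION, window=WINDOW):
    """True if ts is less than `window` past the previous shard's end time."""
    return ts - previous_shard_end(ts, shard_duration) < window


if __name__ == "__main__":
    slow_query_time = datetime(2018, 12, 26, 0, 30, tzinfo=timezone.utc)
    print(previous_shard_end(slow_query_time).isoformat())
    print(in_post_rollover_window(slow_query_time))
```

Running each slow-query timestamp through in_post_rollover_window should show whether the pattern really holds or whether some slow queries fall outside that first hour.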

My Theory:

  • there are insufficient data points in the new shard to complete the query, so it has to reload the previous one to service it, thereby causing a lag on that query execution.

Could this be possible? If so, would reducing the shard time range to, say, 1 hour allow it to reload data more incrementally as needed, and avoid the large lag of reloading an entire previous (24-hour) shard's data?


For anyone else who might have this problem in the future...

It was indeed due to shards, but not in the way I suspected.

The slowness always occurs when Influx expires a shard: it would take 10-15 minutes to delete the expired shard.
I'm not sure why; it looks like it's rebuilding indexes or something.

I did not solve the slowness, but I was able to change the influx config such that the shard deletion occurs outside of production hours by changing the timing of the retention policy checks.