I have an intermittent problem where queries that normally return in under 1 s suddenly take 30+ s to return.
The odd thing is that if I run the exact same queries a few minutes later, they return in under 1 s again.
I have a theory about what is happening, but I would like someone with more expertise to confirm or correct it.
What I have observed:
- the slowness always seems to happen within an hour after the end_time of the previous shard.
e.g. if the shard end_time is 2018-12-26T00:00:00Z, then the slow queries have only occurred between 2018-12-26T00:00:00Z and 2018-12-26T01:00:00Z
- the shard time range is 24 hours
- the queries all look for the most recent value in a time range
My theory is that there are not enough data points in the new shard to complete the query, so the previous shard has to be reloaded to service it, which causes the lag on that query.
Is this plausible? If so, would reducing the shard time range to, say, 1 hour allow the data to be reloaded more incrementally as needed, and avoid the large lag of reloading an entire previous (24-hour) shard's data?
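To make the observed window concrete, here is a minimal Python sketch (not taken from the database itself). It assumes shard boundaries are aligned to multiples of the shard duration in UTC, which matches the midnight boundary in the example above. The helper names `shard_start`, `in_slow_window`, and `shards_touched` are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: shard boundaries fall on multiples of the shard duration,
# counted from the Unix epoch in UTC (so 24h shards start at midnight UTC).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_start(ts, shard_duration):
    """Return the start of the shard that would contain ts."""
    whole_shards = (ts - EPOCH) // shard_duration  # integer count of shards
    return EPOCH + whole_shards * shard_duration


def in_slow_window(ts, shard_duration=timedelta(hours=24),
                   window=timedelta(hours=1)):
    """True if ts falls within `window` after the start of its shard,
    i.e. within `window` after the end_time of the previous shard."""
    return ts - shard_start(ts, shard_duration) < window


def shards_touched(range_start, range_end, shard_duration):
    """List the start times of all shards overlapping [range_start, range_end)."""
    starts = []
    current = shard_start(range_start, shard_duration)
    while current < range_end:
        starts.append(current)
        current += shard_duration
    return starts


if __name__ == "__main__":
    utc = timezone.utc
    slow_query_time = datetime(2018, 12, 26, 0, 30, tzinfo=utc)
    print(in_slow_window(slow_query_time))

    # A "most recent value in the last hour" query issued at 00:30
    # spans the shard boundary at midnight.
    query_start = slow_query_time - timedelta(hours=1)
    for duration in (timedelta(hours=24), timedelta(hours=1)):
        touched = shards_touched(query_start, slow_query_time, duration)
        print(duration, [s.isoformat() for s in touched])
```

Under these assumptions, a one-hour lookback issued at 00:30 touches two shards with either shard duration. The only difference is how much data the previous shard holds, so whether 1-hour shards actually reduce the reload cost is exactly the open question.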