We are seeing many write timeouts and failed queries with InfluxDB:
- Our client uses the HTTP API: /write and /query
- Client calls to the /write API exceed the timeout of 10 seconds
- /query responds with 400: queue length exceed
- Disk utilization reached 100%, all from read operations, with no writes
- These three issues appear at the same time
I then found that influxd was running retention at these times, which drove disk read utilization to 100%.
There are almost 700G of LSM files, on SSD.
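
For anyone who wants to confirm the same read-only saturation on their own host, the counters in /proc/diskstats can be sampled twice and compared. The following is only a minimal sketch, assuming a Linux host; the device name `sda` and the function names are hypothetical, so substitute the device that holds the InfluxDB data directory.

```python
import time


def parse_diskstats(text, device):
    """Return (reads_completed, writes_completed, io_ticks_ms) for one device.

    Columns follow the Linux /proc/diskstats layout: after major, minor and
    name come reads completed (4th column), writes completed (8th column)
    and milliseconds spent doing I/O (13th column).
    """
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 14 and fields[2] == device:
            return int(fields[3]), int(fields[7]), int(fields[12])
    raise ValueError(f"device {device!r} not found")


def disk_activity(before, after, interval_s, device):
    """Compare two /proc/diskstats snapshots taken interval_s seconds apart."""
    r0, w0, t0 = parse_diskstats(before, device)
    r1, w1, t1 = parse_diskstats(after, device)
    return {
        "reads_per_s": (r1 - r0) / interval_s,
        "writes_per_s": (w1 - w0) / interval_s,
        # Share of the interval during which the device was busy.
        "util_pct": min(100.0, 100.0 * (t1 - t0) / (interval_s * 1000.0)),
    }


def sample(device="sda", interval_s=1.0):
    """Take two live snapshots; 'sda' is a placeholder device name."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        after = f.read()
    return disk_activity(before, after, interval_s, device)
```

Calling `sample("sda")` while retention is running should show `util_pct` near 100 with `writes_per_s` near 0 if the behaviour matches what is described above.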
#influxd version
InfluxDB 2.0.0-beta.14 (git: c8af0f35be) build_date: 2020-07-08T20:42:23Z
#uname -a
Linux -------- 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux