InfluxDB 2.0 retention causes 100% disk utilization; how can we optimize it?

We are seeing many write timeouts and failed queries with InfluxDB:

  • Our client uses the HTTP API: /write and /query
  • Client calls to the /write API exceed the 10-second timeout
  • /query responds with 400 "queue length exceed"
  • Disk utilization reaches 100%, from read operations only, with no writes
  • These three issues appear at the same time
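
To line these symptoms up against what influxd is doing at the time, it can help to log the outcome and latency of every client call. Below is a minimal Python sketch of that idea, not our actual client: the function names and the outcome labels are made up for illustration, and the "queue length" substring match is only an assumption based on the wording of the 400 response above.

```python
import socket
import time
import urllib.error
import urllib.request

WRITE_TIMEOUT_S = 10  # client-side timeout taken from the report above


def classify(status, body, timed_out):
    """Map one request outcome onto the symptoms listed above (labels are illustrative)."""
    if timed_out:
        return "timeout"
    # Assumption: the 400 body mentions "queue length", as in the report.
    if status == 400 and "queue length" in body.lower():
        return "queue_full"
    if status is not None and 200 <= status < 300:
        return "ok"
    return "other_error"


def timed_request(url, data=None, timeout=WRITE_TIMEOUT_S):
    """Send one request (POST if data is given, else GET); return (outcome, elapsed_seconds)."""
    start = time.monotonic()
    status, body, timed_out = None, "", False
    method = "POST" if data is not None else "GET"
    try:
        req = urllib.request.Request(url, data=data, method=method)
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
            body = resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Non-2xx responses arrive here; keep the status and body for classification.
        status = err.code
        body = err.read().decode("utf-8", "replace")
    except (socket.timeout, TimeoutError):
        # Timeout while reading the response.
        timed_out = True
    except urllib.error.URLError as err:
        # Timeout while connecting is wrapped in URLError; anything else is "other_error".
        timed_out = isinstance(err.reason, (socket.timeout, TimeoutError))
    return classify(status, body, timed_out), time.monotonic() - start
```

Logging the returned outcome and elapsed time with a timestamp for each call would show whether the "timeout" and "queue_full" outcomes cluster inside the same windows as the disk read spikes.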

I then found that influxd was running retention during these periods, driving disk read utilization to 100%.

There is almost 700 GB of LSM files, on SSD.

#influxd version
InfluxDB 2.0.0-beta.14 (git: c8af0f35be) build_date: 2020-07-08T20:42:23Z
#uname -a
Linux -------- 3.10.0-327.el7.x86_64 #1 SMP Thu Nov 19 22:10:57 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux