Occasional High Disk Read Throughput Making InfluxDB Unusable

We are running InfluxDB 1.8.2 on CentOS 7.

The system runs stably for days, with no problems in read or write throughput. After a few days, however, InfluxDB starts reading on all available cores (64 in this case). Each core consumes a portion of the total disk throughput, and together they saturate the disk and make the system unusable. This is a D64s Azure VM, and when the issue occurs InfluxDB uses all 750 MB/s of available disk throughput. It can stay in this state for hours; restarting the InfluxDB service usually resolves the problem.

We’ve resized the VM and disk a few times, thinking the system was simply undersized, but the issue recurs every time and consumes all of the available disk resources.

We currently limit the database to 14 days of data and have just under 8 million series (785 GB used for data). We would like to keep at least a month, but want to get a handle on the root cause first.
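For planning the move to a month of retention, here is a rough storage estimate based only on the numbers above. It is a minimal sketch that assumes on-disk size grows linearly with the retention window; compression and changes in series cardinality could make the real figure higher or lower. The function name `projected_data_gb` is hypothetical.

```python
# Back-of-the-envelope storage projection (hypothetical helper).
# ASSUMPTION: on-disk data size scales linearly with retention days.

CURRENT_RETENTION_DAYS = 14
CURRENT_DATA_GB = 785  # reported data usage for 14 days of retention


def projected_data_gb(retention_days: int) -> float:
    """Estimate on-disk data size (GB) for a given retention window."""
    gb_per_day = CURRENT_DATA_GB / CURRENT_RETENTION_DAYS
    return gb_per_day * retention_days


if __name__ == "__main__":
    for days in (14, 30, 31):
        print(f"{days:>2} days -> ~{projected_data_gb(days):,.0f} GB")
```

Under that linear assumption, 30 days works out to roughly 1.7 TB of data, about 56 GB per day.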

The only thing we noticed today was increased InfluxDB heap usage before the read I/O spike (the issue started at 10:30, and the InfluxDB service was restarted at 11:10):

[Screenshot: InfluxDB heap usage before and during the read I/O spike]

Hello @dmarshal,
I’m sorry you’re having problems. I’m not sure I know how to answer this question. I’ll ask for help and get back to you. Thanks.