Debugging High CPU Usage when IDLE

paulsc · March 21, 2023, 4:57pm

Hi everyone!

I’m debugging high CPU usage (50%) in InfluxDB 1.6.4 OSS when idle, and I would love a few pointers!

CPU usage on our instance has gradually increased from 15% about a year ago to 50% now. There is very little traffic on the instance (less than 5 req/s on average).

The instance is an EC2 r4.large with 2 vCPUs and 15 GB of RAM. The CPUs are “Up to 2.3 GHz Intel Xeon Scalable Processor”.

Watching the logs I can see it’s mostly “Cache snapshot” and “Compacting file” messages. Here’s an excerpt: influxdb.log · GitHub

From reading about TSM, the cache, and the WAL in the InfluxDB docs, I am guessing that it is spending a lot of time flushing the cache to disk. I’m not sure why that is when there are so few reads and writes; I would assume the cache would not grow.

Any pointers or ideas are greatly appreciated!

Best,

Paul

Anaisdg · March 21, 2023, 6:02pm

Hello @paulsc,
I’m really not sure.
Perhaps you want to upgrade to TSI indexes?

Also maybe the debugging discussed here could be helpful

github.com/influxdata/influxdb

High cpu usage when checking series cardinality

dazoot · opened 07:03AM - 14 Aug 17 UTC · closed 10:02PM - 15 Aug 17 UTC · area/performance

We have been using influx for over a year. The number of series we have in DB is… about 110 mil. We have one database: graphite, with a 1 year retention policy. The number of queries is not that high. We see consistent CPU usage even if there is no read query being ran on the server. We have upgraded to latest influxdb version and TSI indexes. Server is quite big: Dual Hexa Core 256 GB of Ram. ![cpu-pinpoint 1468133712 1502693712](https://user-images.githubusercontent.com/1543455/29261379-d54f3906-80d7-11e7-8606-0400b3afa003.png)

paulsc · March 22, 2023, 9:13am

Hello @Anaisdg,

Thank you for your answer.

I was just taking a look at TSI indexes. It seems that they reduce the dependency on RAM by using the disk. In our case we have quite low RAM utilization, at about 15% on average. So unless I’m missing something, this is probably not the right approach.

Thank you for sharing the GitHub issue. It looks like that problem was fixed in 2017, a year before the release of our version (1.6.4), so I’m assuming that we are running the fix already.

I just tried changing the “cache-snapshot-memory-size” from the default of 25M to 100M, and this does seem to help with CPU load. The “cache snapshot written” messages in the logs went from every 2-3s to every 30s, and our CPU load is down from 50% to 10%.
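As a rough sanity check on these numbers, here is a back-of-the-envelope sketch in Python. It assumes that snapshots are triggered only by the cache reaching the size threshold and that the cache fills at a steady rate. Neither assumption is confirmed anywhere in this thread, and the function name is made up for illustration.

```python
def implied_fill_rate_mb_per_s(threshold_mb: float, interval_s: float) -> float:
    """Cache growth rate (MB/s) needed to reach the threshold once per interval.

    Assumption: the size threshold is the only snapshot trigger.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return threshold_mb / interval_s


# Figures reported above; 2.5 s is the midpoint of "every 2-3s".
before = implied_fill_rate_mb_per_s(25, 2.5)   # 25M threshold
after = implied_fill_rate_mb_per_s(100, 30)    # 100M threshold

print(f"25M threshold:  ~{before:.1f} MB/s")
print(f"100M threshold: ~{after:.1f} MB/s")
```

If those assumptions held, both estimates would be well above what an instance with almost no traffic should be generating, and they do not agree with each other. That hints that either something other than client writes is filling the cache, or size is not the only thing triggering snapshots, but this sketch cannot tell which.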

I would love to understand the behavior of the cache system a bit more, and specifically why the cache is being written to disk when there are no reads or writes on the DB. Currently we have absolutely no traffic and I can still see this happening.

Is there someone I can reach out to regarding this, or is there a resource somewhere describing this?

Your help is much appreciated,

Best

Paul
