I need help understanding the CPU usage trend of InfluxDB.
In the picture below, the CPU usage rises to 68% every 2 minutes. Telegraf runs at a 10-second interval and sends data to InfluxDB every 10 seconds.
What I don't understand is why the spikes happen every 2 minutes.
Also, does InfluxDB use RAM to cache recently written and read data for fast access? If so, how can I limit the size of that cache?
First off, regarding CPU usage, take a look at some of the _internal metrics (which you should enable for collection) to see whether garbage collection or something else is happening at those times.
There are situations where compactions will consume CPU when moving data from one level of compression to another. That said, the spikes you're seeing are quite drastic, and I'd look at what else is happening on the box during those intervals.
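If it helps, here is a quick way to poke at what the _internal database exposes once self-monitoring is enabled (see the config further down in this thread). Measurement names such as tsm1_engine come from my 1.x install and may differ by version, so treat this as a sketch run from the influx CLI:

USE _internal
SHOW MEASUREMENTS
SHOW FIELD KEYS FROM "tsm1_engine"

If compactions are the cause, the compaction- and cache-related stats in there should line up timestamp-wise with your CPU spikes.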
As for the in-mem store, the distinction between the "wal" and "disk" is something to note with InfluxDB. If properly configured (you might want to paste your config here, masking any sensitive info if necessary), all writes go into the in-mem index (backed by the write-ahead log, the 'wal') and are flushed to disk asynchronously. Your writes are therefore ack'd once the WAL write completes and are added to .tsm files on disk asynchronously.
The compactions from one level of TSM to another also happen asynchronously and concurrently with ongoing reads/writes. The InfluxDB team has made major improvements to fork this work off from the incoming writes so that locking is not a problem under heavy write loads.
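For reference, the memory used by that in-mem cache can be bounded in the [data] section of influxdb.conf. The option names below exist in the 1.x config; the sizes shown are roughly the shipped defaults, so double-check them against your version:

[data]
dir = "/var/lib/influxdb/data" # where compacted .tsm files live
wal-dir = "/var/lib/influxdb/wal" # the write-ahead log is persisted here, on disk
cache-max-memory-size = 1048576000 # ~1GB cap on the in-memory cache; writes error out above this
cache-snapshot-memory-size = 26214400 # ~25MB threshold at which the cache is snapshotted to a new .tsm file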
Can I limit the amount of main memory given to this in-mem store? How much does it generally use? And where is the WAL stored, in main memory?
"This in-mem store holds a copy of all the data in the WAL and an index of the data in the TSM files, and serves read queries. Writes go to the WAL first and are then sent to the TSM files." Am I correct here?
I've asked a lot of questions in this reply, sorry about that.
Those are the InfluxDB collector measurements coming from Telegraf; they're close, but not the internal ones I was referring to.
Instead, this is configured on the InfluxDB side itself. For example, my config is at /etc/influxdb/influxdb.conf:
###
### Controls the system self-monitoring, statistics and diagnostics.
###
### The internal database for monitoring data is created automatically if
### it does not already exist. The target retention policy within this database
### is called 'monitor' and is also created with a retention period of 7 days
### and a replication factor of 1, if it does not exist. In all cases this
### retention policy is configured as the default for the database.
[monitor]
store-enabled = true # Whether to record statistics internally.
store-database = "_internal" # The destination database for recorded statistics
store-interval = "10s" # The interval at which to record statistics
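Once that is enabled, you can query the recorded stats directly. A minimal sketch, assuming the runtime measurement is present (its fields mirror Go's runtime.MemStats, so names like NumGC and HeapInUse may vary slightly by version):

SELECT non_negative_derivative(last("NumGC"), 1m) AS gc_per_min,
       mean("HeapInUse") AS heap_in_use
FROM "_internal"."monitor"."runtime"
WHERE time > now() - 1h
GROUP BY time(1m)

If gc_per_min or heap_in_use jumps on the same 2-minute cadence as your CPU graph, that points at garbage collection or cache snapshots rather than your query load.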
@Luv We mmap recent data and let the OS page it out as needed, so the reported usage can look higher than it actually is. But ~2GB of RAM is not an unusual amount of memory for the process to use.
You want to look at RSS more than virtual memory; the latter can look inflated, per @jackzampolin's point.
Overall though, 600MB is a pretty small amount of memory to consume, unless you're really not sending much data. For example, our machine processes about 500k writes/sec and sits at ~46GB RSS.
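If you want a rough split between the Go heap and everything else, the _internal runtime stats can help (again assuming _internal is enabled; Sys and HeapInUse come from Go's runtime.MemStats):

SELECT last("HeapInUse") AS heap_in_use, last("Sys") AS go_sys
FROM "_internal"."monitor"."runtime"
WHERE time > now() - 10m

Whatever RSS shows beyond go_sys is largely mmap'd TSM data that the OS can page out under memory pressure, per @jackzampolin's note above.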