Telegraf continuously rising CPU usage

lightonflux · April 2, 2017, 5:18pm

I am currently evaluating Telegraf as a collector for our monitoring. It works great (with InfluxDB), but its steadily rising CPU usage worries me: it climbs by about 1 percentage point every two days, with no end in sight.

2017-03-22 0.6%
2017-03-24 1.0%
2017-03-26 2.0%
2017-03-28 3.0%
2017-03-30 4.0%
2017-03-31 0.3% restart (configuration change of flush interval)
2017-04-01 1.0%
2017-04-02 2.0%

RAM usage looks okay.

(Grid has 1% steps)

Config:

[agent]
  ## Default data collection interval for all inputs
  interval = "20s"
  round_interval = true

  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "60s"
  flush_jitter = "0s"
  precision = ""

  ## Logging configuration:
  debug = false
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = false

[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs"]

[[inputs.diskio]]

# Get kernel statistics from /proc/stat
[[inputs.kernel]]

# Read metrics about memory usage
[[inputs.mem]]

# Get the number of processes and group them by status
[[inputs.processes]]

# Read metrics about swap memory usage
[[inputs.swap]]

# Read metrics about system load & uptime
[[inputs.system]]

# Read TCP metrics such as established, time wait and sockets counts.
[[inputs.netstat]]

# Monitor process cpu and memory usage
[[inputs.procstat]]
exe = "vnstatd"

[[inputs.procstat]]
exe = "influxd"

[[inputs.procstat]]
exe = "grafana-server"

[[inputs.procstat]]
exe = "telegraf"

Is this normal? I don’t get why it uses more and more CPU when it does the same task every few seconds. RAM usage is always under 33MB.
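
For readers who want to put a number on the trend, here is a minimal sketch that fits a straight line through the five pre-restart readings listed above. The function name `slope_per_day` is made up for illustration, and a simple least-squares fit is only one reasonable way to summarize so few points.

```python
from datetime import date

# Pre-restart readings copied from the post: (date, Telegraf CPU usage in %).
readings = [
    (date(2017, 3, 22), 0.6),
    (date(2017, 3, 24), 1.0),
    (date(2017, 3, 26), 2.0),
    (date(2017, 3, 28), 3.0),
    (date(2017, 3, 30), 4.0),
]


def slope_per_day(samples):
    """Ordinary least-squares slope of CPU % against elapsed days."""
    start = samples[0][0]
    xs = [(day - start).days for day, _ in samples]
    ys = [value for _, value in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


print(f"{slope_per_day(readings):.2f} percentage points per day")
```

On these five points the fit comes out at roughly 0.44 percentage points per day, which is in line with the "about 1% every two days" estimate.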

daniel · April 3, 2017, 5:53pm

We recently fixed a bug in the procstat input that could cause this. Could you try the latest development version and report back with the results? You can build from source or use one of our nightly builds. Here is a link to the amd64 deb package; let me know if you need a different platform.

lightonflux · April 4, 2017, 3:16pm

I'm having problems installing it:

dpkg -i telegraf_nightly_amd64.deb 
dpkg: error processing archive telegraf_nightly_amd64.deb (--install):
 parsing file '/var/lib/dpkg/tmp.ci/control' near line 2 package 'telegraf':
 error in 'Version' field string 'dev~n201704030819-0': version number does not start with digit
Errors were encountered while processing:
 telegraf_nightly_amd64.deb
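
The error above says that the package's version string has to start with a digit. As a rough illustration, the sketch below applies just that one rule to a version string. It is not a reimplementation of dpkg's parser; the helper name `upstream_starts_with_digit` and the second example version are hypothetical.

```python
DIGITS = "0123456789"


def upstream_starts_with_digit(version: str) -> bool:
    """Check only the rule from the error message: after an optional
    'epoch:' prefix, the version must begin with a digit."""
    head, sep, tail = version.partition(":")
    upstream = tail if sep else head
    return bool(upstream) and upstream[0] in DIGITS


# Version string taken from the dpkg error in this post.
print(upstream_starts_with_digit("dev~n201704030819-0"))    # False
# Hypothetical string that would satisfy the rule (not the real nightly version).
print(upstream_starts_with_digit("1.3.0~n201704030819-0"))  # True
```

A check like this could run in a packaging script before the .deb is published, so a bad version string is caught before anyone tries to install it.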

daniel · April 4, 2017, 9:32pm

I think I have fixed the package, can you retry?

lightonflux · April 4, 2017, 10:23pm

The current nightly has a valid version string. I deployed it to the test system and will update when I have conclusive data on the problem.

Thanks for your help!

lightonflux · April 5, 2017, 4:54pm

It appears very stable at around 0.18 to 0.20%.

Edit: Can I mark the topic as solved/answered? Does this forum support a Discourse extension for that?

jackzampolin · April 5, 2017, 5:26pm

@lightonflux Not currently, but if you have seen one before, I would love to check it out.

lightonflux · April 5, 2017, 8:26pm

Level1Techs uses this one: discourse-solved. Looks like this in production.

lightonflux · April 5, 2017, 8:27pm

It does not mark the topic as solved in the thread view, but it adds an indicator to the answer and a box in the first post that tells the reader which post solves the problem. Not exactly what I had in mind, but close enough. There is also Solved-Button; I have no idea how that looks or works.

There is also a discussion at Discourse HQ about how to approach this use case.

(Double post because of link limit)

Edit: Correction, there is an indication (check mark) in the thread view:

jackzampolin · April 5, 2017, 8:38pm

@lightonflux Thank you for that! I’ll look into it.

lightonflux · April 6, 2017, 5:17am

Do you have a release schedule? I would like to know when the next stable release is out.

daniel · April 6, 2017, 5:52pm

We are planning to do the 1.3.0 release when the items in the 1.3.0 milestone are completed.
