Telegraf continuously rising CPU usage

I am currently evaluating Telegraf as a collector for our monitoring. It works great (with InfluxDB), but its rising CPU usage worries me: it climbs about 1% every two days and there is no end in sight.

2017-03-22 0.6%
2017-03-24 1.0%
2017-03-26 2.0%
2017-03-28 3.0%
2017-03-30 4.0%
2017-03-31 0.3% restart (configuration change of flush interval)
2017-04-01 1.0%
2017-04-02 2.0%
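As a rough check of the "1% every two days" estimate, a least-squares fit over the five readings taken before the 2017-03-31 restart gives a slope of about 0.44 percentage points per day (about 0.88 per two days). The sketch below uses only the readings listed above. Excluding the post-restart points is a choice made here to keep a single uninterrupted run.

```python
from datetime import date

# CPU usage samples (percent) from the readings above, before the 2017-03-31 restart.
samples = [
    (date(2017, 3, 22), 0.6),
    (date(2017, 3, 24), 1.0),
    (date(2017, 3, 26), 2.0),
    (date(2017, 3, 28), 3.0),
    (date(2017, 3, 30), 4.0),
]


def daily_slope(points):
    """Ordinary least-squares slope of CPU percent against elapsed days."""
    t0 = points[0][0]
    xs = [(d - t0).days for d, _ in points]
    ys = [v for _, v in points]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


slope = daily_slope(samples)
print(f"{slope:.2f} percentage points per day, {2 * slope:.2f} per two days")
```

A steady linear climb like this, with flat memory, points to work that grows per collection cycle rather than a one-off spike.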

RAM usage looks okay.

[Graph of Telegraf CPU usage over time; grid lines at 1% steps]

Config:

[agent]
  ## Default data collection interval for all inputs
  interval = "20s"
  round_interval = true

  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "60s"
  flush_jitter = "0s"
  precision = ""

  ## Logging configuration:
  debug = false
  quiet = false
  logfile = ""
  hostname = ""
  omit_hostname = false

[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false

[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs"]

[[inputs.diskio]]

# Get kernel statistics from /proc/stat
[[inputs.kernel]]

# Read metrics about memory usage
[[inputs.mem]]

# Get the number of processes and group them by status
[[inputs.processes]]

# Read metrics about swap memory usage
[[inputs.swap]]

# Read metrics about system load & uptime
[[inputs.system]]

# Read TCP metrics such as established, time wait and sockets counts.
[[inputs.netstat]]

# Monitor process cpu and memory usage
[[inputs.procstat]]
  exe = "vnstatd"

[[inputs.procstat]]
  exe = "influxd"

[[inputs.procstat]]
  exe = "grafana-server"

[[inputs.procstat]]
  exe = "telegraf"

Is this normal? I don't understand why it uses more and more CPU when it performs the same task every few seconds. RAM usage always stays under 33 MB.

We recently fixed a bug in the procstat input that could cause this. Could you try the latest development version and report back with the results? You can build from source or use our nightly builds; here is a link to the amd64 deb package. Let me know if you need a different platform.

I'm having problems installing it:

dpkg -i telegraf_nightly_amd64.deb 
dpkg: error processing archive telegraf_nightly_amd64.deb (--install):
 parsing file '/var/lib/dpkg/tmp.ci/control' near line 2 package 'telegraf':
 error in 'Version' field string 'dev~n201704030819-0': version number does not start with digit
Errors were encountered while processing:
 telegraf_nightly_amd64.deb
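The error comes from how dpkg reads the Version field. An optional epoch ends at the first colon, the Debian revision follows the last hyphen, and the upstream version left over must start with a digit. For `dev~n201704030819-0` that leaves `dev~n201704030819`, which starts with `d`. The sketch below mimics only that one check. It is not dpkg's full validator, which also restricts the allowed characters. The "valid" string in it is a hypothetical example, not the version the fixed nightly actually uses.

```python
import re


def upstream_version(version: str) -> str:
    """Strip the optional epoch and Debian revision from a Debian version string."""
    # The epoch, if present, ends at the first colon, e.g. "2:1.2.1-1".
    if ":" in version:
        version = version.split(":", 1)[1]
    # The Debian revision, if present, follows the last hyphen.
    if "-" in version:
        version = version.rsplit("-", 1)[0]
    return version


def starts_with_digit(version: str) -> bool:
    """Mimic only dpkg's 'version number does not start with digit' check."""
    return re.match(r"[0-9]", upstream_version(version)) is not None


# The first string is from the error above; the second is a hypothetical valid form.
for v in ["dev~n201704030819-0", "1.3.0~n201704030819-0"]:
    print(v, "ok" if starts_with_digit(v) else "version number does not start with digit")
```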

I think I have fixed the package, can you retry?


The current nightly has a valid version string. I deployed it to the test system and will post an update once I have conclusive data on the problem.

Thanks for your help!

It appears very stable at around 0.18 to 0.20%.

Edit: Can I mark the topic as solved/answered? Does this forum support any Discourse extension for that?

@lightonflux Not currently, but if you have seen one before, I would love to check it out.

Level1Techs uses this one: discourse-solved. Here is how it looks in production.

It does not mark the topic as solved in the thread view, but it adds an indicator to the answer and a box in the first post that tells the reader which post solved the problem. Not exactly what I have in mind, but close enough. There is also Solved-Button; I have no idea how that one looks or works.

There is also a discussion at Discourse HQ about how to approach this use case.

(Double post because of link limit)

Edit: Correction, there is an indicator (a check mark) in the thread view.

@lightonflux Thank you for that! I’ll look into it.

Do you have a release schedule? I would like to know when the next stable release is out.

We are planning to do the 1.3.0 release when the items in the 1.3.0 milestone are completed.