Telegraf Slow to restart

CurlingForFun · February 13, 2019, 4:33am

Hi All,

I have a Telegraf instance (v1.9.1) running some 550 SNMP inputs. If I make a change to these inputs and send a ‘reload’ to Telegraf via systemctl, it takes almost 10 minutes before data collection resumes.

Is there anything I can do to improve this poor reload time?

This is running on a RHEL 7 host with 4 cores and 8 GB of RAM.

Thanks

samaust · February 13, 2019, 4:40am

I don’t know how to solve that issue.

Another option would be to use another SNMP collector such as toni-moreno/snmpcollector. It serves a configuration webpage, so you don’t have to restart Telegraf every time you change a setting. I’ve only tested it with a few inputs, so I don’t know how it performs with 550 tags.

CurlingForFun · February 13, 2019, 6:32am

Interesting project. I do like that it does some calculations prior to sending data into InfluxDB (something I was interested in, but I couldn’t find any evidence that Telegraf supported it). I did want to stay within the TICK family though, especially after the effort to generate the 550 config files.

daniel · February 13, 2019, 8:41pm

I don’t think you will be able to improve this with Telegraf currently, though we are hoping to handle reloads better in the future. I’ve heard good things as well about @toni-moreno’s snmpcollector.

I would be interested to hear about the calculations you want to do. Also, if possible, could I get a copy of your configuration files for benchmarking?

daniel · April 16, 2019, 12:05am

@CurlingForFun I created an issue on the Telegraf issue tracker with an idea to address this problem. Keep an eye out over there for updates.

github.com/influxdata/telegraf

Improve speed of snmp plugin loading

opened 06:43PM - 12 Apr 19 UTC

closed 10:47PM - 30 Nov 21 UTC

danielnelson

feature request area/snmp

## Feature Request

### Proposal:

### Current behavior:

For each plugin,… snmptranslate is called on each field, information is not shared between plugin instances.

### Desired behavior:

Add a shared cache for loading SNMP MIB data to improve the startup speed when using large tables, many fields, or many plugins. Use more efficient lookup strategy, perhaps `snmptranslate -Tso -m <mib-file>` to avoid loading the cache file multiple times. When Telegraf is reloaded, the cache should be cleared and rebuilt.

### Use case:

Currently Telegraf can be very slow loading SNMP inputs, in some cases taking up to 10 minutes. This is made worse when you have large MIB files.
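The shared-cache idea in the issue above can be illustrated with a minimal sketch. This is not Telegraf's implementation (Telegraf is written in Go). `TranslationCache`, `fake_translate`, and the small lookup table are hypothetical names invented for illustration, and `fake_translate` merely stands in for an expensive `snmptranslate` call.

```python
from typing import Callable, Dict


class TranslationCache:
    """Hypothetical cache shared by all plugin instances (illustration only)."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        self._translate = translate          # the expensive lookup being wrapped
        self._entries: Dict[str, str] = {}   # name -> translated OID
        self.misses = 0                      # how often the expensive lookup ran

    def lookup(self, name: str) -> str:
        # Only call the expensive translator the first time a name is seen.
        if name not in self._entries:
            self.misses += 1
            self._entries[name] = self._translate(name)
        return self._entries[name]

    def clear(self) -> None:
        # On reload the cache is emptied so it can be rebuilt from scratch.
        self._entries.clear()


def fake_translate(name: str) -> str:
    # Assumption: stand-in for shelling out to snmptranslate.
    table = {
        "IF-MIB::ifDescr": ".1.3.6.1.2.1.2.2.1.2",
        "IF-MIB::ifInOctets": ".1.3.6.1.2.1.2.2.1.10",
    }
    return table[name]


cache = TranslationCache(fake_translate)

# Simulate 550 plugin instances that each resolve the same two fields.
for _ in range(550):
    cache.lookup("IF-MIB::ifDescr")
    cache.lookup("IF-MIB::ifInOctets")

print(cache.misses)  # expensive lookups performed, not 1100
```

With the cache shared, the number of expensive translations depends on the number of distinct fields rather than on fields multiplied by plugin instances. A real implementation would also need to be safe for concurrent use, which this sketch does not address.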

toni-moreno · June 22, 2019, 9:11am

Hi @samaust,

We are working with more than 600 different devices (routers, switches, load balancers, firewalls, etc.) in production, collecting more than 500k points per minute.

Its runtime web view lets us see exactly what each device is collecting in real time.

It also has complete self-monitoring metrics per device and per snmpcollector agent (errors, gathering time, and other important metrics that show how well our collection system is performing).

I would be happy to show you all our features if you want.
