Telegraf Slow to restart

Hi All,

I have a Telegraf instance (v1.9.1) running some 550-odd inputs (snmp). If I make a change to these inputs and send a ‘reload’ to Telegraf via systemctl, it takes almost 10 minutes before data collection resumes.

Is there anything I can do to improve this poor reload time?

This is running on a RHEL 7 host with 4 cores and 8 GB of RAM.

Thanks

I don’t know how to solve that issue.

Another option would be to use another SNMP collector such as toni-moreno/snmpcollector. It serves a configuration web page, so you don’t have to restart telegraf every time you change a setting. I’ve only tested it with a few inputs, so I don’t know how it performs with 550 tags.

Interesting project. I do like that it does some calculations prior to sending data into Influx (something I was interested in, but I couldn’t find any evidence that Telegraf supported it). I did want to stay within the TICK family though, especially after the effort to generate the 550 config files.

I don’t think you will be able to improve this with Telegraf currently, though we are hoping to handle reloads better in the future. I’ve heard good things as well about @toni-moreno’s snmpcollector.

It would be interesting to hear about the calculations you want to do. Also, if possible, could I get a copy of your configuration files for benchmarking?

@CurlingForFun I created an issue on the Telegraf issue tracker with an idea to address this problem. Keep an eye out over there for updates.

Hi @samaust,

We are working with > 600 different devices (routers, switches, load balancers, firewalls, etc.) in production, collecting > 500k points per minute.
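
For a rough sense of scale, those figures can be turned into per-second and per-device rates. This is only a back-of-the-envelope sketch: it treats 600 devices and 500,000 points per minute as exact values, although both are quoted as lower bounds, and it assumes points are spread evenly across devices, which will not hold in practice.

```python
# Lower bounds quoted above, treated as exact values for the estimate.
devices = 600
points_per_minute = 500_000

# Overall ingest rate, and the average share per device
# (assumes an even spread across devices, which is a simplification).
points_per_second = points_per_minute / 60
points_per_device_per_minute = points_per_minute / devices

print(f"{points_per_second:.0f} points/s overall")
print(f"{points_per_device_per_minute:.0f} points/min per device on average")
```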

snmpcollector’s runtime web view lets us see exactly which devices are collecting in real time.

It also has complete self-monitoring metrics per device and per snmpcollector agent (errors, gathering time, and other important metrics that tell us how well our collection system is doing).

I would be happy to show you all our features if you want.