I have a Telegraf instance (v1.9.1) running 550-odd inputs (snmp). If I make a change to these inputs and send a ‘reload’ to Telegraf via systemctl, it takes almost 10 minutes before data collection resumes.
Is there anything I can do to improve this poor reload time?
This is running on a RHEL 7 host with 4 cores and 8 GB of RAM.
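To put a number on the pause, one option is to look at the timestamps of the points written around the reload and find the longest gap between them. This is only a minimal sketch: it assumes the timestamps have already been exported as epoch seconds by some other means, and `largest_gap` is a hypothetical helper name, not part of Telegraf or InfluxDB.

```python
from typing import Sequence, Tuple


def largest_gap(timestamps: Sequence[float]) -> Tuple[float, float, float]:
    """Return (gap_seconds, gap_start, gap_end) for the longest pause between points."""
    if len(timestamps) < 2:
        raise ValueError("need at least two timestamps")
    ordered = sorted(timestamps)
    best = (0.0, ordered[0], ordered[0])
    # Compare each point with the one that follows it and keep the widest gap.
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur - prev
        if gap > best[0]:
            best = (gap, prev, cur)
    return best


if __name__ == "__main__":
    # Hypothetical sample: points every 60 s, then a pause around a reload.
    sample = [0, 60, 120, 700, 760]
    gap, start, end = largest_gap(sample)
    print(f"longest gap: {gap:.0f}s between t={start} and t={end}")
```

Running the same check before and after a configuration change would show whether the reload time actually moved.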
Another option would be to use another SNMP collector such as toni-moreno/snmpcollector. It serves a configuration web page, so you don’t have to restart Telegraf every time you change a setting. I’ve only tested it with a few inputs, so I don’t know how it performs with 550 of them.
Interesting project. I do like that it does some calculations prior to sending data into Influx (something I was interested in, but I couldn’t find any evidence that Telegraf supported it). I did want to stay within the TICK family though, especially after the effort to generate the 550 config files.
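The thread does not say which calculations are meant. Purely as an illustration, and assuming the goal is something like turning a wrapping SNMP counter into a per-second rate before writing it, a sketch could look like the following. `counter_rate` is a hypothetical helper, not a Telegraf or snmpcollector function.

```python
def counter_rate(prev_value: int, cur_value: int, interval_s: float, bits: int = 32) -> float:
    """Per-second rate between two readings of a counter that wraps at 2**bits.

    Assumes the counter wrapped at most once between the two readings
    and that the device did not reset it.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << bits
    # The modulo makes the delta correct even when cur_value < prev_value after a wrap.
    delta = (cur_value - prev_value) % modulus
    return delta / interval_s


if __name__ == "__main__":
    # Hypothetical readings taken 60 seconds apart.
    print(counter_rate(100, 700, 60))
    # A 32-bit counter that wrapped between two readings 10 seconds apart.
    print(counter_rate(4294967290, 4, 10))
```

The modulo trick only holds if polling is frequent enough that the counter cannot wrap twice within one interval.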
I don’t think you will be able to improve this with Telegraf currently, though we are hoping to handle reloads better in the future. I’ve heard good things as well about @toni-moreno’s snmpcollector.
I would be interested to hear about the calculations you want to do. Also, if possible, could I get a copy of your configuration files for benchmarking?
We are working with more than 600 different devices (routers, switches, load balancers, firewalls, etc.) in production, collecting more than 500k points per minute.
Its runtime web view lets us know exactly what devices are collecting in real time.
It also has complete self-monitoring metrics per device and per snmpcollector agent (errors, gathering time, and other important metrics that tell us how well our collection system is working).
I would be happy to show you all our features if you want.