We are trying to monitor a large number of network devices (more than 1000) using Telegraf and its SNMP input plugin. Because each device is required to have its own unique SNMP community, we are forced to use a per-device config for Telegraf.
Our tests run fine if we put around 20 to 30 .conf files into Telegraf's conf.d directory. If we put all our configs (more than 1000 files) into the directory, Telegraf and InfluxDB eat up all the resources of the VM.
It seems Telegraf spawns a thread for every config file (we first hit the open-files limit) and inserts the data into InfluxDB in parallel, which then complains about too many connections. After adjusting this parameter on the InfluxDB side, the system is completely overloaded and unusable.
So my question is: does anybody have experience with per-device configs for Telegraf? We need per-device SNMP credentials and per-device tagging. Is there any parameter that tells Telegraf not to process all files at once? What are we overlooking here?
I have 2 systems that each have around 1800 config files, and Telegraf is perfectly fine with that. Each config file contains one inputs.ping and one inputs.snmp configuration. I also had to increase the open-files limit for Telegraf, and I switched over to the native ping method (which also required a permission to be set).
So I think you will need to provide more details before we can see where your problem actually is.
Also, I cannot see any relation to the InfluxDB problem, since InfluxDB has nothing to do with the (number of) Telegraf config files.
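For reference, a per-device file of the kind described above (one inputs.snmp block and one inputs.ping block, each with its own tag) might look roughly like this. It is only a sketch: the file name, IP address, community string, tag name, and the single polled OID are placeholders, not values taken from a real setup.

```toml
# conf.d/switch01.conf (hypothetical file name; one such file per device)

[[inputs.snmp]]
  agents = ["udp://192.0.2.10:161"]   # placeholder address
  version = 2
  community = "community-for-switch01" # placeholder per-device community

  [[inputs.snmp.field]]
    name = "uptime"
    oid = "1.3.6.1.2.1.1.3.0"          # sysUpTime.0, used here only as an example

  # Per-device tag; the tag name "device" is an assumption
  [inputs.snmp.tags]
    device = "switch01"

[[inputs.ping]]
  urls = ["192.0.2.10"]                # placeholder address
  method = "native"                    # native ping instead of the system ping binary

  [inputs.ping.tags]
    device = "switch01"
```

Note that the file contains inputs only. With the native method, the Telegraf process needs permission to open raw sockets, which is the permission mentioned above.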
Thank you very much for your reply.
We have an [output] section in every config file, and it seems Telegraf does exactly what we told it to and creates an output for each file, which is not what we want in the end.
Anyway, thank you again. Your hint about the inputs pointed me in the right direction, and I am going to remove the output section from our template and include it only once.
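As a rough sketch of that layout: Telegraf loads the main config file together with everything in conf.d, and every output section it finds becomes its own output instance, so the output should live in exactly one place. The URL, database name, and agent values below are assumptions for illustration, not tuned recommendations.

```toml
# /etc/telegraf/telegraf.conf: shared settings, defined once

[agent]
  interval = "60s"
  # Spreads collection of the many inputs over time instead of
  # polling every device at the same instant (value is an assumption)
  collection_jitter = "10s"
  metric_batch_size = 1000
  metric_buffer_limit = 100000

# The only output definition; the per-device files in conf.d
# must not contain any [[outputs.*]] section of their own
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]     # placeholder InfluxDB address
  database = "telegraf"                # placeholder database name
```

The per-device files in conf.d then carry only their inputs sections and tags. Running `telegraf --config /etc/telegraf/telegraf.conf --config-directory /etc/telegraf/telegraf.d --test` (paths depend on the installation) is one way to check that the merged configuration still loads.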