We are trying to monitor a large number of network devices (more than 1000) using Telegraf and its SNMP input plugin. Because each device is required to have its own unique SNMP community, we are forced to use a per-device config for Telegraf.
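To make the setup concrete, here is a minimal sketch of how such per-device SNMP blocks could be generated from an inventory. The inventory layout, the addresses, the community strings, the tag names and the single uptime field are all made up for illustration; only the option names (`agents`, `version`, `community`, `[[inputs.snmp.field]]`, `[inputs.snmp.tags]`) come from the Telegraf SNMP input plugin. Whether the blocks end up in one file or in many files under conf.d is left open here.

```python
import json

# Hypothetical inventory: one entry per device, each with its own community and tags.
DEVICES = [
    {"agent": "udp://192.0.2.10:161", "community": "s3cret-a", "tags": {"site": "berlin"}},
    {"agent": "udp://192.0.2.11:161", "community": "s3cret-b", "tags": {"site": "munich"}},
]


def render_snmp_stanza(agent, community, tags):
    """Render one [[inputs.snmp]] block for a single device."""
    # Assumption: JSON string quoting is close enough to TOML basic strings
    # for the plain values used here. Tag keys are assumed to be bare TOML keys.
    q = json.dumps
    lines = [
        "[[inputs.snmp]]",
        f"  agents = [{q(agent)}]",
        "  version = 2",
        f"  community = {q(community)}",
        "",
        "  [[inputs.snmp.field]]",
        '    name = "uptime"',
        '    oid = "RFC1213-MIB::sysUpTime.0"',
        "",
        # The tags sub-table goes last within the plugin block.
        "  [inputs.snmp.tags]",
    ]
    lines += [f"    {key} = {q(value)}" for key, value in sorted(tags.items())]
    return "\n".join(lines) + "\n"


def render_config(devices):
    """Join the per-device blocks into one config text."""
    return "\n".join(
        render_snmp_stanza(d["agent"], d["community"], d["tags"]) for d in devices
    )


config_text = render_config(DEVICES)
print(config_text)
```

Each device keeps its own community and tags because every block is a separate plugin instance.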
Our tests run fine if we put around 20 to 30 .conf files into Telegraf's conf.d directory. If we put all our configs (more than 1000 files) into the directory, Telegraf and InfluxDB eat up all the resources of the VM.
It seems Telegraf spawns a thread for every config file (we first hit the open-files limit) and inserts the data into InfluxDB in parallel, which then complains about too many connections. After raising the connection limit on the InfluxDB side, the system is completely overloaded and unusable.
So my question is: does anybody have experience with per-device configs for Telegraf? We need per-device SNMP credentials and per-device tagging. Is there any parameter with which we can tell Telegraf not to process all files at once? What are we overlooking here?
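For reference, the settings in Telegraf's `[agent]` section that look relevant to spreading out collection and writes are `collection_jitter`, `flush_jitter`, `metric_batch_size` and `metric_buffer_limit`. The sketch below only renders such a section. The helper name and the numbers are placeholders, and whether these settings actually relieve the load with more than 1000 SNMP inputs is an untested assumption.

```python
def render_agent_section(interval_s, jitter_s, batch_size):
    """Render a Telegraf [agent] section with jittered collection and flushing."""
    lines = [
        "[agent]",
        f'  interval = "{interval_s}s"',
        # Each input sleeps a random time up to this value before collecting.
        f'  collection_jitter = "{jitter_s}s"',
        f'  flush_interval = "{interval_s}s"',
        # Each flush to the outputs is delayed by a random time up to this value.
        f'  flush_jitter = "{jitter_s}s"',
        f"  metric_batch_size = {batch_size}",
        # Placeholder ratio: buffer ten batches before dropping metrics.
        f"  metric_buffer_limit = {batch_size * 10}",
    ]
    return "\n".join(lines) + "\n"


agent_text = render_agent_section(interval_s=60, jitter_s=30, batch_size=1000)
print(agent_text)
```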
Any suggestions are highly welcome.