Telegraf / InfluxDB load explodes when using configuration files per device (> 1000 files)

ToWo · August 26, 2021, 12:06pm

We are trying to monitor a bunch of network devices (> 1000) using telegraf and the SNMP input plugin. Because we have to use a unique SNMP community per device, we are forced to use a per-device config for telegraf.
Our tests run fine with around 20 to 30 .conf files in the conf.d directory of telegraf. If we put all our configs (more than 1000 files) into the directory, telegraf and influxdb eat up all the resources of the VM.
It seems telegraf spawns a thread for every config file (first we hit the open files limit) and inserts the data into influx in parallel, which then complains about too many connections. After raising that connection limit on the InfluxDB side, the system is completely overloaded and unusable.

So my question is: does anybody have experience with a per-device config for telegraf? We need per-device SNMP credentials and per-device tagging. Is there any parameter that tells telegraf not to process all files at once? What are we overlooking here?

Any suggestions are highly welcome.

Best regards,
Cheers,
Tom

Hipska · August 26, 2021, 2:17pm

I have 2 systems that each have around 1800 config files, and Telegraf is perfectly fine with that. Each config file has an inputs.ping and an inputs.snmp configuration section. I also had to increase the open files limit for Telegraf and switched over to the native ping method (which also required a permission to be set).
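A layout like this, one file per device holding an inputs.ping and an inputs.snmp section but no output, can be generated from an inventory. The following is a minimal Python sketch, not Hipska's actual tooling: the `DEVICES` list, the community strings, the `device`/`site` tag names, and the single sysUpTime field (OID `.1.3.6.1.2.1.1.3.0`) are hypothetical placeholders, and the "native" ping method is assumed to need CAP_NET_RAW on Linux (the "permission" mentioned above).

```python
"""Hypothetical generator: one Telegraf conf.d file per device, inputs only."""
import tempfile
from pathlib import Path

# Hypothetical inventory; in practice this would come from your own device database.
DEVICES = [
    {"name": "sw-core-01", "address": "10.0.0.1", "community": "c0mm-a", "site": "dc1"},
    {"name": "rtr-edge-02", "address": "10.0.0.2", "community": "c0mm-b", "site": "dc2"},
]


def toml_str(value: str) -> str:
    """Quote a value as a TOML basic string, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def tag_table(plugin: str, dev: dict) -> list:
    """Per-plugin tags table; TOML attaches it to the last [[inputs.<plugin>]] entry."""
    return [
        f"  [inputs.{plugin}.tags]",
        f"    device = {toml_str(dev['name'])}",
        f"    site = {toml_str(dev['site'])}",
    ]


def render_device(dev: dict) -> str:
    """Render ping + SNMP inputs for one device. Deliberately no [[outputs.*]] here."""
    lines = [
        "[[inputs.ping]]",
        f"  urls = [{toml_str(dev['address'])}]",
        '  method = "native"  # on Linux this likely needs CAP_NET_RAW for telegraf',
        *tag_table("ping", dev),
        "",
        "[[inputs.snmp]]",
        f"  agents = [{toml_str('udp://' + dev['address'] + ':161')}]",
        "  version = 2",
        f"  community = {toml_str(dev['community'])}",  # the per-device credential
        *tag_table("snmp", dev),
        "  [[inputs.snmp.field]]",
        '    name = "uptime"',
        '    oid = ".1.3.6.1.2.1.1.3.0"  # sysUpTime.0, placeholder field',
    ]
    return "\n".join(lines) + "\n"


def write_configs(conf_d: Path, devices: list) -> list:
    """Write <name>.conf per device into conf_d and return the written paths."""
    conf_d.mkdir(parents=True, exist_ok=True)
    written = []
    for dev in devices:
        path = conf_d / f"{dev['name']}.conf"
        path.write_text(render_device(dev), encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    # Writes into a temporary directory so the sketch is safe to run as-is.
    for p in write_configs(Path(tempfile.mkdtemp()) / "telegraf.d", DEVICES):
        print(p)
```

Telegraf can then be pointed at the generated directory with `--config-directory`, while the single `[[outputs.influxdb]]` section stays in the main telegraf.conf.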

So I think you need to provide more details before we can see where your problem actually is.

Also, I cannot see any relation to the InfluxDB problem, since InfluxDB has nothing to do with the (number of) Telegraf config files.

ToWo · August 26, 2021, 7:09pm

Hi,
thank you very much for your reply.
We have an [output] section in every config file, and it seems telegraf does what we told it and creates an output for each file, which is not what we want in the end.
However, thank you again: your hint about the inputs pointed me in the right direction. I am going to remove the output section from our template and include it only once.

best regards,
cheers,
Tom

Hipska · September 2, 2021, 8:20am

Indeed, all configs (and resulting metrics) are global, so only one output config is needed.
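Because every output section is its own output instance and, without `namepass`/`tagpass` filters, receives every metric from every input, 1000 files that each carry an output mean every metric is written roughly 1000 times by 1000 separate writers. A small Python sketch to catch this before restarting Telegraf follows; `find_outputs` is a hypothetical helper that does a line-based scan rather than a full TOML parse, and the default paths `/etc/telegraf/telegraf.conf` and `/etc/telegraf/telegraf.d` are assumptions about a typical Linux package install (pass your own paths, e.g. a conf.d directory, as arguments).

```python
"""Hypothetical sanity check: list [[outputs.*]] sections across Telegraf config files."""
import re
import sys
from pathlib import Path

# Matches uncommented array-of-tables headers such as [[outputs.influxdb]] at line start.
OUTPUT_HEADER = re.compile(r"^[ \t]*\[\[[ \t]*outputs\.([A-Za-z0-9_]+)[ \t]*\]\]", re.MULTILINE)


def find_outputs(main_conf: Path, conf_d: Path) -> list:
    """Return (file, plugin) pairs for every output section, main config first."""
    files = [main_conf] if main_conf.is_file() else []
    files += sorted(conf_d.glob("*.conf")) if conf_d.is_dir() else []
    hits = []
    for f in files:
        for m in OUTPUT_HEADER.finditer(f.read_text(encoding="utf-8")):
            hits.append((f, m.group(1)))
    return hits


if __name__ == "__main__":
    main_conf = Path(sys.argv[1]) if len(sys.argv) > 1 else Path("/etc/telegraf/telegraf.conf")
    conf_d = Path(sys.argv[2]) if len(sys.argv) > 2 else Path("/etc/telegraf/telegraf.d")
    hits = find_outputs(main_conf, conf_d)
    for path, plugin in hits:
        print(f"{path}: outputs.{plugin}")
    if len(hits) > 1:
        print(f"warning: {len(hits)} output sections, each a separate output instance")
```

Usually exactly one line should be printed: the output in the main config.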
