Telegraf SNMP data collection for fleet of devices

revanth · October 7, 2020, 2:41pm

I need advice on how to manage SNMP metric collection from a fleet of about 2K devices.

Do I need to split these devices into smaller groups and run multiple Telegraf instances, perhaps in Docker containers? What are the advantages of that approach over running a single Telegraf instance with one configuration file that lists the IPs of all remote devices in agents = [""] under the inputs.snmp plugin?

Please help clarify.

willcooke · October 9, 2020, 2:53pm

Hi @revanth

I don’t think there is any need to split the devices because of limitations in Telegraf specifically, but you might like to split your config up into separate files to make it more manageable.

If you wanted to split the config files up, this issue has a pretty good example and explanation of what you might like to do: Telegraf Configuration - Recommended approach for multiple .conf files? · Issue #6334 · influxdata/telegraf · GitHub

However, if you find that it takes too long to walk the MIBs for thousands of devices, then I can see that running multiple Telegraf instances in parallel would speed things up for you.

In my experience, SNMP management interfaces on devices are fickle, and you might find that some devices respond more reliably and more quickly than others. Separating your slower-responding devices out into their own config could allow you to poll the more responsive devices more often.

In summary, the trade-off here is between ease of maintaining the config file(s) and the risk of unreliable devices holding up metric gathering for your faster-responding devices.
Nothing in Telegraf would prevent you from setting things up in either of these ways.

I would start out with a single Telegraf instance and multiple config files, and if that doesn’t perform reliably for you because of unstable devices, split those off into a different Telegraf setup.

Cheers, Will
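
A minimal sketch of the "slower devices in their own config" idea above, written as a small Python generator. The addresses, intervals, and the helper name render_snmp_input are illustrative assumptions, not something from this thread. It relies only on the agents, timeout, and retries settings of inputs.snmp and the per-input interval setting; a real config would also need the SNMP version, credentials, and field/table definitions.

```python
def render_snmp_input(agents, interval, timeout="5s", retries=3):
    """Return one [[inputs.snmp]] TOML block for the given agents.

    Hypothetical helper: only the options discussed here are emitted.
    """
    agent_list = ", ".join(f'"{a}"' for a in agents)
    return "\n".join([
        "[[inputs.snmp]]",
        f"  agents = [{agent_list}]",
        f'  interval = "{interval}"',   # per-input collection interval
        f'  timeout = "{timeout}"',     # how long to wait per SNMP request
        f"  retries = {retries}",
        "",
    ])


# Assumption: documentation-range addresses standing in for real devices.
fast_devices = ["udp://192.0.2.10:161", "udp://192.0.2.11:161"]
slow_devices = ["udp://192.0.2.200:161"]

# Responsive devices are polled every minute; flaky ones less often,
# with a longer timeout and fewer retries.
config = (
    render_snmp_input(fast_devices, interval="60s")
    + "\n"
    + render_snmp_input(slow_devices, interval="300s", timeout="15s", retries=1)
)
print(config)
```

The two blocks could live in the same file or in separate files under telegraf.d. If the slow group still drags things down, the same output can be handed to a second Telegraf instance instead.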

revanth · October 9, 2020, 3:47pm

Thank you @willcooke for the detailed explanation and reference. I was able to follow the GitHub issue, but I want to make sure my understanding is correct. If you do not mind, could you please clarify further:

With one Telegraf config file listing multiple devices/IPs to monitor, for example agents = [ "IP1", "IP2", "IP3" ], does Telegraf fetch metrics from each IP in parallel or sequentially?
With multiple Telegraf configuration files in the telegraf.d directory, does Telegraf create a separate thread for each file and run them in parallel, or does it process the files one after the other?

Thanks again,
Revanth

willcooke · October 9, 2020, 3:59pm

As I understand it they are called sequentially. Under the covers it uses net-snmp tools. http://www.net-snmp.org/

Telegraf will take all of those separate files and combine them into a single config file internally. They are only separated logically to make it easy to manage.

zaki · December 15, 2020, 1:10pm

@willcooke @revanth Sorry to jump in, but I have a small question on this topic. I’m monitoring my network devices using Telegraf/SNMP, and since we have a lot of IPs, is there a way to import the agents’ IPs from a text file instead of adding each IP one at a time? To be clear, instead of having:
agents = [ "IP1", "IP2", "IP3" ]
is there a way to write something like this:
agents = [ include /etc/telegraf/lisIP.text ] ?

revanth · January 4, 2021, 9:18pm

@zaki I am not aware of such flexibility. Maybe you can open a separate topic with this question; I would be interested to know as well.
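
One possible workaround for the question above, as a sketch only: keep the IPs in the text file and generate the agents line from it with a small script. The function names and the file format (one address per line, # for comments) are assumptions made for illustration; nothing here is a Telegraf feature.

```python
import tempfile
from pathlib import Path


def load_agents(path):
    """Read one agent address per line, skipping blanks, comments, duplicates."""
    agents, seen = [], set()
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line or line in seen:
            continue
        seen.add(line)
        agents.append(line)
    return agents


def render_agents_line(agents):
    """Format the addresses as a TOML array for the agents setting."""
    quoted = ", ".join(f'"{a}"' for a in agents)
    return f"agents = [{quoted}]"


# Demo with a temporary file standing in for /etc/telegraf/lisIP.text.
with tempfile.TemporaryDirectory() as tmp:
    ip_file = Path(tmp) / "lisIP.text"
    ip_file.write_text("192.0.2.1\n# spare\n192.0.2.2  # core switch\n\n192.0.2.1\n")
    agents_line = render_agents_line(load_agents(ip_file))

print(agents_line)  # agents = ["192.0.2.1", "192.0.2.2"]
```

The generated line would then be written into the inputs.snmp section of a config file, and Telegraf restarted or reloaded so it picks up the change.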

yurividal · August 10, 2021, 4:34am

Did you ever figure this out? I have the same question…

influxian · August 10, 2021, 1:59pm

Even though Telegraf can pull the data, for nearly 2K devices I’d probably buy or develop a process to manage adding and removing devices from the polling process; otherwise you’re editing Telegraf configs by hand.

You could pull the data via an app or script, dump it into a database or separate JSON files, and then just ingest it via Telegraf. You’d offload all that snmpget traffic to another system and use a single Telegraf config/process just to ingest the data.

Python supports multithreading, and with an SNMP module you could do asynchronous pulls from the devices, which would speed up your process.

Just an idea that is a bit more scalable and gives more visibility. You can write many safeguards and checks into a Python script.
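
A rough sketch of that idea using only the Python standard library. The poll_device function is a placeholder: it does not speak SNMP and just returns canned data, so a real SNMP library call would have to replace it. The host list, field name, and simulated failure are likewise made up for illustration.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed


def poll_device(host):
    """Placeholder for a real SNMP query (hypothetical, returns fake data)."""
    if host.endswith(".13"):
        # Simulate one unresponsive device.
        raise TimeoutError(f"no response from {host}")
    return {"host": host, "ifInOctets": 1000}


def poll_fleet(hosts, max_workers=32):
    """Poll all hosts concurrently; a slow or failing host does not block the rest."""
    results, failures = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(poll_device, h): h for h in hosts}
        for fut in as_completed(futures):
            host = futures[fut]
            try:
                results.append(fut.result())
            except Exception as exc:  # record the failure and keep going
                failures[host] = str(exc)
    return results, failures


hosts = [f"192.0.2.{i}" for i in range(10, 16)]
results, failures = poll_fleet(hosts)

# One JSON object per line, sorted by host for stable output.
lines = [json.dumps(r, sort_keys=True) for r in sorted(results, key=lambda r: r["host"])]
print("\n".join(lines))
print("failed:", failures)
```

The JSON lines could be written to a file and read by Telegraf with a file-based input and the JSON data format, so that Telegraf itself never talks SNMP. The failures dictionary is where the safeguards and checks mentioned above would hook in.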
