Performance of the telegraf snmp collection

chris.churilo · November 2, 2017, 3:49pm

From today’s training session: What is the performance of the Telegraf snmp collection bound by? How many SNMP objects can it collect and store in InfluxDB in a given time?

daniel · November 7, 2017, 1:18am

In Telegraf 1.4, collection time increases linearly with the number of remote agents and the number of fields collected. In the upcoming 1.5 release, remote agents will be collected concurrently, so collection time should grow only with the number of fields collected.
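
For reference, a single plugin instance polling several agents looks roughly like the sketch below. The addresses, community string, and OIDs are placeholders chosen for illustration, not values from this thread.

```toml
[[inputs.snmp]]
  # Placeholder agents; this one list is what gets gathered
  # sequentially in 1.4 and concurrently in 1.5.
  agents = ["192.0.2.10:161", "192.0.2.11:161", "192.0.2.12:161"]
  version = 2
  community = "public"   # placeholder community string
  timeout = "5s"
  retries = 3

  # Every field listed here is collected from every agent above.
  [[inputs.snmp.field]]
    name = "uptime"
    oid = ".1.3.6.1.2.1.1.3.0"   # sysUpTime.0

  [[inputs.snmp.field]]
    name = "hostname"
    oid = ".1.3.6.1.2.1.1.5.0"   # sysName.0
    is_tag = true
```

Every extra field adds work for each agent in the list, so the field count still matters even when agents are polled in parallel.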

The number of objects that can be collected primarily depends on the speed of your snmp devices and network.

izlimx · July 31, 2018, 3:02pm

Hi Daniel, was the concurrent collection of remote agents released for the snmp input plugin?

daniel · July 31, 2018, 5:25pm

Yes, this is included in 1.5 and newer.

izlimx · July 31, 2018, 7:19pm

Thank you Daniel for confirming. So just to be clear, if we configure 3000 devices then there will be that many concurrent threads?

I would be interested to know how to size the VM for memory with this in mind. I read an old thread with your note on metric_buffer_limit, but it did not mention how this grows as the number of remote agents increases.

Thanks,

daniel · July 31, 2018, 9:02pm

There would be 3000 goroutines mapped onto a thread pool by the Go runtime. Each goroutine would need to allocate temporary space for receiving from its SNMP agent. The best way to size is empirically: start small and then double the number of agents and observe the change. You can use the internal input to watch Telegraf’s memory usage.
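
A minimal sketch of enabling the internal input for this kind of sizing test is shown below. The field names in the comments are what I recall the plugin reporting under `internal_memstats`, so treat them as an assumption and check them against your own output.

```toml
# Report Telegraf's own runtime statistics alongside the SNMP data.
[[inputs.internal]]
  # Include Go runtime memory statistics (measurement: internal_memstats).
  # Fields such as heap_alloc_bytes and sys_bytes are the ones to
  # compare between runs with N agents and 2N agents.
  collect_memstats = true
```

Comparing these values after each doubling of the agent list gives a rough per-agent memory cost for your own devices and field set.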

The metric_buffer_limit sets the upper limit for metric memory storage during failures on a per output plugin basis, so multiply it by the number of outputs if you will use more than one.
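
As a worked example of that multiplication, here is a sketch with two outputs. The output plugins, URL, and file path are placeholders; only `metric_buffer_limit` comes from the discussion above.

```toml
[agent]
  interval = "60s"
  metric_batch_size = 1000
  # Upper bound on buffered metrics, applied to each output separately.
  metric_buffer_limit = 10000

# Output 1 (placeholder URL and database name)
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "telegraf"

# Output 2 (placeholder path)
[[outputs.file]]
  files = ["/tmp/metrics.out"]

# Worst case if both outputs are failing at once:
#   2 outputs x 10000 metrics = 20000 metrics held in memory
```

The limit counts metrics, not bytes, so the actual memory footprint also depends on how many fields and tags each metric carries.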

izlimx · August 2, 2018, 8:25am

Ok great. Thanks again. I will run my tests.

izlimx · August 2, 2018, 12:56pm

Do we know the size of the thread pool? I am also assuming that the Go runtime is configured to use multiple logical processors?

daniel · August 2, 2018, 6:05pm

I believe it is based on the number of processors on your system, and yes, Go will use multiple processors. The goroutines are scheduled to run only when they have work available, and don’t consume a thread when they are blocked on network calls.

Touchedegris · October 4, 2018, 12:54am

I could see a huge difference in SNMP performance depending on the CPU used. For example, a VM with 2 i7 cores could easily complete a full SNMP routine with a 10s polling interval, while a Raspberry Pi 3 was barely able to complete the same routine with a one-minute polling interval (both running CentOS 7 with Docker containers).
Is there a way to use the --test option and get the time it took for the routine to complete? It would help to find the best polling interval to set for an instance.
Thanks!

daniel · October 4, 2018, 1:34am

The closest thing to this would be the internal input, which produces metrics with the gather time for each plugin.
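
A rough sketch of how to get at those timings: enable the internal input and route only its gather metrics to stdout. The measurement name `internal_gather` and the field `gather_time_ns` are what I recall the plugin emitting, so treat them as an assumption and confirm against your own output.

```toml
[[inputs.internal]]
  # Memory statistics are not needed for timing, so leave them out here.
  collect_memstats = false

# Print only the per-plugin gather statistics to stdout.
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
  # internal_gather is tagged with the input name (for example "snmp")
  # and carries the gather time in nanoseconds (gather_time_ns).
  namepass = ["internal_gather"]
```

If the reported gather time for snmp approaches the configured interval, the interval is too short for that host.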

BharatSharma · October 8, 2018, 6:14pm

On a somewhat related note, I am also using the snmp input on several thousand devices (I have multiple containers running the collections). I would like to add additional tags to each agent. To achieve this, my understanding is that I would need a separate input section for each agent I am collecting from. Will this have significant performance implications compared to passing in a large ‘agents’ list? Or does Telegraf optimize that somehow?

daniel · October 8, 2018, 8:32pm

There are some tradeoffs: I expect it will use more memory when split out into separate configs, but I don’t know the exact details. If you do make this transformation, it would be nice if you could enable the internal input and take some measurements before and after the change.
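
For anyone comparing the two layouts, a sketch of the 1:1 agent/input form is shown below. The addresses, tag keys, tag values, and OID are placeholders for illustration.

```toml
# One plugin instance per agent, each with its own static tags.
[[inputs.snmp]]
  agents = ["192.0.2.10:161"]   # placeholder address
  version = 2
  community = "public"
  [inputs.snmp.tags]
    site = "dc1"                # placeholder tag
    role = "core"
  [[inputs.snmp.field]]
    name = "uptime"
    oid = ".1.3.6.1.2.1.1.3.0"  # sysUpTime.0

[[inputs.snmp]]
  agents = ["192.0.2.11:161"]   # placeholder address
  version = 2
  community = "public"
  [inputs.snmp.tags]
    site = "dc2"
    role = "edge"
  [[inputs.snmp.field]]
    name = "uptime"
    oid = ".1.3.6.1.2.1.1.3.0"
```

The field definitions are repeated in every section, so with thousands of agents this file is usually generated from an inventory rather than written by hand.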

One other thing to consider is that, depending on what sort of tags you are adding, it might be feasible to use a processor for tagging instead.

BharatSharma · October 9, 2018, 2:42pm

Thanks @daniel. I started to look into the out-of-the-box processors. Is it possible to do something like the following?

    [inputs.snmp.tagpass]
            agent_host = ["agent1"]
               [inputs.snmp.tags]
                    newkey1 = newtag1
                    newkey2 = newtag2

            agent_host = ["agent2"]
               [inputs.snmp.tags]
                    newkey1 = newtag3
                    newkey2 = newtag4

It seems to work if I have just one ‘agent_host’ section. Is there a way to chain these? (The documentation mentions that excluded metrics are passed downstream to the next processor.)

If not, I will go back and do some testing with a 1:1 agent/input section (and turn on the internal input).

daniel · October 9, 2018, 7:37pm

You could do this, but if you need to define many of these processors then I think you will be better off with the 1:1 agent/input for performance reasons.

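    # Add tags only to snmp metrics that came from agent1.
    [[processors.override]]
      namepass = ["snmp"]          # namepass takes a list of patterns
      [processors.override.tagpass]
        agent_host = ["agent1"]
      [processors.override.tags]
        newkey1 = "newtag1"        # TOML string values must be quoted
        newkey2 = "newtag2"

    # Second processor instance for agent2; metrics that do not match
    # a processor's filters pass through it unchanged.
    [[processors.override]]
      namepass = ["snmp"]
      [processors.override.tagpass]
        agent_host = ["agent2"]
      [processors.override.tags]
        newkey1 = "newtag3"
        newkey2 = "newtag4"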
