Why are my metrics so late?

Hello there!

I want to figure out why my metrics sometimes arrive 13 minutes late. So far, I have noticed that each time this happens, one of the agents is offline, so several retries might be attempted. But I can’t figure out how to profile this properly, especially since it happens somewhat outside of my control.

Do you have any ideas or suggestions on how to track down this exact problem? Currently, the global interval is 60s and the timeout is 5s. But that is nowhere near the whopping 13 minutes that we’re seeing here…

Thank you and kind regards!

Hi, how many agents do you have in your inputs.snmp config? It is advised to split them across multiple inputs.snmp configs rather than listing them all in one. Read more: Telegraf Best Practices: SNMP Plugin | InfluxData
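
To illustrate the split, here is a rough sketch with one inputs.snmp instance per device. The addresses, community string, and OID are placeholders I made up, and the lower retries value is only a suggestion to test, not a recommended setting. The idea is that an unreachable switch should then only hold up its own instance instead of the whole agent list.

```toml
# Sketch only: addresses, community string, and OID are placeholder assumptions.
# One [[inputs.snmp]] instance per device instead of a shared agents list.

[[inputs.snmp]]
  agents = ["udp://192.0.2.11:161"]   # hypothetical switch 1
  version = 2
  community = "public"
  interval = "60s"
  timeout = "5s"
  retries = 1   # assumption: fewer retries shortens the wait on an offline agent

  [[inputs.snmp.field]]
    name = "uptime"
    oid = ".1.3.6.1.2.1.1.3.0"   # sysUpTime.0

[[inputs.snmp]]
  agents = ["udp://192.0.2.12:161"]   # hypothetical switch 2
  version = 2
  community = "public"
  interval = "60s"
  timeout = "5s"
  retries = 1

  [[inputs.snmp.field]]
    name = "uptime"
    oid = ".1.3.6.1.2.1.1.3.0"   # sysUpTime.0
```

Keep in mind that the timeout applies per request, so an offline agent can cost roughly timeout × (retries + 1) for every field and table it is asked for. With many tables configured, that may add up to far more than 5s.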

It’s just ~3 agents. For instance, if we are monitoring a set of switches, they are configured in one input as a list of agents.

Will check the link and see if it helps!