Error handling during collection

I am using the snmp input plugin to poll 10+ devices/IPs. What is the behavior when one of the IPs in the list is not responding?

  1. Does it impact the metric collection of all other IPs?
  2. Is there a way to capture an instance of the device being down?

When I tested this, metrics were collected for all other IPs except the faulty one. Below is the error on the console:

2020-10-29T14:21:35Z E! [inputs.snmp] Error in plugin: agent xx.xx.xx.xx: performing get on field devicename: Request timeout (after 3 retries)
2020-10-29T14:22:15Z E! [inputs.snmp] Error in plugin: agent xx.xx.xx.xx: gathering table snmp_pilot: performing bulk walk for field ifName: Request timeout (after 3 retries)
2020-10-29T14:22:15Z D! [agent] Stopping service inputs
2020-10-29T14:22:15Z D! [agent] Input channel closed
2020-10-29T14:22:15Z D! [agent] Stopped Successfully
2020-10-29T14:22:15Z E! [telegraf] Error running agent: input plugins recorded 2 errors

How can I get this error into InfluxDB to build a dashboard around unreachable IPs? Or is there a more efficient approach to monitoring these errors?

I was able to use the inputs.tail plugin to read the Telegraf logs and then processors.regex to extract the needed information.
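
For anyone wanting to try the same extraction, here is a minimal sketch of the regex step, written in Python rather than as Telegraf config so it can be run on its own. The pattern and the names `LINE_RE` and `parse_snmp_error` are my own illustration, not the exact expression I used in processors.regex, and it assumes the log lines look like the ones pasted above.

```python
import re

# Hypothetical pattern: matches only [inputs.snmp] error lines and captures
# the timestamp, the agent address, and the rest of the message.
LINE_RE = re.compile(
    r"^(?P<time>\S+) E! \[inputs\.snmp\] Error in plugin: "
    r"agent (?P<agent>\S+): (?P<detail>.*)$"
)


def parse_snmp_error(line):
    """Return a dict with time, agent and detail, or None for any other line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.groupdict()
```

Feeding it the first error line from the console output above gives `agent` as `xx.xx.xx.xx`, while the `D! [agent]` lines return `None`. The named-group syntax `(?P<name>...)` is also accepted by Go's regexp package, so a similar expression should be portable to a Telegraf processor, though I have not verified this exact pattern there.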
