I have an environment with a pool of servers being monitored for various metrics. I have written a Kapacitor script that uses deadman to check whether Telegraf is sending metrics to InfluxDB. The deadman alert runs a custom script that uses SNMP to check whether Telegraf is running on the hosts.
When I disable the metrics being sent from the servers, deadman works like a charm. When I stop Kapacitor, however, deadman sends weird alerts like “0/5m task Metrics_Deadman_Res_metrics is Down”. It would be really helpful to have the hostname included in the alert. At times, I receive the same alert even when the services and metrics are up.
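To illustrate what I am after, here is a minimal sketch (not my actual script) of a deadman task that groups by host, so that the hostname is available to the alert ID and message. The measurement name `cpu`, the tag key `host`, and the script path are placeholders and assumptions; the real task may use different ones.

```
// Sketch only: measurement, tag key, and script path are hypothetical.
stream
    |from()
        .measurement('cpu')
        // One deadman check per host, so each alert carries a host tag.
        .groupBy('host')
    // Alert when a host emits 0 points or fewer within 5 minutes.
    |deadman(0.0, 5m)
        // Use the host tag as the alert ID.
        .id('{{ index .Tags "host" }}')
        .message('{{ .TaskName }}: host {{ .ID }} is {{ .Level }}')
        // Send alerts only when the level changes.
        .stateChangesOnly()
        // Hypothetical path to the custom SNMP check script.
        .exec('/path/to/check_telegraf.sh')
```

With a grouping like this, I would expect the alert text to name the affected host instead of only the task.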
Can someone help with this?