Gather Errors with Kafka Input Plugin

I am load testing our InfluxDB and Telegraf setup, and things seem to be going well. We have a two-layer setup: one layer reads from UDP and sends to Kafka, and another layer reads from Kafka and writes to InfluxDB.

Yesterday afternoon I noticed that I started getting non-zero readings for “gather_errors” on the layer that reads from Kafka. I’m running the logs in debug mode and don’t see anything, but the internal agent metrics show a constant background level of 317 for gather errors.

Any thoughts? Is there a way I can get more insight into what is going on here?

There should be a log message each time this value increments; I don’t know of a way it could increment without writing an error. The value is not reset until Telegraf restarts, so maybe the error happened far in the past?
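
One way to check whether the errors are old is to look at how the counter changes between samples rather than at its absolute value. The following is a minimal sketch, assuming the gather_errors readings for one host have already been exported as (timestamp, value) pairs; the function names and the sample data are hypothetical and not part of Telegraf.

```python
from typing import List, Tuple


def counter_increments(samples: List[Tuple[int, int]]) -> List[int]:
    """Return the increase of a cumulative counter between consecutive samples.

    A drop in value is treated as a restart (counter reset to zero), so the
    new value itself is counted as the increase for that interval.
    """
    ordered = sorted(samples)  # sort by timestamp
    increments = []
    for (_, prev), (_, cur) in zip(ordered, ordered[1:]):
        increments.append(cur - prev if cur >= prev else cur)
    return increments


def is_stale(samples: List[Tuple[int, int]]) -> bool:
    """True if the counter never increased across the given samples."""
    return all(delta == 0 for delta in counter_increments(samples))


# Hypothetical readings: the counter sits at 317 and never moves.
readings = [(0, 317), (10, 317), (20, 317)]
print(counter_increments(readings))
print(is_stale(readings))
```

If every increment is zero over the period covered by the logs, the errors were counted before that period began, which would be consistent with nothing showing up in the current log output.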

What I’m observing is that I am getting a value of 317 for all hosts for gather_errors. I am not seeing anything in the log file even though I have debug = true.

I’m on Telegraf 1.3.4 at the moment.

How many Telegraf instances do you have? Do they all have the value of 317, or is this an aggregate value?

We have 8 Telegraf services in this tier, all consuming from a single topic in the same consumer group.

When I use the max operator, I see that the max is always 317.
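
A max across all hosts cannot distinguish "every instance reports 317" from "one instance reports 317 and the rest report less". A minimal sketch of a per-host breakdown, assuming the readings have been exported as (host, value) rows; the host names and values below are made up for illustration.

```python
from typing import Dict, Iterable, Tuple


def max_per_host(rows: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Return the largest gather_errors value seen for each host."""
    result: Dict[str, int] = {}
    for host, value in rows:
        # Keep the larger of the stored value and the new one.
        result[host] = max(value, result.get(host, value))
    return result


# Hypothetical rows from two of the consumer instances.
rows = [("consumer-1", 317), ("consumer-2", 0), ("consumer-1", 300)]
print(max_per_host(rows))
```

If every host shows the same maximum, the value is not an artifact of aggregation.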

We have quite a sophisticated setup with good monitoring, and the data seems to be getting through to InfluxDB, but not understanding what these errors are bothers me.