Telegraf input plugin "mqtt_consumer" stops working

I’m using the inputs.mqtt_consumer plugin to retrieve some data from a LoRaWAN network server (TTN). I noticed recently that no more data were being written into my InfluxDB.
I restarted telegraf and data were written again.

I looked into telegraf’s log-files and found error entries for all concerned mqtt_consumer instances, e.g.

  • E! [inputs.mqtt_consumer::ttn_consumer_ow] Error in plugin: connection lost: read tcp 172.19.0.3:42600->52.212.223.226:1883: read: connection reset by peer
  • E! [inputs.mqtt_consumer::ttn_consumer_ow] Error in plugin: network Error : read tcp 172.19.0.3:33402->63.34.215.128:1883: i/o timeout

I guess it can happen that the connection between the server running telegraf and the mqtt-server is temporarily interrupted. But why does that cause the telegraf input plugin to stop working?

Here are some more observations:

  • telegraf is running in a docker container
  • other input- and output plugins of the same telegraf instance continued working without issue
  • As you can see in the example above, the IP address of the configured mqtt-server seems to have changed between the two errors.
    Maybe that’s just a normal process (e.g. for something like load-balancing?).
    Maybe that causes the telegraf issue?
    On the other hand there are also error entries in the log that have the same IP address for both error messages (read: connection reset by peer / i/o timeout).
  • when I restarted the telegraf container, data immediately started to flow again from the mqtt-server to the InfluxDB
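
To check whether the broker IP really differs between errors, the log lines can be grouped by plugin alias. This is only a minimal sketch: the regular expression is modelled on the two log lines quoted above and may not match other Telegraf versions or error types, and the function names `parse_error_line` and `remote_ips_by_alias` are made up for illustration.

```python
import re

# Pattern modelled on the two quoted log lines only (an assumption);
# other Telegraf versions or error types may be formatted differently.
LOG_PATTERN = re.compile(
    r"E! \[inputs\.mqtt_consumer::(?P<alias>[^\]]+)\] Error in plugin: "
    r".*?read tcp (?P<local>[\d.]+:\d+)->(?P<remote>[\d.]+):(?P<port>\d+): "
    r"(?P<reason>.+)$"
)


def parse_error_line(line):
    """Return (alias, remote_ip, reason) for a matching line, else None."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return match.group("alias"), match.group("remote"), match.group("reason").strip()


def remote_ips_by_alias(lines):
    """Collect the set of remote broker IPs seen in errors, per plugin alias."""
    result = {}
    for line in lines:
        parsed = parse_error_line(line)
        if parsed is not None:
            alias, remote_ip, _reason = parsed
            result.setdefault(alias, set()).add(remote_ip)
    return result
```

Feeding it the two lines above would give one alias with two different remote IPs, which would fit the load-balancing guess but does not prove it.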

Does anyone have advice on how to prevent telegraf from stopping plugins when a connection interruption occurs, or how to enable it to start again when the connection is reestablished?

Hi there, I am experiencing this as well.

This is not running in docker, but a standard install on a raspberry pi.
Distributor ID: Debian
Description: Debian GNU/Linux 12 (bookworm)
Release: 12
Codename: bookworm
6.6.31+rpt-rpi-v8
Telegraf 1.32.1 (git: HEAD@946e4d7d)

Ok, then it is already two of us! :blush:
Maybe someone from the staff (@Anaisdg ?) can provide an answer?

Could you please check if there is already an issue in the project and, if not, create a new one?

Hi @srebhan ,

I’m not sure if I understand your post. :thinking:
Do you mean whether there is already a bug report somewhere (I’m not sure where InfluxDB issue reports are officially filed) regarding the described telegraf plugin crash?

It happened again 2 days ago. Unfortunately I noticed it only yesterday, so I lost a day of data. :weary:
Since there is no answer on how to avoid this from happening: is there at least a way to monitor the operation of the plugins? As I said above, telegraf as such continues to operate, it’s just the mqtt-plugin that stops working.
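
One possible stopgap is a small freshness check that runs periodically (e.g. from cron) and flags any data stream whose newest point is too old. The following is a rough sketch, not a Telegraf feature: `stale_series` is a hypothetical helper, the 30-minute threshold is an arbitrary assumption, and how the newest timestamp per stream is obtained (for example by querying InfluxDB) is deliberately left out.

```python
from datetime import datetime, timedelta, timezone


def stale_series(last_seen, now, max_age=timedelta(minutes=30)):
    """Return the sorted names whose newest data point is older than max_age.

    last_seen maps a stream name (hypothetical, e.g. the plugin alias) to the
    timezone-aware timestamp of its most recent data point.
    """
    return sorted(name for name, ts in last_seen.items() if now - ts > max_age)


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Example values only; real timestamps would come from the database.
    last_seen = {
        "ttn_consumer_ow": now - timedelta(hours=2),
        "some_other_input": now - timedelta(minutes=5),
    }
    for name in stale_series(last_seen, now):
        print(f"WARNING: no new data from {name} for more than 30 minutes")
```

The threshold should be longer than the longest normal gap between sensor uplinks, otherwise a quiet sensor would be reported as a dead plugin.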

I looked at telegraf’s GitHub page and indeed I found an issue entry that is related to this topic (MQTT consumer plugin disconnects frequently and can not reconnect successfully · Issue #16293 · influxdata/telegraf · GitHub).
It was filed just 2 days ago.