MQTT consumer input: network error crashes Telegraf

I’ve deployed Telegraf to a Kubernetes cluster with a config that subscribes to an MQTT topic (inputs) and writes to InfluxDB 2 (outputs, also on the cluster). The pod (service) starts up fine, but after about 4 minutes the MQTT connection fails with an error and the whole Telegraf service restarts. I have a couple of questions:

  1. Why does it crash and restart and not just retry the connection?
  2. How can I troubleshoot the error beyond what the debug logs show (see below)?

2022-02-20T23:53:35Z I! Starting Telegraf 1.21.4
2022-02-20T23:53:35Z I! Using config file: /etc/telegraf/telegraf.conf
2022-02-21T12:53:35+13:00 I! Loaded inputs: mqtt_consumer (2x)
2022-02-21T12:53:35+13:00 I! Loaded aggregators:
2022-02-21T12:53:35+13:00 I! Loaded processors:
2022-02-21T12:53:35+13:00 I! Loaded outputs: influxdb_v2 (2x)
2022-02-21T12:53:35+13:00 I! Tags enabled: host=telegraf-c9fc696bc-xb4r8
2022-02-21T12:53:35+13:00 I! [agent] Config: Interval:10s, Quiet:false, Hostname:"telegraf-c9fc696bc-xb4r8", Flush Interval:10s
2022-02-21T12:53:35+13:00 D! [agent] Initializing plugins
2022-02-21T12:53:35+13:00 D! [agent] Connecting outputs
2022-02-21T12:53:35+13:00 D! [agent] Attempting connection to [outputs.influxdb_v2]
2022-02-21T12:53:35+13:00 D! [agent] Successfully connected to outputs.influxdb_v2
2022-02-21T12:53:35+13:00 D! [agent] Attempting connection to [outputs.influxdb_v2]
2022-02-21T12:53:35+13:00 D! [agent] Successfully connected to outputs.influxdb_v2
2022-02-21T12:53:35+13:00 D! [agent] Starting service inputs
2022-02-21T12:57:45+13:00 E! [telegraf] Error running agent: starting input inputs.mqtt_consumer: network Error : EOF

I’m still not sure why it was crashing or how to troubleshoot it (which is concerning), but I got it working with help from this post on the InfluxData Community Forums: "MQTT from AWS IoT - Telegraf". I set the server URL to ssl://… and it connected.
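
A minimal sketch of that change, assuming the broker only accepts TLS connections. The hostname, port, and topic are placeholders, not values from this thread; only the `ssl://` scheme reflects the fix described above.

```toml
[[inputs.mqtt_consumer]]
  ## Placeholder broker address (assumption). The scheme is the part that matters:
  ## tcp:// makes a plain connection, ssl:// makes a TLS connection.
  servers = ["ssl://broker.example.com:8883"]

  ## Placeholder topic filter (assumption).
  topics = ["sensors/#"]
```

If a `tcp://` URL is pointed at a TLS-only listener, the broker may simply close the connection, which would be consistent with the `network Error : EOF` in the log. The thread does not confirm that this was the cause.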

Does anyone know how to stop the service from crashing when it can’t connect?

I believe I responded to your bug report. Give that a read and feel free to respond over there. It does sound like we should/could update the docs around the URL connection string based on how you resolved this.