Telegraf agent goes into CrashLoopBackOff because the MQTT consumer endpoint is unavailable

I was trying to deploy the telegraf-agent pod with the configuration below:

[[inputs.mqtt_consumer]]
      alias = "input_mqtt_consumer_host"
      servers = [
        "mqtt://mqtt:1883"
      ]
      topics = [
        "sig/edge/gms/+/postmetrics", 
        "sig/edge/gms/+/postlocalmetrics"
      ]
      startup_error_behavior = "retry"
      json_strict = true
      tag_keys = [
        "tags_*"
      ]
      data_format = "json"
      json_name_key = "measurement"
      json_query = "metrics"
      json_string_fields = [
        "fields_*"
      ]
      json_time_format = "unix"
      json_time_key = "epochtime"
      json_timezone = "UTC"
      [inputs.mqtt_consumer.tags]
        tags_sig_host = "REPLACE_WITH_HOST_NAME"

The telegraf-agent pod goes into the CrashLoopBackOff state because it cannot connect to the MQTT endpoint.
My Telegraf agent is part of one Helm chart (chart1) and MQTT is part of another (chart2).
chart1 is installed first and then chart2, so the MQTT service does not exist yet when Telegraf starts.

Below are the logs of telegraf-agent:

[rishavkumarj@sigdev00 sig_edge]$ kc logs telegraf-agent-6bbdfd8bd4-tgl9p
2024-09-11T05:04:19Z I! Loading config: /etc/telegraf/telegraf.conf
2024-09-11T05:04:19Z I! Starting Telegraf 1.31.0 brought to you by InfluxData the makers of InfluxDB
2024-09-11T05:04:19Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-09-11T05:04:19Z I! Loaded inputs: http_listener_v2 internal mqtt_consumer
2024-09-11T05:04:19Z I! Loaded aggregators: 
2024-09-11T05:04:19Z I! Loaded processors: enum
2024-09-11T05:04:19Z I! Loaded secretstores: 
2024-09-11T05:04:19Z I! Loaded outputs: health influxdb_v2
2024-09-11T05:04:19Z I! Tags enabled: 
2024-09-11T05:04:19Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:10s
2024-09-11T05:04:19Z D! [agent] Initializing plugins
2024-09-11T05:04:19Z D! [agent] Connecting outputs
2024-09-11T05:04:19Z D! [agent] Attempting connection to [outputs.influxdb_v2::output_to_global_pipeline]
2024-09-11T05:04:19Z D! [agent] Successfully connected to outputs.influxdb_v2::output_to_global_pipeline
2024-09-11T05:04:19Z D! [agent] Attempting connection to [outputs.health]
2024-09-11T05:04:19Z I! [outputs.health] Listening on http://0.0.0.0:8888
2024-09-11T05:04:19Z D! [agent] Successfully connected to outputs.health
2024-09-11T05:04:19Z D! [agent] Starting service inputs
2024-09-11T05:04:19Z I! [inputs.http_listener_v2::input_http_listener_v2_host] Listening on 0.0.0.0:8087
2024-09-11T05:04:19Z E! [telegraf] Error running agent: starting input inputs.mqtt_consumer::input_mqtt_consumer_host: network Error : dial tcp: lookup mqtt on 10.96.0.10:53: no such host

Is there a way to ignore this error until the MQTT pod is available, and have telegraf-agent connect to MQTT once chart2 is deployed?

I see that the mqtt_consumer plugin has a startup_error_behavior option.

I tried setting startup_error_behavior to "retry" so that the pod stays out of CrashLoopBackOff and connects to MQTT once it is available:

[[inputs.mqtt_consumer]]
      startup_error_behavior = "retry"

But the issue still persists.

OK, I have resolved the issue.
Initially, I was using the Docker image telegraf:1.31.0.

The startup_error_behavior setting for mqtt_consumer seems to be unsupported in that version.

Once I updated the image to 1.32.0, it started working.
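
For anyone hitting the same error, here is a minimal sketch of the relevant part of the config. It assumes the telegraf:1.32.0 image (or newer) and reuses the broker address and topics from the question; the comments on what each value does reflect my understanding of the option, not an authoritative description.

[[inputs.mqtt_consumer]]
      alias = "input_mqtt_consumer_host"
      servers = [
        "mqtt://mqtt:1883"
      ]
      topics = [
        "sig/edge/gms/+/postmetrics",
        "sig/edge/gms/+/postlocalmetrics"
      ]
      data_format = "json"
      # Assumption: requires an image where mqtt_consumer honors this option
      # (1.31.0 did not appear to; 1.32.0 did).
      # "error" is the default and makes Telegraf exit if the broker is unreachable at startup.
      # "retry" keeps Telegraf running and tries the connection again later.
      startup_error_behavior = "retry"

With this in place the pod should stay up while chart2 is not yet installed, and the plugin should connect once the mqtt service name resolves.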

@Rishav_Kumar_Jha I just wanted to thank you for sharing your solution with the community! Out of curiosity, what are you using InfluxDB for?