[[inputs.mqtt_consumer]] stops when using [[processors.lookup]] after some time

I’ve been experimenting with the mqtt input and the lookup processor for some time. The config shown below works fine, but only for a while (around 7 minutes, which corresponds to around 1,000 ingested and processed MQTT messages). Then the mqtt input stops; more precisely, there is no error message and no further output from mqtt (other inputs seem to continue, however, so Telegraf itself does not appear to be affected).

When running without the lookup processor, all is fine (tested over more than 2 hrs).

After restarting the Telegraf process (v1.26.2 on Docker), everything works fine again.

telegraf.conf

[global_tags]

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  debug = true
  hostname = ""
  omit_hostname = true


[[outputs.file]]
  files = ["stdout"]


#####################
# SIMPLE mqtt message
# i.e.
# mtr/hzg/temp_VL 25.8
#####################

[[inputs.mqtt_consumer]]
  alias = "mqtt_consumer_value_1"
  client_id = "telegraf_value_1"
  servers = ["tcp://127.0.0.1:1883"]
  username = "uuuuuuuu"
  password = "pppppppp"
  name_override = "logdata_test"        # measurement
  data_format = "value"
  data_type = "string"
  topic_tag = "addr"            # tag, instead of "topic"
  topics = [
    "knx/2/2/2",
    "knx/2/0/3",
    "knx/4/4/10",
    "knx/4/4/11",
    "knx/4/4/12",
    "knx/4/4/13",
    "knx/4/4/30",
    "knx/4/0/30",
    "knx/4/0/31",
    "knx/4/0/33",
    "knx/4/0/34",
    "knx/4/0/35",
    "knx/4/0/36",
    "knx/4/5/1",
    "knx/4/5/2",
    "knx/4/5/3",
    "knx/4/5/4",
    "knx/4/5/5",
    "knx/4/5/6",
    "mtr/wasser",
    "mtr/hzg/temp_VL",
    "mtr/hzg/temp_RL",
    "mtr/hzg/temp_RLWW",
    "mtr/hzg/temp_VLWW",
    "mtr/hzg/temp_WW",
    "mtr/hzg/temp_abgas",
    "shellies/hzg/relay/0/power",
    "shellies/hzg/relay/0/energy",
  ]


[[processors.lookup]]
  files = [ "/etc/telegraf/ufn.csv" ]           # Path as mapped into docker
  format = "csv_key_values"
  key = '{{.Tag "addr"}}'                       # Look up the value of the "addr" tag in ufn.csv and add "ufn"

ufn.csv (processors.lookup file)

ignored,                     ufn
knx/2/2/2,                   BadOG HzStrahler (min)
knx/4/4/30,                  BadOG Helligkeit (lm)
knx/4/4/10,                  WetterStat Helligkeit (lm)
knx/4/4/11,                  Wetterstat Temp (°C)
knx/4/4/12,                  Wetterstat Wind (m/s)
knx/4/4/13,                  Wetterstat Regen (ja nein)
knx/4/0/30,                  Gas (l)
knx/2/0/3,                   Wasser warm Zirk (min)
knx/4/0/33,                  Wasser warm (l)
knx/4/0/34,                  Wasser Haus (l)
knx/4/0/35,                  Wasser Garten (l)
knx/4/0/36,                  Wasser Toiletten (l)
mtr/wasser,                  Wasser Hauptzaehler (l)
mtr/hzg/temp_VL,             Hzg Temp VorLauf (°C)
mtr/hzg/temp_RL,             Hzg Temp RueckLauf (°C)
mtr/hzg/temp_VLWW,           Hzg Temp VorLauf WarmWasser (°C)
mtr/hzg/temp_RLWW,           Hzg Temp RueckLauf WarmWasser (°C)
mtr/hzg/temp_WW,             Hzg Temp Auslauf WarmWasser (°C)
mtr/hzg/temp_abgas,          Hzg Temp Rauchgas (°C)
shellies/hzg/relay/0/power,  Hzg Strom Power (W)
shellies/hzg/relay/0/energy, Hzg Strom Energy (Wmin)
knx/4/5/1,                   PV Autarkie (%)
knx/4/5/2,                   PV Eigenverbrauch (%)
knx/4/5/3,                   PV Batterie SOC (%)
knx/4/5/4,                   PV Leistung Solar (Wh)
knx/4/5/5,                   PV Leistung Batterie (Wh)
knx/4/5/6,                   PV Verbrauch Haus (Wh)
knx/4/5/7,                   PV Bezug Netz (Wh)
mtr/pwr/evu/SENSOR,          Strom (Wh)
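
To make the intended mapping concrete, here is a minimal Python sketch of what this config asks the lookup processor to do: treat the first CSV column as the key, ignore that column's header name, and add the remaining columns as tags. This is an illustration only, not Telegraf's implementation; the function names load_key_values and apply_lookup are made up for this example, and the whitespace trimming is inferred from the output shown later in the thread.

```python
import csv
import io

# Two rows copied from ufn.csv above; the real file has more entries.
UFN_CSV = """\
ignored,                     ufn
knx/2/2/2,                   BadOG HzStrahler (min)
mtr/hzg/temp_VL,             Hzg Temp VorLauf (°C)
"""


def load_key_values(text):
    """Build {key: {tag_name: tag_value}} from csv_key_values-style text."""
    reader = csv.reader(io.StringIO(text))
    rows = [[cell.strip() for cell in row] for row in reader if row]
    header, table = rows[0], {}
    for row in rows[1:]:
        # Column 0 is the key; its header name ("ignored") is not used.
        table[row[0]] = dict(zip(header[1:], row[1:]))
    return table


def apply_lookup(tags, table, key_tag="addr"):
    """Return a copy of tags, extended with the entries found for tags[key_tag]."""
    extra = table.get(tags.get(key_tag), {})
    return {**tags, **extra}


table = load_key_values(UFN_CSV)
print(apply_lookup({"addr": "mtr/hzg/temp_VL"}, table))
print(apply_lookup({"addr": "knx/9/9/9"}, table))  # unknown key: tags unchanged
```

A metric whose "addr" tag has no entry in the file simply passes through without a "ufn" tag in this sketch.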

@jpowers Do you have any thoughts here?
Thank you!

@universal-dilettant any error messages etc. in the logs? Did you try running Telegraf in debug mode?


@srebhan
Yes, I did try to debug, within the limits of this document.

You mentioned logs … are there any specific ones that I’m currently not aware of?
There is nothing worth mentioning, however, in the system logs of the Linux host running the container.

I did count the number of items processed before it stopped: exactly 1,000. Let me reiterate that everything works fine without the lookup processor. Could this be a buffer issue?

Below are the head and tail (30 lines each) of the captured output.

2023-05-17T15:11:07Z I! Loading config: /etc/telegraf/telegraf.conf
2023-05-17T15:11:07Z I! Starting Telegraf 1.26.2
2023-05-17T15:11:07Z I! Available plugins: 235 inputs, 9 aggregators, 27 processors, 22 parsers, 57 outputs, 2 secret-stores
2023-05-17T15:11:07Z I! Loaded inputs: mqtt_consumer
2023-05-17T15:11:07Z I! Loaded aggregators:
2023-05-17T15:11:07Z I! Loaded processors: lookup
2023-05-17T15:11:07Z I! Loaded secretstores:
2023-05-17T15:11:07Z I! Loaded outputs: file
2023-05-17T15:11:07Z I! Tags enabled:
2023-05-17T15:11:07Z I! [agent] Config: Interval:10s, Quiet:false, Hostname:"", Flush Interval:10s
2023-05-17T15:11:07Z D! [agent] Initializing plugins
2023-05-17T15:11:07Z D! [agent] Connecting outputs
2023-05-17T15:11:07Z D! [agent] Attempting connection to [outputs.file]
2023-05-17T15:11:07Z D! [agent] Successfully connected to outputs.file
2023-05-17T15:11:07Z D! [agent] Starting service inputs
2023-05-17T15:11:07Z I! [inputs.mqtt_consumer::mqtt_consumer_value_1] Connected [tcp://127.0.0.1:1883]
2023-05-17T15:11:17Z D! [outputs.file] Wrote batch of 14 metrics in 160.258µs
2023-05-17T15:11:17Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
logdata_test,addr=knx/4/5/4,ufn=PV\ Leistung\ Solar\ (Wh) value="6623" 1684336268931756123
logdata_test,addr=knx/4/5/6,ufn=PV\ Verbrauch\ Haus\ (Wh) value="524" 1684336268992726946
logdata_test,addr=knx/4/5/2,ufn=PV\ Eigenverbrauch\ (%) value="4" 1684336270932841441
logdata_test,addr=knx/4/5/4,ufn=PV\ Leistung\ Solar\ (Wh) value="6622" 1684336271928578146
logdata_test,addr=knx/4/5/6,ufn=PV\ Verbrauch\ Haus\ (Wh) value="514" 1684336271990281683
logdata_test,addr=mtr/hzg/temp_VL,ufn=Hzg\ Temp\ VorLauf\ (°C) value="20.8" 1684336272371387689
logdata_test,addr=knx/4/4/12,ufn=Wetterstat\ Wind\ (m/s) value="1.52" 1684336272875071374
logdata_test,addr=shellies/hzg/relay/0/energy,ufn=Hzg\ Strom\ Energy\ (Wmin) value="1758289" 1684336273299261854
logdata_test,addr=mtr/hzg/temp_RL,ufn=Hzg\ Temp\ RueckLauf\ (°C) value="21.0" 1684336273373418517
logdata_test,addr=mtr/hzg/temp_RLWW,ufn=Hzg\ Temp\ RueckLauf\ WarmWasser\ (°C) value="34.5" 1684336274375396891
logdata_test,addr=knx/4/5/4,ufn=PV\ Leistung\ Solar\ (Wh) value="6666" 1684336274967214881
logdata_test,addr=knx/4/5/6,ufn=PV\ Verbrauch\ Haus\ (Wh) value="546" 1684336275001911721


[...]

2023-05-17T15:22:57Z D! [outputs.file] Wrote batch of 16 metrics in 106.478µs
2023-05-17T15:22:57Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
logdata_test,addr=knx/4/5/4,ufn=PV\ Leistung\ Solar\ (Wh) value="7354" 1684336978073566307
logdata_test,addr=knx/4/5/6,ufn=PV\ Verbrauch\ Haus\ (Wh) value="646" 1684336978102989316
logdata_test,addr=knx/4/4/12,ufn=Wetterstat\ Wind\ (m/s) value="1.18" 1684336978921796020
logdata_test,addr=knx/4/5/4,ufn=PV\ Leistung\ Solar\ (Wh) value="7340" 1684336981083900787
logdata_test,addr=knx/4/5/6,ufn=PV\ Verbrauch\ Haus\ (Wh) value="627" 1684336981119266366
logdata_test,addr=knx/4/4/12,ufn=Wetterstat\ Wind\ (m/s) value="1.53" 1684336982754077539
logdata_test,addr=mtr/hzg/temp_VL,ufn=Hzg\ Temp\ VorLauf\ (°C) value="20.8" 1684336982819899348
logdata_test,addr=mtr/hzg/temp_RL,ufn=Hzg\ Temp\ RueckLauf\ (°C) value="21.0" 1684336983821399129
logdata_test,addr=knx/4/5/4,ufn=PV\ Leistung\ Solar\ (Wh) value="7356" 1684336984073795417
logdata_test,addr=knx/4/5/6,ufn=PV\ Verbrauch\ Haus\ (Wh) value="668" 1684336984107274224
logdata_test,addr=shellies/hzg/relay/0/energy,ufn=Hzg\ Strom\ Energy\ (Wmin) value="1758412" 1684336984365569938
logdata_test,addr=mtr/hzg/temp_RLWW,ufn=Hzg\ Temp\ RueckLauf\ WarmWasser\ (°C) value="32.3" 1684336984824234304
logdata_test,addr=mtr/hzg/temp_WW,ufn=Hzg\ Temp\ Auslauf\ WarmWasser\ (°C) value="35.5" 1684336985824848695
logdata_test,addr=knx/4/4/12,ufn=Wetterstat\ Wind\ (m/s) value="1.86" 1684336986058482772
logdata_test,addr=mtr/hzg/temp_abgas,ufn=Hzg\ Temp\ Rauchgas\ (°C) value="29.8" 1684336986825337509
logdata_test,addr=knx/4/5/4,ufn=PV\ Leistung\ Solar\ (Wh) value="7317" 1684336987087664595
logdata_test,addr=knx/4/5/6,ufn=PV\ Verbrauch\ Haus\ (Wh) value="621" 1684336987119087451
2023-05-17T15:23:07Z D! [outputs.file] Wrote batch of 17 metrics in 163.486µs
2023-05-17T15:23:07Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
logdata_test,addr=knx/4/5/4,ufn=PV\ Leistung\ Solar\ (Wh) value="7321" 1684336990088451945
logdata_test,addr=knx/4/5/6,ufn=PV\ Verbrauch\ Haus\ (Wh) value="601" 1684336990123322237
2023-05-17T15:23:17Z D! [outputs.file] Wrote batch of 2 metrics in 66.632µs
2023-05-17T15:23:17Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2023-05-17T15:23:27Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2023-05-17T15:23:37Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2023-05-17T15:23:47Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2023-05-17T15:23:57Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2023-05-17T15:24:07Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
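
For anyone who wants to reproduce the count of processed items from a capture like the one above, the following Python sketch counts the line-protocol lines and, as a cross-check, sums the batch sizes reported by outputs.file. The function names and the sample list are assumptions for illustration; the sample lines are copied from the tail of the log above.

```python
import re

# Telegraf log lines start with a UTC timestamp and a level such as "I!" or "D!".
LOG_LINE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z [A-Z]! ")
BATCH = re.compile(r"Wrote batch of (\d+) metrics")


def count_metrics(lines, measurement="logdata_test"):
    """Count line-protocol lines for one measurement, skipping log lines."""
    count = 0
    for line in lines:
        if LOG_LINE.match(line):
            continue
        if line.startswith(measurement + ","):
            count += 1
    return count


def sum_batches(lines):
    """Sum the batch sizes reported in 'Wrote batch of N metrics' log lines."""
    return sum(int(m.group(1)) for m in map(BATCH.search, lines) if m)


sample = [
    '2023-05-17T15:23:07Z D! [outputs.file] Wrote batch of 17 metrics in 163.486µs',
    '2023-05-17T15:23:07Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics',
    'logdata_test,addr=knx/4/5/4,ufn=PV\\ Leistung\\ Solar\\ (Wh) value="7321" 1684336990088451945',
    'logdata_test,addr=knx/4/5/6,ufn=PV\\ Verbrauch\\ Haus\\ (Wh) value="601" 1684336990123322237',
    '2023-05-17T15:23:17Z D! [outputs.file] Wrote batch of 2 metrics in 66.632µs',
]
print(count_metrics(sample))
print(sum_batches(sample))
```

On a complete capture, pass the lines of the saved output file instead of the sample list; the two numbers should then agree.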


@universal-dilettant can you please open an issue! I think it has something to do with the tracking metrics…
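
One plausible reading of the "tracking metrics" hint, offered as an assumption rather than something confirmed in this thread: the mqtt_consumer input limits the number of unacknowledged messages (its max_undelivered_messages option, which defaults to 1000), and a processor that drops the tracking information prevents those acknowledgements from ever arriving. The toy Python model below is not Telegraf code; ToyConsumer, keep_tracking and strip_tracking are hypothetical names that only illustrate why such a stall would occur after exactly 1,000 messages.

```python
class ToyConsumer:
    """Toy model of an input that limits in-flight (unacknowledged) messages."""

    def __init__(self, max_undelivered=1000):
        self.max_undelivered = max_undelivered
        self.in_flight = 0
        self.accepted = 0

    def receive(self):
        """Accept one message unless the in-flight limit is reached."""
        if self.in_flight >= self.max_undelivered:
            return None  # stalled: waiting for acknowledgements
        self.in_flight += 1
        self.accepted += 1
        return {"on_delivered": self.ack}  # metric carrying tracking info

    def ack(self):
        """Called once a metric has been delivered to the output."""
        self.in_flight -= 1


def keep_tracking(metric):
    return metric  # processor passes the metric through unchanged


def strip_tracking(metric):
    return {}  # processor emits a copy without the delivery callback


def run(processor, messages=5000):
    """Feed messages through the processor; return how many were accepted."""
    consumer = ToyConsumer()
    for _ in range(messages):
        metric = consumer.receive()
        if metric is None:
            break
        out = processor(metric)
        callback = out.get("on_delivered")
        if callback:
            callback()  # the output wrote the metric and acknowledges it
    return consumer.accepted


print(run(keep_tracking))   # every message is accepted
print(run(strip_tracking))  # stops once the in-flight limit is reached
```

In this model the input goes silent without any error once the limit is hit, which matches the symptom described above.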

@universal-dilettant can you please test Pull Request #13301 (influxdata/telegraf), "fix(processors.lookup): Do not strip tracking info" by srebhan? Please open an issue anyway.

@srebhan: Issue

@srebhan: sorry for asking silly or dumb questions … I’m neither fluent in Go nor in building executables or Docker containers from source. What’s the easiest way to test the patch you proposed? Ideally it would be a downloadable container.

@universal-dilettant there is no such thing as a “silly or dumb question”!

To test, you can go to the comment of the tigerbot in the PR and download a static binary for your architecture (e.g. linux_amd64.tar.gz). Extracting it gives you a single Telegraf binary without any dependencies, which you can run with your config outside of a Docker container.

If that is not feasible, you can now (since the PR has been merged) test using Telegraf’s nightly builds. At the end of that page there is a note on the available Docker containers.

The nightly build did it.
Telegraf with the above config file (in an extended version) has been running for 3 days now, collecting, transforming and storing data in InfluxDB without a flaw.

@srebhan: thank you very much for your help and support.
