Hi,
I am currently facing very strange behaviour in Telegraf version 1.33.0 and newer, in combination with the prometheus input plugin and the influxdb_v2 output plugin.
Assume the following telegraf.conf:
[agent]
interval = '60s'
collection_jitter = '5s'
flush_interval = '60s'
flush_jitter = '10s'
precision = '1s'
logfile = '/var/log/telegraf/error.log'
metric_batch_size = 5000
metric_buffer_limit = 1000000
round_interval = true
debug = true
quiet = false
omit_hostname = false
[[outputs.influxdb_v2]]
urls = ['https://influx.my.com']
token = '---------------'
organization = 'my_org'
bucket = 'my_bucket'
timeout = '15s'
insecure_skip_verify = true
[[inputs.internal]]
interval = '1h'
tags = {"telegraf" = "1"}
collect_memstats = true
[[inputs.exec]]
commands = ['/usr/local/nagios/libexec/check_load -w 10,9,8 -c 11,10,9']
interval = '30s'
timeout = '15s'
name_suffix = '__9'
data_format = 'nagios'
Everything works fine and the data is written to InfluxDB.
Now I add a configuration for the prometheus input plugin like this:
[[inputs.prometheus]]
urls = ["https://influx.my.com/metrics"]
metric_version = 2
insecure_skip_verify = true
to fetch the metrics from my InfluxDB.
As soon as this plugin is activated, no data reaches my outputs.influxdb_v2 any more. Telegraf claims it is sending all data regularly and reports no errors, but nothing arrives at my InfluxDB. Even the nginx in front of the InfluxDB no longer sees any incoming requests.
This happens only with the prometheus plugin when metric_version = 2 is set. If I switch to metric_version = 1, data is sent and received.
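For reference, the variant that does work for me is the same block with only the metric_version line changed:
[[inputs.prometheus]]
urls = ["https://influx.my.com/metrics"]
metric_version = 1
insecure_skip_verify = true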
Writing the data to [[outputs.file]] also works, and running telegraf with --test prints the prometheus metrics without problems.
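The file output I used for that check looked roughly like this (the file path and the data_format are just what I picked for the test):
[[outputs.file]]
files = ['/tmp/telegraf-debug.out']
data_format = 'influx'
With data_format = 'influx' the file contains the same line protocol that would otherwise be sent to InfluxDB, so it is easy to see that the metrics themselves are fine.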
I downgraded Telegraf to find out since when this occurs, and it seems that all versions from 1.33.0 on have this problem. Telegraf 1.32.3 is the last one that works for me.
Just for completeness, here is the telegraf error.log:
2025-06-06T12:30:25Z I! Loading config: /etc/telegraf/telegraf.conf
2025-06-06T12:30:25Z I! Starting Telegraf 1.33.0 brought to you by InfluxData the makers of InfluxDB
2025-06-06T12:30:25Z I! Available plugins: 236 inputs, 9 aggregators, 33 processors, 26 parsers, 63 outputs, 6 secret-stores
2025-06-06T12:30:25Z I! Loaded inputs: exec (1x) internal prometheus
2025-06-06T12:30:25Z I! Loaded aggregators:
2025-06-06T12:30:25Z I! Loaded processors:
2025-06-06T12:30:25Z I! Loaded secretstores:
2025-06-06T12:30:25Z I! Loaded outputs: influxdb_v2
2025-06-06T12:30:25Z D! [agent] Initializing plugins
2025-06-06T12:30:25Z D! [agent] Connecting outputs
2025-06-06T12:30:25Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2025-06-06T12:30:25Z D! [agent] Successfully connected to outputs.influxdb_v2
2025-06-06T12:30:25Z D! [agent] Starting service inputs
2025-06-06T12:30:25Z D! [agent] Stopping service inputs
2025-06-06T12:30:25Z D! [agent] Input channel closed
2025-06-06T12:30:25Z I! [agent] Hang on, flushing any cached metrics before shutdown
2025-06-06T12:30:25Z D! [serializers.influx] could not serialize field "task_executor_run_duration": is NaN; discarding field
2025-06-06T12:30:25Z D! [serializers.influx] could not serialize field "task_executor_run_duration": is NaN; discarding field
2025-06-06T12:30:25Z D! [serializers.influx] could not serialize field "task_executor_run_duration": is NaN; discarding field
2025-06-06T12:30:25Z D! [serializers.influx] could not serialize field "task_executor_run_queue_delta": is NaN; discarding field
2025-06-06T12:30:25Z D! [serializers.influx] could not serialize field "task_executor_run_queue_delta": is NaN; discarding field
2025-06-06T12:30:25Z D! [serializers.influx] could not serialize field "task_executor_run_queue_delta": is NaN; discarding field
2025-06-06T12:30:25Z D! [outputs.influxdb_v2] Wrote batch of 725 metrics in 2.221348ms
2025-06-06T12:30:25Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 1000000 metrics
2025-06-06T12:30:25Z I! [agent] Stopping running outputs
2025-06-06T12:30:25Z D! [agent] Stopped Successfully
The log above is from a run with --once.
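The invocation was essentially the following (config path taken from the log; debug = true is already set in the config):
telegraf --config /etc/telegraf/telegraf.conf --once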
After looking at the changelogs, my guess is that the new rate limiter introduced in 1.33.0 might be causing this.
Does anyone have an idea what's going on here?