Hi everyone!
Telegraf periodically fails to send data that goes through the aggregator (where the average value is calculated). This is clearly visible on the Grafana graph. I added 3 metrics processed by the aggregator and 1 that is not. After some time, it tries to resend the missing data — I can see this in the batch logs.
I think the issue is in the global Telegraf settings. Any help would be appreciated!
```toml
[global_tags]

[agent]
  interval = "60s"
  round_interval = false
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "65s"
  flush_jitter = "0s"
  precision = "0s"
  logfile = "C:\\Program Files\\InfluxData\\telegraf\\log.txt"
  logfile_rotation_interval = "24h"
  logfile_rotation_max_size = "50MB"
  logfile_rotation_max_archives = 3
  log_with_timezone = "local"
  debug = true
  skip_processors_after_aggregators = false

[[outputs.http]]
  url = "http://10.28.130.32:8080"
  method = "POST"
  data_format = "json"

[[inputs.disk]]
  fieldpass = ["used_percent"]
  taginclude = ["key", "host"]

[[inputs.mem]]
  fieldpass = ["used_percent"]
  taginclude = ["host"]

[[inputs.mem]]
  name_override = "mem.t"
  fieldpass = ["total"]
  taginclude = ["host"]

[[inputs.cpu]]
  percpu = false
  totalcpu = true
  fieldpass = ["usage_idle"]
  tagexclude = ["cpu"]

[[aggregators.basicstats]]
  period = "55s"
  drop_original = true
  stats = ["mean"]
  namepass = ["cpu", "mem", "disk"]
```

@Alik_Phatkov Welcome to the InfluxData community!
Looking at your Telegraf configuration, I can see a potential timing issue that’s likely causing the periodic data loss for aggregated metrics. The problem appears to be in the relationship between your aggregator period and flush interval settings.
The Issue:
- Your aggregator period is set to 55s
- Your flush interval is set to 65s
- Your collection interval is 60s

This creates a timing mismatch where the aggregator might not have enough data points to calculate meaningful averages before the flush occurs, or the flush might happen at inconsistent times relative to the aggregation window.
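To make the mismatch concrete, here is a minimal Python sketch of the timing. It is a simplified model, not Telegraf's actual scheduler: it assumes each input produces exactly one sample on 60s boundaries (which round_interval = false does not guarantee) and checks how many samples fall into each 55s aggregation window:

```python
# Simplified model: samples every 60s vs. aggregation windows every 55s.
# Because 55 does not divide 60, the windows drift against the samples,
# and once per LCM(55, 60) = 660s a window contains no samples at all.

INTERVAL = 60   # agent collection interval, seconds
PERIOD = 55     # basicstats aggregator period, seconds
HORIZON = 3600  # simulate one hour

sample_times = set(range(0, HORIZON + 1, INTERVAL))

empty_windows = []
for start in range(0, HORIZON, PERIOD):
    # Each aggregation window covers [start, start + PERIOD)
    hits = [t for t in sample_times if start <= t < start + PERIOD]
    if not hits:
        empty_windows.append((start, start + PERIOD))

print(empty_windows)
# -> [(605, 660), (1265, 1320), (1925, 1980), (2585, 2640), (3245, 3300)]
```

Every 660 seconds (11 minutes) one aggregation window receives zero samples and therefore emits no aggregated point, which would show up as periodic gaps on a Grafana graph — consistent with the behavior you describe.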
Recommended Solutions:
- Adjust the aggregator period to be slightly less than your flush interval:

```toml
[[aggregators.basicstats]]
  period = "60s" # Changed from 55s
  drop_original = true
  stats = ["mean"]
  namepass = ["cpu", "mem", "disk"]
```
- Or adjust your flush interval to be more frequent:

```toml
[agent]
  flush_interval = "60s" # Changed from 65s
```
- Consider adding some buffer time by setting:

```toml
[[aggregators.basicstats]]
  period = "50s" # Gives 10s buffer before flush
  drop_original = true
  stats = ["mean"]
  namepass = ["cpu", "mem", "disk"]
```
Additional Recommendations:
- Add flush_jitter to prevent all metrics from flushing at exactly the same time:

```toml
[agent]
  flush_jitter = "5s"
```
- Consider increasing metric_buffer_limit if you're seeing buffer overflows:

```toml
[agent]
  metric_buffer_limit = 20000 # Increased from 10000
```
- Monitor your logs for any aggregator-related warnings or errors, especially around the timing when data goes missing.
The fact that you see retry attempts in the batch logs suggests Telegraf is detecting the missing data and trying to resend it, which supports the theory that it’s a timing/synchronization issue rather than a network or output problem.
Try implementing the first solution (adjusting aggregator period to 60s) and monitor your Grafana dashboard to see if the gaps disappear. If the issue persists, please share any relevant log entries from around the time when data goes missing.
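For reference, combining the first solution with the flush_jitter suggestion, the relevant timing settings would look something like this (a sketch based on the values in your config; only the timing keys are shown, and the exact jitter value is a suggestion, not a requirement):

```toml
[agent]
  interval = "60s"
  flush_interval = "60s"   # was 65s; aligned with the collection interval
  flush_jitter = "5s"      # spreads writes out slightly

[[aggregators.basicstats]]
  period = "60s"           # was 55s; now matches the collection interval
  drop_original = true
  stats = ["mean"]
  namepass = ["cpu", "mem", "disk"]
```

With period, interval, and flush_interval all equal, every aggregation window receives a consistent number of samples instead of drifting against the collection schedule.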