Hi everyone!
Telegraf periodically fails to send data that goes through the aggregator (where the average value is calculated). This is clearly visible on the Grafana graph. I added 3 metrics processed by the aggregator and 1 that is not. After some time, it tries to resend the missing data — I can see this in the batch logs.
I think the issue is in the global Telegraf settings. Any help would be appreciated!
```toml
[global_tags]

[agent]
  interval = "60s"
  round_interval = false
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "65s"
  flush_jitter = "0s"
  precision = "0s"
  logfile = "C:\\Program Files\\InfluxData\\telegraf\\log.txt"
  logfile_rotation_interval = "24h"
  logfile_rotation_max_size = "50MB"
  logfile_rotation_max_archives = 3
  log_with_timezone = "local"
  debug = true
  skip_processors_after_aggregators = false

[[outputs.http]]
  url = "http://10.28.130.32:8080"
  method = "POST"
  data_format = "json"

[[inputs.disk]]
  fieldpass = ["used_percent"]
  taginclude = ["key", "host"]

[[inputs.mem]]
  fieldpass = ["used_percent"]
  taginclude = ["host"]

[[inputs.mem]]
  name_override = "mem.t"
  fieldpass = ["total"]
  taginclude = ["host"]

[[inputs.cpu]]
  percpu = false
  totalcpu = true
  fieldpass = ["usage_idle"]
  tagexclude = ["cpu"]

[[aggregators.basicstats]]
  period = "55s"
  drop_original = true
  stats = ["mean"]
  namepass = ["cpu", "mem", "disk"]
```

@Alik_Phatkov Welcome to the InfluxData community!
Looking at your Telegraf configuration, I can see a potential timing issue that’s likely causing the periodic data loss for aggregated metrics. The problem appears to be in the relationship between your aggregator period and flush interval settings.
The Issue:
- Your aggregator period is set to 55s
- Your flush interval is set to 65s
- Your collection interval is 60s

This creates a timing mismatch where the aggregator might not have enough data points to calculate meaningful averages before the flush occurs, or the flush might happen at inconsistent times relative to the aggregation window.
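To make the mismatch concrete, here is a minimal Python sketch of the timing. It is a simplified model, not Telegraf's actual scheduler: it assumes each input produces exactly one sample on 60s boundaries (which round_interval = false does not guarantee) and checks how many samples fall into each 55s aggregation window:

```python
# Simplified model: samples every 60s vs. aggregation windows every 55s.
# Because 55 does not divide 60, the windows drift against the samples,
# and once per LCM(55, 60) = 660s a window contains no samples at all.

INTERVAL = 60   # agent collection interval, seconds
PERIOD = 55     # basicstats aggregator period, seconds
HORIZON = 3600  # simulate one hour

sample_times = set(range(0, HORIZON + 1, INTERVAL))

empty_windows = []
for start in range(0, HORIZON, PERIOD):
    # Each aggregation window covers [start, start + PERIOD)
    hits = [t for t in sample_times if start <= t < start + PERIOD]
    if not hits:
        empty_windows.append((start, start + PERIOD))

print(empty_windows)
# -> [(605, 660), (1265, 1320), (1925, 1980), (2585, 2640), (3245, 3300)]
```

Every 660 seconds (11 minutes) one aggregation window receives zero samples and therefore emits no aggregated point, which would show up as periodic gaps on a Grafana graph — consistent with the behavior you describe.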
Recommended Solutions:
- Adjust the aggregator period to be slightly less than your flush interval:

```toml
[[aggregators.basicstats]]
  period = "60s" # Changed from 55s
  drop_original = true
  stats = ["mean"]
  namepass = ["cpu", "mem", "disk"]
```
- Or adjust your flush interval to be more frequent:

```toml
[agent]
  flush_interval = "60s" # Changed from 65s
```
- Consider adding some buffer time by setting:

```toml
[[aggregators.basicstats]]
  period = "50s" # Gives 10s buffer before flush
  drop_original = true
  stats = ["mean"]
  namepass = ["cpu", "mem", "disk"]
```
Additional Recommendations:
- Add flush_jitter to prevent all metrics from flushing at exactly the same time:

```toml
[agent]
  flush_jitter = "5s"
```
- Consider increasing metric_buffer_limit if you're seeing buffer overflows:

```toml
[agent]
  metric_buffer_limit = 20000 # Increased from 10000
```
- Monitor your logs for any aggregator-related warnings or errors, especially around the timing when data goes missing.
The fact that you see retry attempts in the batch logs suggests Telegraf is detecting the missing data and trying to resend it, which supports the theory that it’s a timing/synchronization issue rather than a network or output problem.
Try implementing the first solution (adjusting aggregator period to 60s) and monitor your Grafana dashboard to see if the gaps disappear. If the issue persists, please share any relevant log entries from around the time when data goes missing.
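For reference, combining the first solution with the flush_jitter suggestion, the relevant timing settings would look something like this (a sketch based on the values in your config; only the timing keys are shown, and the exact jitter value is a suggestion, not a requirement):

```toml
[agent]
  interval = "60s"
  flush_interval = "60s"   # was 65s; aligned with the collection interval
  flush_jitter = "5s"      # spreads writes out slightly

[[aggregators.basicstats]]
  period = "60s"           # was 55s; now matches the collection interval
  drop_original = true
  stats = ["mean"]
  namepass = ["cpu", "mem", "disk"]
```

With period, interval, and flush_interval all equal, every aggregation window receives a consistent number of samples instead of drifting against the collection schedule.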