Telegraf Timing (intervals, round_interval, collection_jitter, collection_offset and flush_interval)

I'm experimenting with the Telegraf agent and plugin settings around intervals, but mainly around flushing, i.e. sending the data to the outputs. I'd like that to happen, let's say, every 2 or 3 minutes.

I have a few more plugins that I've left out, but in general you can see what's going on.
Judging from the logs and alerts below, the output seems to be flushed (and the script run) several times per minute.

The main goal is for [[outputs.exec]] to execute the script every 2 minutes, so that I receive an alert every 2 minutes. But as you can see at the bottom, the alerts came through at odd times.

I can supply more logs later.

Can someone make sense of my logs versus my config?

Config:

[agent]
interval = "60s"
round_interval = true
metric_batch_size = 7000
metric_buffer_limit = 700000
collection_jitter = "0s"
collection_offset = "0s"
flush_interval = "180s"
flush_jitter = "0s"
precision = "1ns"

[[inputs.cloudwatch]]
region = "us-west-2"
access_key = ""
secret_key = ""

period = "5m"
delay = "5m"
interval = "5m"
namespaces = ["AWS/RDS"]
ratelimit = 25
statistic_include = [ "average", "sum", "minimum", "maximum", "sample_count" ]
statistic_exclude = []

#Execute deadmanswitch script
[[outputs.exec]]
command = ["/etc/telegraf/script/send_teams_deadman_alert.sh"]
data_format = "json"
json_timestamp_units = "1s"
interval = "120s"

[[outputs.prometheus_client]]
expiration_interval = "0s"
listen = ":9273"
path = "/metrics"
string_as_label = false


Logs

2023-03-13T14:29:22Z I! Using config file: /etc/telegraf/telegraf.conf
2023-03-13T14:29:22Z E! Unable to open /etc/telegraf/log (open /etc/telegraf/log: permission denied), using stderr
2023-03-13T14:29:22Z I! Starting Telegraf 1.25.3
2023-03-13T14:29:22Z I! Available plugins: 228 inputs, 9 aggregators, 26 processors, 21 parsers, 57 outputs, 2 secret-stores
2023-03-13T14:29:22Z I! Loaded inputs: cloudwatch (3x) cpu disk diskio kernel mem net netstat processes prometheus (2x) swap system
2023-03-13T14:29:22Z I! Loaded aggregators:
2023-03-13T14:29:22Z I! Loaded processors:
2023-03-13T14:29:22Z I! Loaded secretstores:
2023-03-13T14:29:22Z I! Loaded outputs: exec file prometheus_client
2023-03-13T14:29:22Z I! Tags enabled: host=telegraf-deployment-575ff6f49c-w4wlt
2023-03-13T14:29:22Z I! [agent] Config: Interval:1m0s, Quiet:false, Hostname:"telegraf-deployment-575ff6f49c-w4wlt", Flush Interval:3m0s
2023-03-13T14:29:22Z D! [agent] Initializing plugins
2023-03-13T14:29:22Z D! [agent] Connecting outputs
2023-03-13T14:29:22Z D! [agent] Attempting connection to [outputs.exec]
2023-03-13T14:29:22Z D! [agent] Successfully connected to outputs.exec
2023-03-13T14:29:22Z D! [agent] Attempting connection to [outputs.file]
2023-03-13T14:29:22Z D! [agent] Successfully connected to outputs.file
2023-03-13T14:29:22Z D! [agent] Attempting connection to [outputs.prometheus_client]
2023-03-13T14:29:22Z I! [outputs.prometheus_client] Listening on http://[::]:9273/metrics
2023-03-13T14:29:22Z D! [agent] Successfully connected to outputs.prometheus_client
2023-03-13T14:29:22Z D! [agent] Starting service inputs
2023-03-13T14:30:05Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 1.10338319s
2023-03-13T14:30:05Z D! [outputs.prometheus_client] Buffer fullness: 8210 / 700000 metrics
2023-03-13T14:30:05Z D! [outputs.file] Wrote batch of 7000 metrics in 1.196714755s
2023-03-13T14:30:05Z D! [outputs.file] Buffer fullness: 8210 / 700000 metrics
2023-03-13T14:30:06Z D! [outputs.file] Wrote batch of 7000 metrics in 301.27396ms
2023-03-13T14:30:06Z D! [outputs.file] Buffer fullness: 3146 / 700000 metrics
2023-03-13T14:30:06Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 409.529645ms
2023-03-13T14:30:06Z D! [outputs.prometheus_client] Buffer fullness: 3146 / 700000 metrics
2023-03-13T14:30:06Z D! [outputs.file] Wrote batch of 7000 metrics in 257.827778ms
2023-03-13T14:30:06Z D! [outputs.file] Buffer fullness: 8034 / 700000 metrics
2023-03-13T14:30:06Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 354.887872ms
2023-03-13T14:30:06Z D! [outputs.prometheus_client] Buffer fullness: 8034 / 700000 metrics
2023-03-13T14:30:07Z D! [outputs.file] Wrote batch of 7000 metrics in 601.597829ms
2023-03-13T14:30:07Z D! [outputs.file] Buffer fullness: 2375 / 700000 metrics
2023-03-13T14:30:07Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 599.908461ms
2023-03-13T14:30:07Z D! [outputs.prometheus_client] Buffer fullness: 2720 / 700000 metrics
2023-03-13T14:30:07Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 305.584833ms
2023-03-13T14:30:07Z D! [outputs.prometheus_client] Buffer fullness: 1961 / 700000 metrics
2023-03-13T14:30:07Z D! [outputs.file] Wrote batch of 7000 metrics in 299.334655ms
2023-03-13T14:30:07Z D! [outputs.file] Buffer fullness: 1961 / 700000 metrics
2023-03-13T14:30:08Z D! [outputs.exec] Wrote batch of 7000 metrics in 3.865330203s
2023-03-13T14:30:08Z D! [outputs.exec] Buffer fullness: 32047 / 700000 metrics
2023-03-13T14:30:10Z D! [outputs.exec] Wrote batch of 7000 metrics in 2.093217596s
2023-03-13T14:30:10Z D! [outputs.exec] Buffer fullness: 25047 / 700000 metrics
2023-03-13T14:31:03Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 686.569375ms
2023-03-13T14:31:03Z D! [outputs.prometheus_client] Buffer fullness: 5979 / 700000 metrics
2023-03-13T14:31:03Z D! [outputs.file] Wrote batch of 7000 metrics in 881.813517ms
2023-03-13T14:31:03Z D! [outputs.file] Buffer fullness: 5979 / 700000 metrics
2023-03-13T14:31:04Z D! [outputs.file] Wrote batch of 7000 metrics in 322.666697ms
2023-03-13T14:31:04Z D! [outputs.file] Buffer fullness: 9316 / 700000 metrics
2023-03-13T14:31:05Z D! [outputs.file] Wrote batch of 7000 metrics in 786.928825ms
2023-03-13T14:31:05Z D! [outputs.file] Buffer fullness: 10794 / 700000 metrics
2023-03-13T14:31:05Z D! [outputs.file] Wrote batch of 7000 metrics in 494.302607ms
2023-03-13T14:31:05Z D! [outputs.file] Buffer fullness: 7762 / 700000 metrics
2023-03-13T14:31:06Z D! [outputs.file] Wrote batch of 7000 metrics in 406.18328ms
2023-03-13T14:31:06Z D! [outputs.file] Buffer fullness: 763 / 700000 metrics
2023-03-13T14:31:06Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 2.097985591s
2023-03-13T14:31:06Z D! [outputs.prometheus_client] Buffer fullness: 21773 / 700000 metrics
2023-03-13T14:31:06Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 224.27852ms
2023-03-13T14:31:06Z D! [outputs.prometheus_client] Buffer fullness: 16869 / 700000 metrics
2023-03-13T14:31:07Z D! [outputs.file] Wrote batch of 7000 metrics in 354.242869ms
2023-03-13T14:31:07Z D! [outputs.file] Buffer fullness: 1039 / 700000 metrics
2023-03-13T14:31:07Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 362.543987ms
2023-03-13T14:31:07Z D! [outputs.prometheus_client] Buffer fullness: 15039 / 700000 metrics
2023-03-13T14:31:13Z E! [agent] Error killing process: os: process already finished
2023-03-13T14:31:57Z D! [outputs.exec] Buffer fullness: 64039 / 700000 metrics
2023-03-13T14:31:57Z E! [agent] Error writing to outputs.exec: ["/etc/telegraf/script/send_teams_deadman_alert.sh"] timed out and was killed
2023-03-13T14:31:57Z D! [outputs.exec] Wrote batch of 7000 metrics in 62.328579ms
2023-03-13T14:31:57Z D! [outputs.exec] Buffer fullness: 57039 / 700000 metrics
2023-03-13T14:32:03Z D! [outputs.prometheus_client] Wrote batch of 7000 metrics in 952.169793ms
2023-03-13T14:32:03Z D! [outputs.prometheus_client] Buffer fullness: 14228 / 700000 metrics


[[outputs.exec]] - alerts received from the script being executed:

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:41:04 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:41:07 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:41:09 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:41:22 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:41:24 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:41:26 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:42:04 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:42:07 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:42:09 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:43:03 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:43:07 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:43:09 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:44:04 UTC 2023

Alert: DeadManSwitch

-No metrics received from Prometheus-SLO-<< at Mon Mar 13 14:44:06 UTC 2023

@jpowers Can you please help here?
Thank you!

If you look at the logs you will find these messages:

2023-03-13T14:30:05Z D! [outputs.file] Wrote batch of 7000 metrics in 1.196714755s
2023-03-13T14:30:06Z D! [outputs.file] Wrote batch of 7000 metrics in 301.27396ms
2023-03-13T14:30:07Z D! [outputs.file] Wrote batch of 7000 metrics in 601.597829ms

Telegraf will write metrics when one of two conditions is met:

  1. the flush interval is hit, in your case every 180 seconds
  2. the batch size is hit, in your case once 7000 metrics have been collected

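Both of those thresholds come straight from your [agent] table:

metric_batch_size = 7000
flush_interval = "180s"
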
In your case you are hitting condition #2 frequently, which is why data is flushed to the outputs far more often than every 180 seconds. What you want to do is reduce the number of metrics going to that output using metric filters, or run the script from an input plugin instead of an output so that it is only triggered on the collection interval (see the sketch below).
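
Here is a rough sketch of both approaches. The namepass value is an assumption (the CloudWatch input typically names measurements cloudwatch_<namespace>, e.g. cloudwatch_aws_rds, but verify against your actual data); adjust it to whatever your deadman script actually looks at.

#Option 1: filter what reaches the exec output so the 7000-metric batch size is never hit between flushes
[[outputs.exec]]
command = ["/etc/telegraf/script/send_teams_deadman_alert.sh"]
data_format = "json"
json_timestamp_units = "1s"
namepass = ["cloudwatch_aws_rds"]

#Option 2: run the script from the exec input instead, so it fires exactly once per its own interval
[[inputs.exec]]
commands = ["/etc/telegraf/script/send_teams_deadman_alert.sh"]
interval = "120s"
timeout = "30s"
data_format = "influx"

With option 2 the script runs on the input scheduler every 120s regardless of flushing, but keep in mind that inputs.exec will try to parse the script's stdout with the configured data_format, so the script should print nothing (or valid line protocol) to stdout.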