I have Telegraf running with an external script that reports the time remaining until a potential AWS EC2 instance termination. The Telegraf config:
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = false
logfile = "/var/log/telegraf/telegraf.log"
quiet = false
hostname = "backuppc-a01"
omit_hostname = false
[[outputs.prometheus_client]]
listen = ":9009"
[[inputs.exec]]
# get termination time
commands = [ "/usr/local/sbin/aws-termination.sh" ]
data_format = "influx"
timeout = "15s"
$ /usr/local/sbin/aws-termination.sh
aws_instance_termination,action=instance-stop,host=backuppc-a01 seconds=1192060i
Yet a curl against the Telegraf Prometheus endpoint outputs this:
aws_instance_termination_seconds{action="instance-stop",host="backuppc-a01",id="backuppc-a01",region="eu-central-1",type="t3.micro",zone="eu-central-1a"} 1.19206e+06
Notice the integer-to-float conversion: a value with one-second precision appears to get rounded, so the same value is reported for several minutes.
In Prometheus this produces a step in the graph, which breaks alerting.
Notice that the graph always looks like this: if I wait 10 minutes I still get the same shape, and the part that was flat turns back into the same angled line.
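As a quick sanity check on the sample above, here is a minimal Python sketch that parses the exposed text value and compares it with the integer the script printed. It assumes the endpoint value is read as a standard float64, and it only covers this one sample, so it says nothing about where the flat step itself comes from.

```python
# Sanity check: does the value shown by the Prometheus endpoint
# round-trip to the integer printed by the script?
script_value = 1192060          # from the script output: seconds=1192060i
exposed_text = "1.19206e+06"    # from the curl output above

parsed = float(exposed_text)
round_trips = parsed == script_value

# float64 represents every integer below 2**53 exactly,
# so a seconds counter of this size fits without loss.
fits_exactly = script_value < 2**53

print(parsed, round_trips, fits_exactly)
```

For this sample the scientific notation is only a different spelling of the same number, so comparing several consecutive scrapes against consecutive script runs may help narrow down where the repeated value originates.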
I tried changing the script output to integer, uinteger, and float, but I'm unable to fix this.
Any hints on how to fix or work around this? Right now the only way I can see is to widen the Prometheus alert range beyond the step, so that the value always changes within it.
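To illustrate that workaround, here is a small Python simulation on synthetic data. The 10s scrape interval matches the config above, but the 300s step length and the helper names `stepped_countdown` and `always_changes` are assumptions made up for this sketch, not measurements from my setup.

```python
def stepped_countdown(start, duration, scrape=10, step=300):
    """Synthetic series: a countdown whose value only refreshes every `step` seconds."""
    return [(t, start - (t // step) * step) for t in range(0, duration, scrape)]


def always_changes(samples, window):
    """True if every full lookback window of `window` seconds contains a value change."""
    first_t = samples[0][0]
    for i, (t_end, _) in enumerate(samples):
        t_start = t_end - window
        if t_start < first_t:
            continue  # not enough history for a full window yet
        in_window = [v for (t, v) in samples[: i + 1] if t >= t_start]
        if len(set(in_window)) < 2:
            return False
    return True


samples = stepped_countdown(start=1192060, duration=3600)
short_ok = always_changes(samples, window=120)  # window shorter than the step
long_ok = always_changes(samples, window=600)   # window longer than the step
print(short_ok, long_ok)
```

With these assumed numbers, a 120s window can sit entirely inside a flat stretch, while a 600s window always spans at least one refresh.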
Thanks for the help.