Issue:
We have a requirement to override the default endpoint "https://monitoring.us-east-2.amazonaws.com/" with a custom VPC endpoint "https://vpcendpoint.monitoring.us-east-2.vpce.amazonaws.com" to collect CloudWatch metrics. After setting "endpoint_url", Telegraf ignores it, keeps calling the default endpoint, and throws errors.
The same endpoint works when we test it via the AWS CLI, as below:
aws cloudwatch --endpoint-url "https://vpcendpoint.monitoring.us-east-2.vpce.amazonaws.com" list-metrics --namespace AWS/EBS --output text
Per Telegraf's cloudwatch input documentation (telegraf/README.md at master, influxdata/telegraf on GitHub), we are using the following configuration to collect CloudWatch metrics:
Telegraf Configuration
[global_tags]
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  debug = false
  quiet = false
  logtarget = "file"
  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_interval = "24h"
  logfile_rotation_max_archives = -1
  hostname = ""
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf_PRD"
  retention_policy = ""
  write_consistency = "any"
[[inputs.cloudwatch]]
  region = "us-east-2"
  access_key = "xxxxx"
  secret_key = "xxxx"
  period = "5m"
  delay = "5m"
  interval = "5m"
  namespaces = ["AWS/ElastiCache"]
  ratelimit = 25
  endpoint_url = "https://vpcendpoint.monitoring.us-east-2.vpce.amazonaws.com"
  [[inputs.cloudwatch.metrics]]
    names = ["IsMaster", "CPUUtilization", "EngineCPUUtilization", "SwapUsage", "BytesUsedForCache", "FreeableMemory", "NetworkBytesIn", "NetworkBytesOut", "ReplicationBytes", "ReplicationLag", "CurrConnections", "NewConnections", "CurrItems", "Reclaimed", "CacheHits", "CacheMisses", "Evictions", "GetTypeCmds", "SetTypeCmds"]
    [[inputs.cloudwatch.metrics.dimensions]]
      name = "CacheClusterId"
      value = "*"
Telegraf shows the following in the log:
2023-02-07T14:54:16Z E! [inputs.cloudwatch] failed to list metrics with namespace AWS/EFS: operation error CloudWatch: ListMetrics, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://monitoring.us-east-2.amazonaws.com/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-02-07T14:54:16Z E! [inputs.cloudwatch] failed to list metrics with namespace AWS/EBS: operation error CloudWatch: ListMetrics, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://monitoring.us-east-2.amazonaws.com/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-02-07T14:55:18Z E! [inputs.cloudwatch] failed to list metrics with namespace AWS/ElastiCache: operation error CloudWatch: ListMetrics, exceeded maximum number of attempts, 3, https response error StatusCode: 0, RequestID: , request send failed, Post "https://monitoring.us-east-2.amazonaws.com/": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
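To make the mismatch explicit, here is a minimal sketch of how the log could be checked against the configured endpoint. It is not part of Telegraf: the helper names `requested_hosts` and `override_ignored` are hypothetical, and the script only assumes that each failed request appears in the log as a `Post "<url>"` fragment, as in the lines above.

```python
import re
from urllib.parse import urlparse

# Assumption: the value set as endpoint_url under [[inputs.cloudwatch]].
CONFIGURED_ENDPOINT = "https://vpcendpoint.monitoring.us-east-2.vpce.amazonaws.com"

# Matches the URL inside a `Post "<url>"` fragment.
# Accepts straight or curly quotes, since forum rendering may convert them.
POST_URL = re.compile(r'Post\s+["“]([^"”]+)["”]')


def requested_hosts(log_text):
    """Return the set of hostnames found in Post "<url>" fragments."""
    return {urlparse(m.group(1)).hostname for m in POST_URL.finditer(log_text)}


def override_ignored(log_text, configured=CONFIGURED_ENDPOINT):
    """Return True if any logged request went to a host other than the configured one."""
    expected = urlparse(configured).hostname
    return any(host != expected for host in requested_hosts(log_text))


# One of the log lines from above, copied verbatim.
sample = (
    '2023-02-07T14:55:18Z E! [inputs.cloudwatch] failed to list metrics with '
    'namespace AWS/ElastiCache: operation error CloudWatch: ListMetrics, '
    'exceeded maximum number of attempts, 3, https response error StatusCode: 0, '
    'RequestID: , request send failed, '
    'Post "https://monitoring.us-east-2.amazonaws.com/": context deadline exceeded '
    '(Client.Timeout exceeded while awaiting headers)'
)

print(requested_hosts(sample))   # {'monitoring.us-east-2.amazonaws.com'}
print(override_ignored(sample))  # True
```

For the line above it reports the default regional host rather than the VPC endpoint host, which is what suggests that endpoint_url is not being applied.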
Could this be a bug?