Hi,
I'm using the AWS CloudWatch input plugin to collect S3 metrics from CloudWatch with Telegraf and then expose them to Prometheus using the prometheus_client output plugin.
I have the same setup working successfully for ELB metrics; however, with the S3 metrics the /metrics endpoint does not return any data.
My telegraf.conf file is:
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "10s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 1000
## For failed writes, telegraf will cache metric_buffer_limit metrics for each
## output, and will flush this buffer on a successful write. Oldest metrics
## are dropped first when this buffer fills.
## This buffer only fills when writes fail to output plugin(s).
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "0s"
## Default flushing interval for all outputs. Maximum flush_interval will be
## flush_interval + flush_jitter
flush_interval = "10s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s.
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
precision = ""
## Logging configuration:
## Run telegraf with debug log messages.
debug = true
## Run telegraf in quiet mode (error log messages only).
quiet = false
## Specify the log file name. The empty string means to log to stderr.
logfile = ""
## If set to true, do not set the "host" tag in the telegraf agent.
omit_hostname = false
# Configuration for the Prometheus client to spawn
[[outputs.prometheus_client]]
listen = ":9273"
path = "/metrics"
collectors_exclude = ["gocollector", "process"]
# Read metrics from AWS CloudWatch
[[inputs.cloudwatch]]
## Amazon Region
region = "us-east-1"
# The minimum period for Cloudwatch metrics is 1 minute (60s). However not all
# metrics are made available to the 1 minute period. Some are collected at
# 3 minute, 5 minute, or larger intervals. See https://aws.amazon.com/cloudwatch/faqs/#monitoring.
# Note that if a period is configured that is smaller than the minimum for a
# particular metric, that metric will not be returned by the Cloudwatch API
# and will not be collected by Telegraf.
#
## Requested CloudWatch aggregation Period (required - must be a multiple of 60s)
period = "24h"
## Collection Delay (required - must account for metrics availability via CloudWatch API)
delay = "24h"
## Recommended: use metric 'interval' that is a multiple of 'period' to avoid
## gaps or overlap in pulled data
interval = "24h"
## Configure the TTL for the internal cache of metrics.
# cache_ttl = "24h"
## Metric Statistic Namespace (required)
namespace = "AWS/S3"
## Metrics to Pull
## Defaults to all Metrics in Namespace if nothing is provided
## Refreshes Namespace available metrics every 1h
[[inputs.cloudwatch.metrics]]
names = ["NumberOfObjects"]
## Dimension filters for Metric. All dimensions defined for the metric names
## must be specified in order to retrieve the metric statistics.
[[inputs.cloudwatch.metrics.dimensions]]
name = "BucketName"
value = "my-test-bucket"
[[inputs.cloudwatch.metrics.dimensions]]
name = "StorageType"
value = "AllStorageTypes"
# used to convert some of the metrics to numbers for Prometheus output
[[processors.converter]]
[processors.converter.fields]
float = ["number_of_objects_sum"]
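For reference, a quick way to confirm that the metric and dimension combination exists in CloudWatch would be something like the following (namespace, metric name, and dimension values taken from the config above; this assumes the same region and credentials the plugin uses):
$ aws cloudwatch list-metrics \
    --region us-east-1 \
    --namespace AWS/S3 \
    --metric-name NumberOfObjects \
    --dimensions Name=BucketName,Value=my-test-bucket Name=StorageType,Value=AllStorageTypes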
Running Telegraf in test mode shows it can pull the metrics from CloudWatch successfully:
$ telegraf --config /etc/telegraf/telegraf.conf --input-filter cloudwatch --test
2019-09-10T08:07:05Z I! Starting Telegraf 1.10.3
> cloudwatch_aws_s3,bucket_name=my-test-bucket,host=4128beab5b75,region=us-east-1,storage_type=AllStorageTypes,unit=count number_of_objects_average=110,number_of_objects_maximum=110,number_of_objects_minimum=110,number_of_objects_sample_count=1,number_of_objects_sum=110 1567930020000000000
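However, when I query the Prometheus endpoint (port and path from the prometheus_client config above; localhost because I'm checking on the same box Telegraf runs on), no cloudwatch_aws_s3 series come back:
$ curl -s http://localhost:9273/metrics | grep cloudwatch_aws_s3
Based on how the ELB metrics show up, I'd expect to see series such as cloudwatch_aws_s3_number_of_objects_sum{bucket_name="my-test-bucket",...} here.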
Can anyone help me work out what's going wrong here? Is there something I've missed or misconfigured in the file above?
Thanks