Hi,
I'm using the AWS CloudWatch input plugin to collect S3 metrics from CloudWatch with Telegraf and then expose them to Prometheus using the prometheus_client output plugin.
I have the same setup working successfully for ELB metrics; however, with the S3 metrics the /metrics endpoint does not return any data.
My telegraf.conf file is:
# Configuration for telegraf agent
[agent]
## Default data collection interval for all inputs
interval = "10s"
## Rounds collection interval to 'interval'
## ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
## Telegraf will send metrics to outputs in batches of at most
## metric_batch_size metrics.
## This controls the size of writes that Telegraf sends to output plugins.
metric_batch_size = 1000
## For failed writes, telegraf will cache metric_buffer_limit metrics for each
## output, and will flush this buffer on a successful write. Oldest metrics
## are dropped first when this buffer fills.
## This buffer only fills when writes fail to output plugin(s).
metric_buffer_limit = 10000
## Collection jitter is used to jitter the collection by a random amount.
## Each plugin will sleep for a random time within jitter before collecting.
## This can be used to avoid many plugins querying things like sysfs at the
## same time, which can have a measurable effect on the system.
collection_jitter = "0s"
## Default flushing interval for all outputs. Maximum flush_interval will be
## flush_interval + flush_jitter
flush_interval = "10s"
## Jitter the flush interval by a random amount. This is primarily to avoid
## large write spikes for users running a large number of telegraf instances.
## ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
## By default or when set to "0s", precision will be set to the same
## timestamp order as the collection interval, with the maximum being 1s.
## ie, when interval = "10s", precision will be "1s"
## when interval = "250ms", precision will be "1ms"
## Precision will NOT be used for service inputs. It is up to each individual
## service input to set the timestamp at the appropriate precision.
## Valid time units are "ns", "us" (or "µs"), "ms", "s".
precision = ""
## Logging configuration:
## Run telegraf with debug log messages.
debug = true
## Run telegraf in quiet mode (error log messages only).
quiet = false
## Specify the log file name. The empty string means to log to stderr.
logfile = ""
## If set to true, do not set the "host" tag in the telegraf agent.
omit_hostname = false
# Configuration for the Prometheus client to spawn
[[outputs.prometheus_client]]
listen = ":9273"
path = "/metrics"
collectors_exclude = ["gocollector", "process"]
# Read metrics from AWS CloudWatch
[[inputs.cloudwatch]]
## Amazon Region
region = "us-east-1"
# The minimum period for Cloudwatch metrics is 1 minute (60s). However not all
# metrics are made available to the 1 minute period. Some are collected at
# 3 minute, 5 minute, or larger intervals. See https://aws.amazon.com/cloudwatch/faqs/#monitoring.
# Note that if a period is configured that is smaller than the minimum for a
# particular metric, that metric will not be returned by the Cloudwatch API
# and will not be collected by Telegraf.
#
## Requested CloudWatch aggregation Period (required - must be a multiple of 60s)
period = "24h"
## Collection Delay (required - must account for metrics availability via CloudWatch API)
delay = "24h"
## Recommended: use metric 'interval' that is a multiple of 'period' to avoid
## gaps or overlap in pulled data
interval = "24h"
## Configure the TTL for the internal cache of metrics.
# cache_ttl = "24h"
## Metric Statistic Namespace (required)
namespace = "AWS/S3"
## Metrics to Pull
## Defaults to all Metrics in Namespace if nothing is provided
## Refreshes Namespace available metrics every 1h
[[inputs.cloudwatch.metrics]]
names = ["NumberOfObjects"]
## Dimension filters for Metric. All dimensions defined for the metric names
## must be specified in order to retrieve the metric statistics.
[[inputs.cloudwatch.metrics.dimensions]]
name = "BucketName"
value = "my-test-bucket"
[[inputs.cloudwatch.metrics.dimensions]]
name = "StorageType"
value = "AllStorageTypes"
# used to convert some of the metrics to numbers for Prometheus output
[[processors.converter]]
[processors.converter.fields]
float = ["number_of_objects_sum"]
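For reference, a quick way to confirm that the metric and dimension combination exists in CloudWatch would be something like the following (namespace, metric name, and dimension values taken from the config above; this assumes the same region and credentials the plugin uses):
$ aws cloudwatch list-metrics \
    --region us-east-1 \
    --namespace AWS/S3 \
    --metric-name NumberOfObjects \
    --dimensions Name=BucketName,Value=my-test-bucket Name=StorageType,Value=AllStorageTypes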
Running Telegraf in test mode shows it can pull the metrics from CloudWatch successfully:
$ telegraf --config /etc/telegraf/telegraf.conf --input-filter cloudwatch --test
2019-09-10T08:07:05Z I! Starting Telegraf 1.10.3
> cloudwatch_aws_s3,bucket_name=my-test-bucket,host=4128beab5b75,region=us-east-1,storage_type=AllStorageTypes,unit=count number_of_objects_average=110,number_of_objects_maximum=110,number_of_objects_minimum=110,number_of_objects_sample_count=1,number_of_objects_sum=110 1567930020000000000
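However, when I query the Prometheus endpoint (port and path from the prometheus_client config above; localhost because I'm checking on the same box Telegraf runs on), no cloudwatch_aws_s3 series come back:
$ curl -s http://localhost:9273/metrics | grep cloudwatch_aws_s3
Based on how the ELB metrics show up, I'd expect to see series such as cloudwatch_aws_s3_number_of_objects_sum{bucket_name="my-test-bucket",...} here.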
Can anyone help me work out what's going wrong here? Is there something I've missed or misconfigured in the file above?
Thanks