Redis Telegraf Prometheus Error

We are trying to use Telegraf to scrape the Redis /metrics page, which is in Prometheus format, and we are getting an error. The Redis domains that are standalone instances work fine, but for the ones deployed as a cluster we see
the bdb_up metric declared twice in the output page, which causes Telegraf to fail with:

error Unable to gather {"log_id": "VXVSsW000", "service": "scraper", "scraper-name": "Redis", "error": "reading text format failed: text format parsing error in line 2546: second TYPE line for metric name "bdb_up", or TYPE reported after samples"}

Output from the /metrics HTTP exporter page shows two bdb_up TYPE declarations:

# HELP bdb_up
# TYPE bdb_up gauge
bdb_up{bdb="4",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="3",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="17",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="8",cluster="redis.domain",status="active"} 1.0

# HELP bdb_up
# TYPE bdb_up gauge
bdb_up{bdb="111",cluster="redis.domain",crdt_guid="42439090",crdt_replica_id="2",status="active"} 1.0
bdb_up{bdb="160",cluster="redis.domain",crdt_guid="b518589d",crdt_replica_id="2",status="active"} 1.0
bdb_up{bdb="129",cluster="redis.domain",crdt_guid="e35d7e35",crdt_replica_id="2",status="active"} 1.0

I am looking for any ideas on how to work around this error. If we use Prometheus itself to scrape the page, it works and handles both declarations being there.

I think this is expected, as the Prometheus spec itself says:

Only one TYPE line may exist for a given metric name.

Telegraf has two different Prometheus parsers, which can be toggled with the metric_version = [1|2] config option. Both parsers use the upstream Prometheus parser, but produce a different format of metrics. However, given the following metrics file, both will fail with the same error:

# HELP bdb_up
# TYPE bdb_up gauge
bdb_up{bdb="4",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="3",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="17",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="8",cluster="redis.domain",status="active"} 1.0

# HELP bdb_up
# TYPE bdb_up gauge
bdb_up{bdb="111",cluster="redis.domain",crdt_guid="42439090",crdt_replica_id="2",status="active"} 1.0
bdb_up{bdb="160",cluster="redis.domain",crdt_guid="b518589d",crdt_replica_id="2",status="active"} 1.0
bdb_up{bdb="129",cluster="redis.domain",crdt_guid="e35d7e35",crdt_replica_id="2",status="active"} 1.0

The error:

2023-11-27T21:32:02Z E! [inputs.prometheus] Error in plugin: error reading metrics for "http://localhost:8000/metrics2": reading text format failed: text format parsing error in line 9: second TYPE line for metric name "bdb_up", or TYPE reported after samples

Looking around, I see a number of issue reports where users filed issues with the project producing the metrics, asking it not to duplicate the TYPE comments:

The first one of those is from the Prometheus project itself.

Edit: this is true of the OpenMetrics standard as well:

There MUST NOT be more than one of each type of metadata line for a MetricFamily. The ordering SHOULD be TYPE, UNIT, HELP.
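To make the rule concrete, here is a minimal sketch in plain Python (standard library only) that scans exposition text for a repeated TYPE line. It is not Telegraf's or Prometheus' actual parser, and the function name find_duplicate_type_lines is made up for illustration. The sample is a trimmed copy of the file above.

```python
import re

# Matches "# TYPE <metric_name> ..." metadata lines.
TYPE_RE = re.compile(r"^#\s*TYPE\s+(\S+)")


def find_duplicate_type_lines(text):
    """Return (line_number, metric_name) for every repeated TYPE line.

    Hypothetical helper for illustration; it only checks the
    "one TYPE line per metric name" rule, nothing else in the format.
    """
    seen = set()
    duplicates = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = TYPE_RE.match(line.strip())
        if not match:
            continue
        name = match.group(1)
        if name in seen:
            duplicates.append((number, name))
        seen.add(name)
    return duplicates


# Trimmed copy of the metrics file shown above.
SAMPLE = """\
# HELP bdb_up
# TYPE bdb_up gauge
bdb_up{bdb="4",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="3",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="17",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="8",cluster="redis.domain",status="active"} 1.0

# HELP bdb_up
# TYPE bdb_up gauge
bdb_up{bdb="111",cluster="redis.domain",crdt_guid="42439090",crdt_replica_id="2",status="active"} 1.0
"""

if __name__ == "__main__":
    print(find_duplicate_type_lines(SAMPLE))  # [(9, 'bdb_up')]
```

The reported line, 9, is the same line the Telegraf error above points at for this file.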

Hello @TJ_B,
Can you please share your Telegraf config?
Without your config I'm guessing a little, but to avoid this conflict you could:

  1. Add tags to differentiate
    You can add a tag to each metric that indicates its type or source. For instance:
  # ... configuration ...
    source_type = "type1"  # or "type2" for the other metric

The Telegraf input config is a basic Prometheus input set to metric_version = 2. Also, for context, we set up a demo Prometheus server to test whether it works there, and it does. They must be ignoring that OpenMetrics standard? We engaged entry support for Redis and they were of no help, stating they don't provide that support even though they are bundling the Prometheus exporter with the product these days.

All of that said, I wasn't sure if there was a workaround using a pre-processor / regex to relabel that metric, if it contains the "crdt_guid" field, to something like bdb_up_crdt?

urls = [
metric_version = 2


interval = "60s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 50000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
logtarget = "file"
logfile = "telegraf_redis.log"
hostname = ""
omit_hostname = false

That is unfortunate. Is the repo that is producing this public or on GitHub? You might consider filing an issue if it is.

I wasn't sure if there was a workaround using a pre-processor / regex to relabel that metric, if it contains the "crdt_guid" field, to something like bdb_up_crdt

Unfortunately, no. All of our processors run after the input, and the error itself comes from the Prometheus library during parsing. They would have to expose something that would let us ignore this part of the spec.
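Since the failure happens before any processor runs, any cleanup would have to happen outside Telegraf, before the text reaches the parser. Below is a minimal sketch of that idea, assuming you can put a small script or proxy between the exporter and Telegraf (how that is wired up is not shown and is left as an assumption). The function name merge_duplicate_families is hypothetical. It keeps the first HELP and TYPE line per metric name and groups all samples under them. It has not been tested against the real Redis exporter output.

```python
import re

# "# HELP <name> ..." or "# TYPE <name> ..." metadata lines.
META_RE = re.compile(r"^#\s*(HELP|TYPE)\s+(\S+)")
# Leading metric name of a sample line.
NAME_RE = re.compile(r"^[A-Za-z_:][A-Za-z0-9_:]*")


def merge_duplicate_families(text):
    """Merge repeated HELP/TYPE blocks for one metric name into one block.

    Hypothetical helper: a sketch, not a full exposition-format parser.
    Blank lines and comments other than HELP/TYPE are dropped.
    """
    families = {}   # family name -> metadata lines and sample lines
    current = None  # family named by the most recent HELP/TYPE line

    def family(name):
        return families.setdefault(
            name, {"HELP": None, "TYPE": None, "samples": []}
        )

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        meta = META_RE.match(line)
        if meta:
            kind, current = meta.groups()
            entry = family(current)
            if entry[kind] is None:  # keep only the first HELP/TYPE
                entry[kind] = line
            continue
        if line.startswith("#"):
            continue
        name_match = NAME_RE.match(line)
        if not name_match:
            continue
        name = name_match.group(0)
        # Samples such as foo_bucket or foo_sum stay with family foo;
        # this prefix check is a simplification.
        if current and (name == current or name.startswith(current + "_")):
            owner = current
        else:
            owner = name
        family(owner)["samples"].append(line)

    out = []
    for entry in families.values():
        out.extend(l for l in (entry["HELP"], entry["TYPE"]) if l)
        out.extend(entry["samples"])
    return "\n".join(out) + "\n"


# Trimmed copy of the duplicated bdb_up output from this thread.
SAMPLE = """\
# HELP bdb_up
# TYPE bdb_up gauge
bdb_up{bdb="4",cluster="redis.domain",status="active"} 1.0
bdb_up{bdb="3",cluster="redis.domain",status="active"} 1.0

# HELP bdb_up
# TYPE bdb_up gauge
bdb_up{bdb="111",cluster="redis.domain",crdt_guid="42439090",crdt_replica_id="2",status="active"} 1.0
"""

if __name__ == "__main__":
    print(merge_duplicate_families(SAMPLE), end="")
```

For the sample, the result has a single HELP line and a single TYPE line for bdb_up, followed by all three samples. Getting the exporter fixed upstream is still the cleaner route, since this adds another moving part to the scrape path.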