Does Telegraf de-duplicate the metrics streamed by HA Prometheus instances?

If Prometheus is deployed in HA mode, i.e. two instances scraping the same targets and streaming the collected data (each Prometheus instance stores its identity as a label) to the same remote-write receiver in Telegraf, can Telegraf de-duplicate the collected data based on the Prometheus instance or identifier name?
Or are these metrics de-duplicated on the InfluxDB side?

You can send data to InfluxDB, and if a point is identical (meaning the same metric name, same tag set, same field set, and same time), then the last entry will win.
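As a concrete, hypothetical illustration in InfluxDB line protocol (the measurement and tag names below are made up), these two writes target the same series and timestamp, so only the second field value survives:

```
requests,service=example_service,code=200 value=12345 1635071900000000000
requests,service=example_service,code=200 value=12346 1635071900000000000
```

The measurement, tags, field key, and timestamp match exactly, so the second write overwrites the first. A point that differs in any one of those, for example by carrying a replica tag, is stored separately instead.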

What input plugin are you using, and do you expect the URL or the instance the data was collected from to stay in the data?

We also have the dedup processor.
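A minimal sketch of enabling it in the Telegraf config (the interval value is just an example):

```toml
# Suppress repeated points whose field values have not changed
[[processors.dedup]]
  ## Maximum time to suppress identical, repeated values
  dedup_interval = "600s"
```

Note that dedup compares a metric against the previous point of the same series (same name and tag set), so it will not collapse points that differ only by an HA replica tag; that tag would have to be removed first.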

Something like this:

my_app_http_total_requests{code="200",instance="",service_name="example_service",method="GET",endpoints="/example/endpoint",replica="prometheus-0"} 12345 at October 24, 2021, 10:38:20 AM (UTC)

my_app_http_total_requests{code="404",instance="",service_name="example_service",method="POST",endpoints="/another/endpoint",replica="prometheus-0"} 67890 at October 24, 2021, 10:38:20 AM (UTC)

my_app_http_total_requests{code="200",instance="",service_name="example_service",method="GET",endpoints="/example/endpoint",replica="prometheus-1"} 12346 at October 24, 2021, 10:38:22 AM (UTC)

my_app_http_total_requests{code="404",instance="",service_name="example_service",method="POST",endpoints="/another/endpoint",replica="prometheus-1"} 67899 at October 24, 2021, 10:38:22 AM (UTC)

Two Prometheus instances scrape the same targets and stream the collected metrics over remote-write to Telegraf. Because the two instances can't scrape at exactly the same time, there will be a mismatch in data and timestamps.

Do you recommend preserving the uniqueness of the data using the replica label, or do you think it should be avoided or removed?

The data as-is is not duplicate data, due to the mismatched timestamps and the replica tag. You would need to remove the tag and reset the timestamps, which could get messy.
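For the tag half, Telegraf's generic metric modifiers can drop it at the output. A minimal sketch, assuming the InfluxDB v2 output and that the HA label is named replica (the URL is a placeholder):

```toml
[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  ## Drop the HA replica tag before writing, so both
  ## Prometheus copies map to the same series
  tagexclude = ["replica"]
```

With the tag gone, the two replicas still write distinct points because their scrape timestamps differ, which is the second, messier half of the problem.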

My personal opinion would be to collect all the data and filter on one of the replicas.

Thanks @jpowers for the prompt response!

I can remove the replica tag, but I'm not sure about the timestamps. Because my HA metric collectors collect metrics in parallel, I would like to keep both and have the database de-duplicate the collection. If we rely on one replica, I fear we may miss data during that replica's outages.

For example, AWS Managed Prometheus can de-dupe this based on a replica label. See this article on the subject: Send high-availability data to Amazon Managed Service for Prometheus with Prometheus - Amazon Managed Service for Prometheus.

How does Flux recommend that users handle metrics collection? Should all metrics be collected by a singleton metric collector?

The usage of Flux is beyond me :wink: @Jay_Clifford, thoughts on dedup with InfluxDB?

Your data itself is not actually duplicate data because it is coming from different sources at different times. In terms of Telegraf sending data to InfluxDB, recall my first comment about what defines duplicate data:

the same metric name, same tag set, same field set, and same time

As such, you could use Starlark to strip off the timestamp, but that means your data is no longer accurate :wink: I probably would not suggest this route.
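If you did want to try it anyway, a hedged sketch with the starlark processor could look like the following; the replica tag name and the 10 s rounding window are assumptions for illustration, not recommendations:

```toml
[[processors.starlark]]
  source = '''
def apply(metric):
    # Drop the HA replica tag so both copies share a series
    metric.tags.pop("replica", None)
    # Round the timestamp (nanoseconds) down to the nearest 10s so
    # near-simultaneous scrapes collide on the same point and
    # last-write-wins de-duplicates them in InfluxDB
    metric.time = metric.time - (metric.time % 10000000000)
    return metric
'''
```

As noted above, this deliberately falsifies timestamps, so it is only worth considering if approximate times are acceptable for your use case.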


Thanks @jpowers for the confirmation!

But I wonder how OpenTelemetry or Prometheus Agent HA deployments integrate with InfluxDB today?