InfluxDB Replication error handling

We have an InfluxDB 2 service (v2.4.0) running in a Docker container, with a remote connection to another InfluxDB OSS instance. We would like to set up an alerting system that sends us a message if the replication fails. Is there a built-in way, or any other working solution, to monitor replication streams?

InfluxDB’s own alerting system does not seem to be useful in this case, as checks can only be set up to evaluate measurements.
We also tried to monitor the InfluxDB logs via Prometheus (e.g. with grok_exporter or fluentd_exporter), but these solutions seem to be unstable. The only error log we see is:
ts=2022-11-28T11:03:54.182970Z lvl=error msg="Error in replication stream" log_id=0eRUKx5l000 service=replications replication_id=0a5b73c0c1da8000 error="invalid response code 422, must be 204" retries=8
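
For anyone who wants to watch the log directly instead of going through an exporter, here is a minimal sketch that parses a line in the key=value format shown above and flags replication errors. The function names `parse_log_line` and `is_replication_error` are made up for illustration, and the sketch assumes every field follows the same `key=value` or `key="quoted value"` shape as the sample line.

```python
import shlex


def parse_log_line(line):
    """Split a key=value log line into a dict, honouring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def is_replication_error(fields):
    """True when the line is an error emitted by the replications service."""
    return fields.get("lvl") == "error" and fields.get("service") == "replications"


sample = (
    'ts=2022-11-28T11:03:54.182970Z lvl=error msg="Error in replication stream" '
    'log_id=0eRUKx5l000 service=replications replication_id=0a5b73c0c1da8000 '
    'error="invalid response code 422, must be 204" retries=8'
)
record = parse_log_line(sample)
if is_replication_error(record):
    print(record["replication_id"], record["error"], record["retries"])
```

The parsed `replication_id`, `error` and `retries` fields are the pieces an alert message would most likely need.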

Our goal would be to send alert e-mails whenever a replication stream fails. Does anyone have experience with monitoring InfluxDB replications?

Does anyone have a working solution to monitor InfluxDB replications?

Hello @nkecskes,
Thanks for following up. Sometimes questions get buried.
You could use similar logic as described here:

Essentially, you would compare the point counts between the source and the destination and determine whether there are errors that way.
I’m also tagging @Jay_Clifford in case he has any ideas.
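
A rough sketch of the count comparison idea, assuming you have already run the same count query (for example a Flux `count()` over a fixed time window) against both instances and collected the results into one dictionary per side. The function name `lagging_measurements` and the `tolerance` parameter are hypothetical; the tolerance is there because the destination may legitimately trail the source by a few points while a batch is in flight.

```python
def lagging_measurements(source_counts, dest_counts, tolerance=0):
    """Return {measurement: missing_points} where the destination trails the source.

    source_counts / dest_counts map a measurement name to the number of points
    counted over the same time window on each instance.
    """
    lagging = {}
    for name, source_count in source_counts.items():
        # A measurement that is absent on the destination counts as zero points.
        gap = source_count - dest_counts.get(name, 0)
        if gap > tolerance:
            lagging[name] = gap
    return lagging


source = {"cpu": 100, "mem": 50}
destination = {"cpu": 100, "mem": 40}
print(lagging_measurements(source, destination))
```

A non-empty result would be the trigger for an alert e-mail.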


Hi @Anaisdg, thanks for your answer! We will check this solution.

Hi @nkecskes,
You can also monitor the status of replication via the InfluxDB API: InfluxDB v2.6 API documentation
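
A minimal sketch of polling the API, assuming the `GET /api/v2/replications` endpoint returns a `replications` list whose entries carry `latestResponseCode` and `latestErrorMessage` fields. Those field names are given from memory, so please verify them against the API documentation for your version. The base URL, token and org ID are placeholders, and the function names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# The error log earlier in this thread says the response code "must be 204".
EXPECTED_CODE = 204


def fetch_replications(base_url, token, org_id):
    """Fetch the replication streams of one organization from the InfluxDB API."""
    query = urllib.parse.urlencode({"orgID": org_id})
    request = urllib.request.Request(
        f"{base_url}/api/v2/replications?{query}",
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response).get("replications", [])


def failing_replications(replications):
    """Keep the streams whose latest remote write did not return the expected code."""
    failing = []
    for stream in replications:
        # Assumption: a missing or zero code means no write has been attempted yet.
        code = stream.get("latestResponseCode")
        if code and code != EXPECTED_CODE:
            failing.append(stream)
    return failing


# Offline example with hand-written data shaped like the assumed response.
example = [
    {"id": "aaa", "latestResponseCode": 204},
    {"id": "bbb", "latestResponseCode": 422, "latestErrorMessage": "unprocessable"},
]
for stream in failing_replications(example):
    print(stream["id"], stream["latestResponseCode"], stream.get("latestErrorMessage"))
```

Running something like this on a schedule (cron, for instance) and sending a mail whenever `failing_replications(fetch_replications(...))` is non-empty would cover the alerting part.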