InfluxDB Database Replication for High Availability

We are using Telegraf, InfluxDB, and Grafana to monitor our environment. We have two datacenters, dc1 and dc2, and each runs one InfluxDB pod. We want a way to replicate data between the two InfluxDB instances so that, if dc1 goes down, the data from both datacenters (dc1 and dc2) is still available in dc2. We are using open-source InfluxDB. Can anyone suggest approaches to achieve this?

We tried the replication-during-ingest approach, configuring the InfluxDB URLs of both datacenters in telegraf.conf as described in the InfluxData post "Multiple Data Center Replication with InfluxDB". But what if one of the InfluxDB instances is down? After it recovers, the two instances will hold different data, so we do not want to follow this approach.

Note: We are only looking for high-availability approaches for open-source InfluxDB.

Can anyone please suggest some approaches for InfluxDB replication with the open-source version?

Hi @harika2724,

Have you seen InfluxDB Relay?

Hi @rawkode, thanks for the reply.

Yes, I have checked that. In our environment the data currently flows Telegraf -> InfluxDB -> Grafana, so I would need to change this flow to Telegraf -> InfluxDB Relay -> InfluxDB -> Grafana, right?

I saw that InfluxDB Relay has a few limitations:

Failure of one Relay or one InfluxDB can be sustained while still taking writes and serving queries. However, the recovery process might require operator intervention. It has also not been updated in 2 years, is not sufficient for long periods of downtime as all data is buffered in RAM, buffered data is lost if a Relay node fails, and when the buffer is full requests are dropped. During prolonged outages the buffer may also negatively impact the health of the Relay instance itself by adding memory pressure. Lastly, a health checker would have to be added to this setup in order to make sure nodes recovering from temporary failures do not respond to queries while the buffer is still being flushed — otherwise only partial data will be delivered, alerts might go off, etc.

Can you please let me know whether this is the only option for InfluxDB replication in the open-source version, or are there any alternatives?

You could also have Telegraf write to multiple InfluxDB instances by configuring multiple outputs.
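For reference, a dual-write setup along these lines might look like the sketch below. It assumes InfluxDB 1.x and the `outputs.influxdb` plugin; the hostnames, port, database name, and buffer size are placeholders, not values from this thread. Each instance gets its own `[[outputs.influxdb]]` section, because listing both URLs in a single `urls` array makes Telegraf write to only one of them per interval.

```toml
# telegraf.conf (sketch; hostnames and database name are hypothetical)
[agent]
  interval = "10s"
  flush_interval = "10s"
  # Maximum number of unwritten metrics kept per output while that
  # output is unreachable. Older metrics are dropped once it is full.
  metric_buffer_limit = 100000

# InfluxDB instance in dc1
[[outputs.influxdb]]
  urls = ["http://influxdb.dc1.example.com:8086"]
  database = "telegraf"

# InfluxDB instance in dc2
[[outputs.influxdb]]
  urls = ["http://influxdb.dc2.example.com:8086"]
  database = "telegraf"
```

Each output has its own buffer, so a short outage of one instance can be bridged and flushed once it comes back. By default that buffer is held in memory, so it does not cover long outages or a Telegraf restart.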

Yes, I tried that approach. It requires changing telegraf.conf to define two outputs containing the URLs of the InfluxDB instances in the two datacenters. Our major concern is that if one InfluxDB instance goes down, data is written to only the other instance, and after the failed instance recovers the two will hold different data for that period because they are not kept in sync. That is why we did not choose this option. Any suggestions for this scenario, please?

Hi @harika2724,

Would you like to speak to a member of our sales team to get information on Influx Enterprise?

Vente Privée is maintaining a fork of the influxdb-relay project at https://github.com/vente-privee/influxdb-relay

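Another option would be to use a message broker like Kafka: Telegraf > Kafka < Telegraf > InfluxDB. That is, one Telegraf publishes metrics to Kafka, and a second Telegraf consumes them from Kafka and writes them to InfluxDB.
In that case you’ll trade memory pressure for disk pressure, which will be easier to manage.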
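A rough sketch of the two Telegraf roles in such a pipeline, assuming the `outputs.kafka` and `inputs.kafka_consumer` plugins and InfluxDB 1.x. The broker address, topic, consumer group, and database name are placeholders, and the two parts would live in separate telegraf.conf files.

```toml
# --- telegraf.conf on the monitored hosts: publish metrics to Kafka ---
[[outputs.kafka]]
  brokers = ["kafka.example.com:9092"]   # hypothetical broker
  topic = "telegraf"
  data_format = "influx"

# --- telegraf.conf on a consumer running next to each InfluxDB ---
[[inputs.kafka_consumer]]
  brokers = ["kafka.example.com:9092"]
  topics = ["telegraf"]
  # Use a different consumer group per datacenter so that each
  # consumer receives every message independently of the other.
  consumer_group = "influxdb_dc1"
  data_format = "influx"

[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"
```

With this layout, an InfluxDB instance that was down can catch up from the topic once its consumer resumes, as long as the outage is shorter than the Kafka retention period.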

Thanks @voiprodrigo. We are planning a POC of InfluxDB replication using InfluxDB Relay in our environment, which currently runs Telegraf -> InfluxDB -> Grafana. Can anyone please tell us how to configure Telegraf to send data to InfluxDB Relay, and from there to InfluxDB?

Just as you would configure it if pushing directly to InfluxDB. It’s just another hop in the middle.

I think these days you can use Telegraf in place of influxdb-relay too, just configure it with a single influxdb_listener input and multiple influxdb outputs.
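A minimal sketch of a Telegraf instance used this way, assuming InfluxDB 1.x. The listen port, hostnames, and database name are placeholders.

```toml
# telegraf.conf for the relay instance (sketch; names are hypothetical)

# Accept InfluxDB line protocol writes over HTTP from upstream agents
[[inputs.influxdb_listener]]
  service_address = ":8186"

# Fan every received metric out to both datacenters
[[outputs.influxdb]]
  urls = ["http://influxdb.dc1.example.com:8086"]
  database = "telegraf"

[[outputs.influxdb]]
  urls = ["http://influxdb.dc2.example.com:8086"]
  database = "telegraf"
```

The upstream Telegraf agents would then point their own `[[outputs.influxdb]]` `urls` at the relay (for example `http://relay.example.com:8186`) instead of at InfluxDB directly. The same buffering limits apply as for any Telegraf output, so this is not a substitute for a resync after a long outage.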

Hi @daniel, thanks for the response. We are trying to implement a workflow of Telegraf → influxdb-relay → InfluxDB → Grafana. Please let me know if this can be achieved.

We are trying this approach for the reasons given in my original post: we need the data of both datacenters (dc1 and dc2) to be available in dc2 if dc1 goes down, and we are using open-source InfluxDB. Writing to both InfluxDB instances directly from Telegraf, as described in https://www.influxdata.com/blog/multiple-data-center-replication-influxdb/, leaves the two instances with different data after one of them recovers from an outage, so we do not want to follow that approach.

Please let me know whether the Telegraf → influxdb-relay → InfluxDB → Grafana approach described above works.

Hi harika, please let me know if you found a solution for high availability of InfluxDB, Grafana, and Telegraf for monitoring. I am looking for guidance on this.

@dhruv395, you could try Telegraf -> 2 RabbitMQ/Kafka queues -> 2 InfluxDB -> Grafana. With this, metrics will be queued up if an outage occurs.

I can recommend influxdb-srelay in conjunction with Syncflux.

Syncflux takes care of the re-sync operation, and it can be used in daemon mode (called ha-monitor). I don’t think it’s mature code yet (I have seen it consume a lot of TCP sockets), but it works pretty well for running a manual resync after a crash.
Let’s imagine you want to resync the last 96 hours only:

syncflux -action fullcopy -start -96h

Hi @harika2724,

I’m struggling with the same issue. What worked for you in the end?
I see people suggested a variety of approaches.

Thanks

How can that be achieved?
Is there any website we can refer to?