InfluxDB Database Replication for High Availability

harika2724 · January 31, 2019, 3:08pm

We are using Telegraf, InfluxDB, and Grafana to monitor our environment. We have two datacenters, dc1 and dc2, and each runs one InfluxDB pod. We want a way to replicate data between the two InfluxDB instances so that, if dc1 goes down, dc2 still holds the data of both datacenters. We are using open-source InfluxDB. Can anyone suggest approaches to achieve this?

We tried the "replication during ingest" approach, where the InfluxDB URLs of both datacenters are configured in telegraf.conf, as described in the "Multiple Data Center Replication with InfluxDB" post from InfluxData. But what happens if one of the InfluxDB instances is down? After its recovery the two instances will hold different data, so we do not want to follow this approach.

Note: We are looking for open-source InfluxDB high-availability approaches only.

harika2724 · February 5, 2019, 6:44pm

Can anyone please suggest some approaches for InfluxDB replication with the open-source version?

rawkode · February 5, 2019, 6:49pm

Hi @harika2724,

Have you seen InfluxDB Relay?

harika2724 · February 5, 2019, 6:54pm

Hi @rawkode, thanks for the reply.

Yes, I have checked that. In our environment the data currently flows Telegraf -> InfluxDB -> Grafana. So I would need to change this flow to Telegraf -> InfluxDB Relay -> InfluxDB -> Grafana, right?

I saw that InfluxDB Relay has a few limitations:

Failure of one Relay or one InfluxDB can be sustained while still taking writes and serving queries. However, the recovery process might require operator intervention. It has also not been updated in 2 years, is not sufficient for long periods of downtime as all data is buffered in RAM, buffered data is lost if a Relay node fails, and when the buffer is full requests are dropped. During prolonged outages the buffer may also negatively impact the health of the Relay instance itself by adding memory pressure. Lastly, a health checker would have to be added to this setup in order to make sure nodes recovering from temporary failures do not respond to queries while the buffer is still being flushed — otherwise only partial data will be delivered, alerts might go off, etc.

Can you please let me know whether this is the only option for InfluxDB replication in the open-source version, or are there other alternatives?

rawkode · February 5, 2019, 7:00pm

You could also have Telegraf write to multiple InfluxDB instances by configuring multiple outputs.
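For reference, a minimal sketch of what "multiple outputs" could look like in telegraf.conf. The hostnames, port, and database name are placeholders chosen for illustration, not values from this thread.

```toml
# telegraf.conf (fragment) - hostnames and database name are placeholders

# Output 1: the InfluxDB instance in dc1
[[outputs.influxdb]]
  urls = ["http://influxdb.dc1.example.com:8086"]
  database = "telegraf"

# Output 2: the InfluxDB instance in dc2
[[outputs.influxdb]]
  urls = ["http://influxdb.dc2.example.com:8086"]
  database = "telegraf"
```

These need to be two separate `[[outputs.influxdb]]` tables. Putting both URLs into a single `urls` array makes Telegraf write each batch to only one of them, which spreads load across a cluster rather than replicating the data.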

harika2724 · February 5, 2019, 7:15pm

Yes, I tried that approach. It requires changing telegraf.conf to define two outputs containing the URLs of the InfluxDB instances in the two datacenters. But our major concern is that if one of the InfluxDB instances goes down, data is written to only the other instance, and after the failed instance recovers, the two instances will hold different data for that period because nothing brings them back in sync. That is why we are not choosing this option. Any suggestions for this scenario, please?

rawkode · February 6, 2019, 1:18pm

Hi @harika2724,

Would you like to speak to a member of our sales team to get information on Influx Enterprise?

hbs · February 6, 2019, 8:29pm

Vente Privée is maintaining a fork of the influxdb-relay project at https://github.com/vente-privee/influxdb-relay

voiprodrigo · February 7, 2019, 5:26pm

Another option would be to use a message broker like Kafka: Telegraf > Kafka < Telegraf > InfluxDB (one Telegraf publishes to Kafka, and a second Telegraf consumes from it and writes to InfluxDB).
In that case you trade memory pressure for disk pressure, which is easier to manage.
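A rough sketch of that layout as two telegraf.conf fragments, shown together here but meant for two different Telegraf instances. The broker address, topic, consumer group, and database name are all assumed placeholders.

```toml
# --- telegraf.conf on the monitored hosts (producer side) ---
# Publish metrics to Kafka instead of writing to InfluxDB directly.
[[outputs.kafka]]
  brokers = ["kafka.example.com:9092"]   # placeholder broker address
  topic = "telegraf"                     # placeholder topic name
  data_format = "influx"                 # send InfluxDB line protocol

# --- telegraf.conf next to each InfluxDB (consumer side) ---
# Read the same topic and write into the local InfluxDB.
[[inputs.kafka_consumer]]
  brokers = ["kafka.example.com:9092"]
  topics = ["telegraf"]
  consumer_group = "influxdb_dc1"        # placeholder; one group per datacenter
  data_format = "influx"

[[outputs.influxdb]]
  urls = ["http://localhost:8086"]
  database = "telegraf"
```

Giving each datacenter's consumer its own consumer group means each InfluxDB receives the full stream. A consumer that was down resumes from its last committed offset when it comes back, as long as the outage is shorter than the topic's retention period.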

harika2724 · February 11, 2019, 10:56pm

Thanks @voiprodrigo. We are planning a POC of InfluxDB replication using InfluxDB Relay in our environment, which currently runs Telegraf -> InfluxDB -> Grafana. Can anyone please let us know how to configure Telegraf to send data to InfluxDB Relay, and from there to InfluxDB?

voiprodrigo · February 11, 2019, 11:20pm

Just as you would configure it if pushing directly to InfluxDB. It’s just another hop in the middle.
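As a sketch, assuming the relay's HTTP listener is reachable at a hypothetical hostname on port 9096 (the port is an assumption taken from the relay's sample configuration; use whatever your relay is bound to), the Telegraf side could look like this:

```toml
# telegraf.conf (fragment) - the relay address is a placeholder
[[outputs.influxdb]]
  # Point at the relay's HTTP listener instead of InfluxDB itself.
  urls = ["http://influxdb-relay.example.com:9096"]
  database = "telegraf"
  # The relay only forwards writes, so do not let Telegraf try to
  # create the database through it. Create it on each InfluxDB beforehand.
  skip_database_creation = true
```

Grafana still queries the InfluxDB instances directly, since the relay does not handle queries.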

daniel · February 12, 2019, 12:35am

I think these days you can use Telegraf in place of influxdb-relay too: just configure it with a single influxdb_listener input and multiple influxdb outputs.
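A minimal sketch of such a "Telegraf as relay" instance. The listen port, hostnames, database name, and buffer size are placeholders for illustration.

```toml
# telegraf.conf for a Telegraf instance acting as a relay (sketch)

[agent]
  # Per-output in-memory buffer, counted in metrics; placeholder size.
  metric_buffer_limit = 100000

# Accept InfluxDB line-protocol writes from the other Telegraf agents.
[[inputs.influxdb_listener]]
  service_address = ":8186"

# Fan every received metric out to both datacenters.
[[outputs.influxdb]]
  urls = ["http://influxdb.dc1.example.com:8086"]
  database = "telegraf"

[[outputs.influxdb]]
  urls = ["http://influxdb.dc2.example.com:8086"]
  database = "telegraf"
```

The upstream agents would then point their own `[[outputs.influxdb]]` `urls` at this instance on port 8186. Each output buffers in memory up to `metric_buffer_limit`, and the oldest metrics are dropped once the buffer is full, so a long outage of one InfluxDB still leaves a gap there.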

harika2724 · February 20, 2019, 6:59pm

Hi @daniel, thanks for the response. We are trying to implement a workflow of Telegraf → InfluxDB Relay → InfluxDB → Grafana. Please let me know if this can be achieved.

We are trying this approach for the following reason.

We are using Telegraf, InfluxDB, and Grafana to monitor our environment. We have two datacenters, dc1 and dc2, and each runs one InfluxDB pod. We want a way to replicate data between the two InfluxDB instances so that, if dc1 goes down, dc2 still holds the data of both datacenters. We are using open-source InfluxDB.

We tried Telegraf's "replication during ingest" approach, where the InfluxDB URLs of both datacenters are configured in telegraf.conf, as described in https://www.influxdata.com/blog/multiple-data-center-replication-influxdb/. But what happens if one of the InfluxDB instances is down? After its recovery the two instances will hold different data, so we do not want to follow this approach.

Please let me know whether the Telegraf → InfluxDB Relay → InfluxDB → Grafana approach described above works.

dhruv395 · May 29, 2019, 12:58pm

Hi harika, please let me know if you have found a solution for high availability of InfluxDB, Grafana, and Telegraf for monitoring. I am looking for guidance on this.

Harrag · July 12, 2019, 5:52am

@dhruv395, you could try Telegraf -> 2 RabbitMQ/Kafka queues -> 2 InfluxDB -> Grafana. With this, metrics will be queued up if an outage occurs.

maxadamo · September 20, 2019, 10:43am

I can recommend influxdb-srelay in conjunction with Syncflux.

Syncflux takes care of the re-sync operation, and it can be run in daemon mode (called ha-monitor). I don't think the code is mature yet (I have seen it consume a lot of TCP sockets), but it works pretty well for running a manual resync after a crash.
Let's imagine you want to resync only the last 96 hours:

syncflux -action fullcopy -start -96h

ahiyaz · May 14, 2020, 1:00pm

Hi @harika2724,

I'm struggling with the same issue. What worked for you in the end?
I see people suggested a variety of options.

Thanks

naveen_kumar · February 25, 2022, 7:05am

How can this be achieved?
Is there any website we can refer to?
