InfluxDB 2.2 replication

Hi!

I have two InfluxDB 2.2 instances on a local network. I would like data written to the local bucket (source) to appear in the remote bucket (target) whenever there is a connection between them. The remote is on the same intranet.

The requirements are:

  • synchronise data written to a local bucket to a remote one
  • one-way synchronisation only: if the remote contains more data, it should not affect the local instance
  • when there is no network connection, write requests should be stored in a buffer, and once the connection has stabilised, the buffered data should be sent to the remote

I found that v2.2 has this feature, so I set up a test environment and would like to share my experience with you.
The remote bucket did not reflect the local one, even though the connection was a wired 1 Gb/s link. When I disconnected the local database from the remote and reconnected it, the data that arrived during the offline period was not replicated either.

I couldn’t find information in the documentation for the following:

  • Is the buffer stored in RAM?
  • Should it handle poor network quality, such as Wi-Fi?
    o Sometimes the remote is unreachable, but once there is a connection again, it should synchronise the data.

Hi @atjager,
The feature is relatively new to InfluxDB, so your feedback and any data you can provide are much appreciated. To answer a few of your questions:

  1. Data is only delivered on write to the local bucket, at which point it is held in a durable queue within memory until the write is made. Note that the max batch size is 500 kB, typically between 250 and 500 lines of line protocol.
  2. You can specify the maximum queue size and the maximum age before data is dropped from the queue, so please take these settings into account as well. See the influx replication create page in the InfluxDB OSS 2.2 documentation.
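As an illustration of those two settings, here is a minimal sketch of creating a replication with a larger queue and a longer retention window than the defaults. The replication name, the 128 MiB queue size and the 14-day max age are arbitrary example values, not recommendations, and the three IDs are placeholders the caller has to supply. It also assumes the active influx CLI config already provides the host, org and token.

```shell
#!/bin/sh
# Sketch only: wraps `influx replication create` in a function so the
# values being tuned are easy to see. Nothing runs until it is called.
#   $1 = remote connection ID, $2 = local bucket ID, $3 = remote bucket ID
create_replication() {
  # --max-queue-bytes: upper bound on the replication queue size, in bytes
  # --max-age:         seconds before queued data is dropped
  influx replication create \
    --name "local-to-remote" \
    --remote-id "$1" \
    --local-bucket-id "$2" \
    --remote-bucket-id "$3" \
    --max-queue-bytes 134217728 \
    --max-age 1209600
}
```

It would be called as create_replication <remote-id> <local-bucket-id> <remote-bucket-id>, using the IDs from your own setup.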

@samdillard is our InfluxDB edge product manager. If you can provide any further details about your tests and experiments to help improve the feature, that would be awesome!

Thanks, @Jay_Clifford!

@atjager the replication is buffered on disk, so it should be durable. What Jay said is right. There are default values that may not have been suitable for your needs, so you might want to modify those.

Can you elaborate a little? How much data are you writing to the local bucket and intending to replicate? Do you have any Tasks running that aggregate data in some way?


Hello!

Thank you for your answers. I am working on the topic that @atjager started. The environment I am using at the moment consists of two Docker containers called local and remote. The local instance is configured to replicate to the remote instance with default parameters.

The first thing I observed is the following:

  1. On start, each write operation is replicated to the remote instance without any issue.
  2. I stopped the remote instance to simulate a temporary network failure.
  3. I made multiple writes to the local instance. Obviously, those writes were not replicated.
  4. I restarted the remote instance and expected those writes to be replicated, but nothing happened (I waited ~20 min).
  5. I made additional writes to the local instance, but nothing was replicated to the remote.
  6. After I restarted the local instance, the writes from step 3 became available in the remote, and further writes are OK.
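For anyone who wants to repeat this test, steps 2 to 5 above can be sketched as a small shell script. The container names local and remote come from the description above. The bucket name and the repl_test measurement are hypothetical, and the sketch assumes the influx CLI inside the local container is already configured with an org and token.

```shell
#!/bin/sh
# Sketch of the outage test: stop the remote, write locally, bring the
# remote back, write again, then inspect the replication queue.
BUCKET="${BUCKET:-source-bucket}"   # hypothetical local bucket name

write_point() {
  # Write one line-protocol point ($1 = field value) to the local instance.
  docker exec local influx write --bucket "$BUCKET" "repl_test value=$1"
}

simulate_outage() {
  docker stop remote            # step 2: remote becomes unreachable
  for i in 1 2 3; do            # step 3: writes that have to be queued
    write_point "$i"
  done
  docker start remote           # step 4: remote is reachable again
  write_point 4                 # step 5: a fresh write after recovery
  # Queue size and latest status code show whether the backlog drains.
  docker exec local influx replication list
}
```

Calling simulate_outage and then re-running the replication list command after a few minutes should show whether Current Queue Bytes goes back down without restarting the local container (step 6).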

Is this behaviour expected?

Thank you for your reply!

Any update on this thread?

@samdillard
I recently set up replication between two nodes about 1000 miles apart via StarLink, and I’ve noticed the same thing. Once you start replication, it seems to work for a short time, then it pauses and the buffer fills up. The status is 204, which is supposed to mean “all good”.

$ influx replication list
ID                      Name                            Org ID                  Remote ID               Local Bucket ID         Remote Bucket ID        Current Queue Bytes     Max Queue Bytes Latest Status Code      Drop Non-Retryable Data
097422840d81d000        Replication-to-Remote        e4e12ae09e27129d        09741fe14a98b000        bf5bb3c67c65d5ce        f3cfb458aa45f9d8        4755664                 67108860        204                     false
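
To keep an eye on a replication that appears stuck, the table output above can be reduced to a fill percentage per replication. This is a rough sketch that assumes the column order shown above, with Current Queue Bytes, Max Queue Bytes, Latest Status Code and Drop Non-Retryable Data as the last four columns. Fields are counted from the right so that a replication name containing spaces does not shift them.

```shell
#!/bin/sh
# Sketch: read `influx replication list` table output on stdin and print
# "<replication ID> queue=<percent full> status=<latest status code>".
queue_usage() {
  awk 'NR > 1 && NF >= 10 {
    cur    = $(NF-3)   # Current Queue Bytes
    max    = $(NF-2)   # Max Queue Bytes
    status = $(NF-1)   # Latest Status Code
    if (max > 0)
      printf "%s queue=%.1f%% status=%s\n", $1, 100 * cur / max, status
  }'
}
```

Piping the listing through it (influx replication list | queue_usage) gives, for the row above, 097422840d81d000 queue=7.1% status=204. A percentage that keeps growing while the status stays at 204 matches the behaviour described here.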

@SkyMoCo @d60 @atjager Thank you all for trying this out and writing to us about the problem! I’ve created an issue to investigate with engineering: issue #23397 in influxdata/influxdb on GitHub, “Replications not replicating after node remote node failure”.

Please feel free to comment there with any further details you believe are relevant to help us!

@d60 @atjager The fix for this is in 2.3. Should be good to go! Thanks for raising it.
