Hey there,
I set up two identical machines with influxdb. Each has 16 cores, 32 GB of RAM and a dedicated 4 TB M.2 SSD for the influxdb data, and both run Debian 12 (bookworm).
Machine A also runs mosquitto as an MQTT broker and a little Python script that retrieves messages from the broker and writes them to influxdb into bucket A.1. The data arrives as a burst of 3*2048 data points every 9 seconds. This works like a charm. A telegraf instance collects some machine stats into bucket A.2. There is also a grafana instance running on this machine with some simple dashboards.
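To put a number on the write load, here is a minimal sketch of the arithmetic. It only uses the burst size and interval stated above; the function name is made up for illustration.

```python
# Burst pattern from the setup described above.
POINTS_PER_BURST = 3 * 2048   # data points per burst
BURST_INTERVAL_S = 9          # seconds between bursts


def points_per_second(points_per_burst, interval_s):
    """Average ingest rate for a periodic burst (hypothetical helper)."""
    return points_per_burst / interval_s


rate = points_per_second(POINTS_PER_BURST, BURST_INTERVAL_S)
per_day = rate * 86_400  # seconds per day

print(f"{rate:.1f} points/s, {per_day:,.0f} points/day")
```

So on average the replication has to move roughly 683 points per second to keep up with A.1.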
Machine B only runs another influxdb instance and a telegraf instance. This telegraf collects the same few machine stats as on machine A and sends them directly to bucket A.2. The influxdb instance on machine B is there for backup purposes. So I created bucket B.1 and set up a replication stream from A.1 to B.1, which basically works.
The problem is that the synchronisation between the two influxdb instances is too slow. Over time the replication queue fills up and then loses data until the next chunk of data is transferred from A.1 to B.1. The newly freed space is taken up by new data until the queue is full again. This behaviour repeats endlessly.
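To illustrate the pattern I am seeing, here is a rough toy model of a bounded queue that is filled faster than it is drained. It is not how influxdb implements the replication queue; the function name and all numbers (capacity, inflow, chunk size, drain interval) are assumptions chosen only to show the behaviour.

```python
def simulate_queue(capacity, inflow, chunk, drain_every, steps):
    """Toy model: bounded queue with steady inflow and periodic chunked drain.

    All units are arbitrary; data that does not fit into the queue is dropped.
    """
    queued = 0
    dropped = 0
    history = []
    for step in range(1, steps + 1):
        # Accept only as much new data as there is free space.
        accepted = min(inflow, capacity - queued)
        dropped += inflow - accepted
        queued += accepted
        # Every `drain_every` steps one chunk is sent to the remote.
        if step % drain_every == 0:
            queued -= min(chunk, queued)
        history.append(queued)
    return history, dropped


# Inflow of 50 units per 5 steps, but only 30 units drained per 5 steps.
history, dropped = simulate_queue(
    capacity=100, inflow=10, chunk=30, drain_every=5, steps=50
)
print("max queue level:", max(history))
print("dropped units:", dropped)
```

In this model the queue level climbs until it hits the capacity, data is dropped, a chunk is drained, and the freed space fills up again, which matches the repeating pattern described above. If the chunk is at least as large as the inflow per drain interval, nothing is dropped.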
Here you can see the data in A.1 (top), the replicated data in B.1 (middle, already with a lag from slow replication) and the delta between the data sets (bottom).
Machines A and B both have enough resources and are not overloaded at any point.
What can I do to make the replication faster? I already tried increasing the queue size, but this only delays the problem until the queue is full again. I think the synchronisation/replication job simply has to send more bytes per round from A.1 to B.1, but I do not know how to configure this.
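To show why a bigger queue only buys time, here is a small sketch of the arithmetic. The rates and queue sizes in the example are placeholders, not measurements from my machines, and the function name is made up.

```python
def seconds_until_full(queue_bytes, inflow_bps, outflow_bps):
    """Time until a queue of `queue_bytes` overflows (hypothetical helper).

    Returns None if the drain rate keeps up with the inflow,
    i.e. the queue never fills.
    """
    net = inflow_bps - outflow_bps
    if net <= 0:
        return None
    return queue_bytes / net


# Placeholder numbers: doubling the queue size only doubles the delay.
print(seconds_until_full(1_000_000, inflow_bps=10_000, outflow_bps=5_000))
print(seconds_until_full(2_000_000, inflow_bps=10_000, outflow_bps=5_000))
# Only a drain rate at or above the inflow rate avoids the overflow.
print(seconds_until_full(1_000_000, inflow_bps=10_000, outflow_bps=10_000))
```

As long as the outflow is lower than the inflow, any finite queue fills up eventually, so what I really need is a higher replication throughput.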
Thank you very much!
Kind regards,
Martin