We are having some replication issues: some line protocol (LP) points are not being replicated (InfluxDB version 2.7), but we do not see any errors in the InfluxDB log.
This is an example (the 10:00 point is present on node 1 but missing on node 2):
Node 1
,1,2023-05-30T07:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC
,1,2023-05-30T08:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC
,1,2023-05-30T10:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC
,1,2023-05-30T11:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC
Node 2
,1,2023-05-30T07:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC
,1,2023-05-30T08:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC
,1,2023-05-30T11:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC
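To pinpoint which points are missing, the two result sets can be diffed by timestamp. This is only a minimal sketch: it assumes each node's rows are available as text lines in the format shown above, and that the timestamp is the third comma-separated column (an assumption based on the sample rows). The function names are hypothetical.

```python
def timestamps(rows):
    """Collect the timestamp column (index 2 after splitting on commas)."""
    return {line.split(",")[2] for line in rows if line.strip()}


def missing_on_replica(source_rows, replica_rows):
    """Timestamps present on the source node but absent on the replica."""
    return sorted(timestamps(source_rows) - timestamps(replica_rows))


# Sample rows copied from the example above.
node1 = [
    ",1,2023-05-30T07:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC",
    ",1,2023-05-30T08:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC",
    ",1,2023-05-30T10:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC",
    ",1,2023-05-30T11:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC",
]
node2 = [
    ",1,2023-05-30T07:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC",
    ",1,2023-05-30T08:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC",
    ",1,2023-05-30T11:00:00Z,100,value,XXX,YYY,AAA,NNN,BBB,CCC",
]

print(missing_on_replica(node1, node2))  # ['2023-05-30T10:00:00Z']
```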
Our replication is configured with --no-drop-non-retryable-data. Is there a way to inspect the data in the queue so we can investigate what is happening?
We are using the "influx replication list" command to monitor the replication, but it has not shown any errors (e.g.: "replications": [{"id": "0b45fcb1083f4000", "orgID": "f1c1a59ddf65a579", "name": "my_replication_stream", "remoteID": "0b45fcaff4d37000", "localBucketID": "872d031ba055152b", "remoteBucketID": null, "RemoteBucketName": "our_metrics", "maxQueueSizeBytes": 10737418240, "currentQueueSizeBytes": 10205395, "remainingBytesToBeSynced": 0, "latestResponseCode": 204, "latestErrorMessage": "", "dropNonRetryableData": false, "maxAgeSeconds": 604800).
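A check on that output could be scripted roughly as follows. This is a sketch under assumptions, not a definitive monitor: it relies only on the field names visible in the output above, the `check_replication` helper and the 80% queue threshold are hypothetical, and since the fields are named "latest", an earlier failed write that was followed by a successful one would presumably not show up here.

```python
import json


def check_replication(rep, max_queue_fraction=0.8):
    """Return a list of warning strings for one replication entry."""
    warnings = []
    code = rep.get("latestResponseCode")
    if code is not None and not 200 <= code < 300:
        warnings.append(f"latest response code is {code}")
    if rep.get("latestErrorMessage"):
        warnings.append(f"latest error: {rep['latestErrorMessage']}")
    if rep.get("remainingBytesToBeSynced", 0) > 0:
        warnings.append("there are bytes still waiting to be synced")
    max_size = rep.get("maxQueueSizeBytes") or 0
    used = rep.get("currentQueueSizeBytes", 0)
    # The threshold is an arbitrary, hypothetical choice.
    if max_size and used / max_size > max_queue_fraction:
        warnings.append("queue usage is above the configured threshold")
    return warnings


# Sample built from the fields shown above (closing brackets added here).
sample = json.loads("""
{"replications": [{
  "id": "0b45fcb1083f4000", "name": "my_replication_stream",
  "maxQueueSizeBytes": 10737418240, "currentQueueSizeBytes": 10205395,
  "remainingBytesToBeSynced": 0, "latestResponseCode": 204,
  "latestErrorMessage": "", "dropNonRetryableData": false
}]}
""")

for rep in sample["replications"]:
    print(rep["name"], check_replication(rep) or "no warnings")
```

With the values above this reports no warnings, which matches what we see: the status fields alone do not reveal the missing point.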
Is there another way to monitor the queue?
Also, in this ticket (Add field for dropping "non-retryable" errors to replication streams · Issue #22880 · influxdata/influxdb · GitHub) we read: “It is possible that remote write errors will be encountered for which retrying does not provide any hope of success, such as a 400 error which means that the LP data could not be parsed”. How is it possible for LP to be parsed by node 1 but not by node 2?