We’re currently upgrading InfluxDB to 2.7. Because of load balancing and the limitations of basic authorisation in 2.7, we can no longer use Flux, as the /v2 API only supports token authentication and a token is specific to a single server. This has resulted in the following differences:
- Replacement of the influxdb-relay with v2.7’s Replication.
- Querying the original (v1) API, rather than /v2
- Query language moved to InfluxQL from Flux
Our applications were going to /v2 using a query like the one below, intended to get the last record matching the criteria:
|> range(start: -30m)
|> filter(fn: (r) =>
r._measurement == "measurement" and
r.my_tag == "tag_val" and
r.my_other_tag == "another_tag_val"
)
|> group(columns: ["_field"], mode: "by")
|> last()
With our new implementation, we’ve replicated this query using InfluxQL. It no longer goes to the /v2 API and is instead sent to the v1 API on our new v2.7 Influx servers.
SELECT * FROM "retention_policy"."measurement" WHERE ("my_tag" = 'tag_val' AND "my_other_tag"::tag = 'another_tag_val') ORDER BY time DESC LIMIT 1
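For anyone wanting to reproduce this, here is a minimal Python sketch of how such a query might be sent to the v1-compatible /query endpoint with basic authorisation. The host, database name and credentials are hypothetical placeholders, not values from our setup, and the sketch only builds the request rather than sending it.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

# InfluxQL statement equivalent to the one in the post.
INFLUXQL = (
    'SELECT * FROM "retention_policy"."measurement" '
    "WHERE (\"my_tag\" = 'tag_val' AND \"my_other_tag\"::tag = 'another_tag_val') "
    "ORDER BY time DESC LIMIT 1"
)


def build_v1_query_request(base_url, database, influxql, username, password):
    """Build a GET request for the v1-compatible /query endpoint.

    All argument values are supplied by the caller; nothing here is
    specific to a real deployment.
    """
    params = urlencode({"db": database, "q": influxql})
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
    return Request(
        f"{base_url.rstrip('/')}/query?{params}",
        headers={
            "Authorization": f"Basic {credentials}",
            "Accept": "application/json",
        },
    )


# Hypothetical host, database and credentials, for illustration only.
request = build_v1_query_request(
    "http://influxdb.example:8086", "my_db", INFLUXQL, "reader", "secret"
)
print(request.full_url)
```

Passing the request to urllib.request.urlopen would execute it; the JSON response can then be checked for null field values.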
In effect, this gives us the same results, but intermittently we receive a record with one or more of the field values missing (usually around 50% or more of them). Retrying the query straight afterwards returns the record with all of its fields.
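Since a retry returns the complete record, one possible stopgap is to detect the partial record on the client and re-run the query. The sketch below is a workaround under assumptions, not a fix for the underlying behaviour: the field names, the `run_query` callable and the retry settings are all hypothetical, and it assumes a missing field shows up as `None` in the decoded record.

```python
import time


def is_complete(record, expected_fields):
    """Return True if every expected field is present and not None."""
    return all(record.get(name) is not None for name in expected_fields)


def fetch_complete_record(run_query, expected_fields, attempts=3, delay_s=0.2):
    """Re-run the query until a complete record comes back or attempts run out.

    run_query is any zero-argument callable returning a dict (or None).
    The last record seen is returned even if it is still incomplete, so the
    caller decides how to handle that case.
    """
    last = None
    for attempt in range(attempts):
        last = run_query()
        if last is not None and is_complete(last, expected_fields):
            return last
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return last


# Simulated responses: the first read is partial, the retry is complete.
responses = iter([
    {"time": 1, "temperature": 21.5, "humidity": None},
    {"time": 1, "temperature": 21.5, "humidity": 40.0},
])
record = fetch_complete_record(
    lambda: next(responses), ["temperature", "humidity"], delay_s=0
)
print(record)
```

This only hides the symptom and adds latency on the affected reads, so we would still prefer the database to return complete records in the first place.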
We have run a TCP dump and followed the writes of the data through to where we read them. When the issue happens, we see the sequence below, where our query falls between the replication request and response for the same record:
- REPL HTTP REQ
- QUERY HTTP REQ
- QUERY HTTP RESP
- REPL HTTP RESP
As part of our troubleshooting, we have re-enabled the influxdb-relay with v2.7 and turned off replication to see whether replication is the cause. However, we are still observing the same issue, seemingly less frequently.
We would expect the database to natively return only records that have been fully written/replicated, rather than ones that are still in the process of being replicated (like a transaction). Does anyone have any suggestions, or know of any issues relating to this?