A few weeks ago we ran out of disk space on our production server. As a result, writes from the TSM WAL to shards on disk failed, and all data written to the current shard was lost. After the incident we freed additional disk space, and Influx wrote to the next shard normally. However, Influx then began to record writeErrors in the write measurement in the _internal database. The number of writeErrors recorded is consistently equal, or nearly equal, to the values recorded in the req and writeOK fields. We have not been able to explain this and would like to understand it. We have run several checks for the existence and health of known measurements in the production database, and no data appears to have been lost or affected. The following Grafana dashboard shows the writeError (green) records matching the writeOK and req (orange and red) records.
This behavior is not observed in our development database. There, the writeError value in the write measurement is consistently zero.
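
To put a number on "equal or nearly equal", here is a rough Python sketch of the comparison. It assumes the values in the write measurement are cumulative counters, so it works on the difference between consecutive samples. The function name `error_ratio_per_interval` and all sample numbers are made up for illustration; they are not real readings from either server. The field names are the ones used in this post.

```python
from typing import Dict, List


def error_ratio_per_interval(
    samples: List[Dict[str, int]],
    req_field: str = "req",
    err_field: str = "writeError",
) -> List[float]:
    """Return, for each pair of consecutive samples, the share of new
    requests that were counted as write errors.

    Assumption: both fields are cumulative counters, so the activity in
    an interval is the difference between two consecutive samples.
    """
    ratios = []
    for prev, curr in zip(samples, samples[1:]):
        d_req = curr[req_field] - prev[req_field]
        d_err = curr[err_field] - prev[err_field]
        # Avoid dividing by zero when no requests arrived in the interval.
        ratios.append(d_err / d_req if d_req > 0 else 0.0)
    return ratios


# Hypothetical samples that mimic the two situations described above.
production_like = [
    {"req": 100, "writeError": 98},
    {"req": 150, "writeError": 148},
    {"req": 210, "writeError": 207},
]
development_like = [
    {"req": 100, "writeError": 0},
    {"req": 160, "writeError": 0},
]

if __name__ == "__main__":
    print(error_ratio_per_interval(production_like))
    print(error_ratio_per_interval(development_like))
```

A ratio close to 1 in every interval would match what the production dashboard shows, and a ratio of 0 would match development.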