We recently migrated from InfluxDB 1.8 to InfluxDB 2.7 and have run into a kind of error that we never saw in v1.8.
After some investigation we found that when a single metric fails to write, the whole batch of metrics that should be written at that moment is cancelled. This usually shows up as a gap in the Grafana chart.
About our environment:
- A Kafka topic that contains the metrics
- A Logstash pipeline that reads them from the topic and writes them to InfluxDB 2.7
The workaround I have found so far: identify the incorrect metric from the error (for example, an attempt to write an integer to a field of type boolean)
[app_influxdb2_metrics] Non recoverable exception while writing to InfluxDB
{:exception=>#<InfluxDB::Error: {"code":"invalid","message":"unable to parse 'metrics.counters,container_name=container-with-error-metric,level=metrics.counter,datacenter=test cacheMiss={\"cacheKeyPrefix\"=\u003e\"container-with-error-metric::message-hash::\", \"cacheStore\"=\u003e\"redis\"},timestamp=\"2023-08-13T08:09:06.791Z\" 1691914146791': invalid boolean
and drop it at the Logstash pipeline level:
if [container_name] == "container-with-error-metric" {
  drop { }
}
I also found a similar bug reported for the Telegraf agent:
https://github.com/influxdata/telegraf/issues/5858
I suspect the cause may be on the Logstash side: on a single error it might cancel the whole set of records it is about to send to InfluxDB. If so, the fix would also be in the Logstash pipeline; I still need to investigate this.
Is there any setting in InfluxDB 2.7 that would make it skip or reject only the offending metric and still write all the other metrics from the same batch?
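To make the behaviour I am asking about concrete, here is a minimal Python sketch of it. It is not something I know InfluxDB or Logstash to provide. The idea: when a batch is rejected, split it in half and retry each half, so that only the offending line ends up discarded. `write_batch` is a hypothetical stand-in for whatever call sends the line protocol to InfluxDB, `fake_write` only simulates the rejection, and the sample lines are simplified. The sketch assumes that a rejected batch writes nothing at all.

```python
from typing import Callable, List, Tuple


def write_isolating_failures(
    lines: List[str],
    write_batch: Callable[[List[str]], None],
) -> Tuple[List[str], List[str]]:
    """Write lines, splitting a rejected batch to isolate the bad lines.

    write_batch is a hypothetical callable that raises ValueError when the
    server rejects the batch. Returns (written, rejected).
    """
    if not lines:
        return [], []
    try:
        write_batch(lines)
        return list(lines), []
    except ValueError:
        if len(lines) == 1:
            # A single line that still fails is the offending metric.
            return [], list(lines)
        mid = len(lines) // 2
        ok_left, bad_left = write_isolating_failures(lines[:mid], write_batch)
        ok_right, bad_right = write_isolating_failures(lines[mid:], write_batch)
        return ok_left + ok_right, bad_left + bad_right


def fake_write(batch: List[str]) -> None:
    # Simulated server (assumption): rejects the whole batch if any line
    # carries a non-boolean value in the cacheMiss field.
    for line in batch:
        if "cacheMiss={" in line:
            raise ValueError("invalid boolean")


batch = [
    "metrics.counters,container_name=a cacheMiss=true 1691914146791",
    "metrics.counters,container_name=b cacheMiss={} 1691914146791",
    "metrics.counters,container_name=c cacheMiss=false 1691914146791",
]
written, rejected = write_isolating_failures(batch, fake_write)
```

With this input, the lines for containers a and c end up in `written` and only the line for container b ends up in `rejected`. The cost is extra write attempts whenever a batch contains a bad line.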
Thanks.