InfluxDB 2.7 discards the whole batch of metrics in case of a serialization issue in a single metric

alexkonkin · August 13, 2023, 8:43am

We recently migrated from InfluxDB 1.8 to InfluxDB 2.7 and have run into an error that we never saw in v1.8.

After some investigation we found that when a single metric fails, the whole batch of metrics that should be written at that moment is cancelled. This usually shows up as a gap in the Grafana chart.

About our environment:
- Kafka topic that contains the metrics
- Logstash pipeline that reads them from the topic and writes them to InfluxDB 2.7

The workaround I have found so far: identify the offending metric from the error (for example, an attempt to write an integer to a field of type boolean)

[app_influxdb2_metrics] Non recoverable exception while writing to InfluxDB 
{:exception=>#<InfluxDB::Error: {"code":"invalid","message":"unable to parse 'metrics.counters,container_name=container-with-error-metric,level=metrics.counter,datacenter=test cacheMiss={\"cacheKeyPrefix\"=\u003e\"container-with-error-metric::message-hash::\", \"cacheStore\"=\u003e\"redis\"},timestamp=\"2023-08-13T08:09:06.791Z\" 1691914146791': invalid boolean

and drop it at the Logstash pipeline level:

if [container_name] == "container-with-error-metric" {
    drop {
    }
}
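
In the log above, the `cacheMiss` field appears to carry a nested hash rather than a scalar value, and line protocol field values have to be floats, integers, strings, or booleans. A more general variant of the per-container drop could therefore filter on the shape of the fields instead of on `container_name`. The sketch below is only an illustration in Python, not Logstash code: the event layout (a dict with a `fields` key) and the function names are assumptions made for the example, and it does not catch type conflicts with a field that already exists in the bucket.

```python
# Hypothetical event shape: {"measurement": str, "fields": {name: value}}.
# Python types that map onto line protocol field types.
SCALAR_TYPES = (bool, int, float, str)


def has_only_scalar_fields(event):
    """Return True if every field value is a scalar that line protocol can encode."""
    return all(isinstance(v, SCALAR_TYPES) for v in event["fields"].values())


def partition_events(events):
    """Split events into (good, bad) so that bad ones can be logged or dropped."""
    good, bad = [], []
    for event in events:
        (good if has_only_scalar_fields(event) else bad).append(event)
    return good, bad
```

Only the `good` list would then be sent on, so one malformed event no longer takes the rest of the batch with it.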

I also found a similar bug reported against the Telegraf agent:

https://github.com/influxdata/telegraf/issues/5858

I suspect the cause may be on the Logstash side (on a single error it might cancel the whole set of records it is about to send to InfluxDB), in which case the fix would belong in the Logstash pipeline. I still need to investigate this.
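
One client-side idea, independent of Logstash, would be to retry a rejected batch in halves until the bad points are isolated, so that everything else still gets written. This is a minimal sketch in Python under stated assumptions: `write_batch` is a hypothetical callable standing in for whatever actually sends points to InfluxDB, and it is assumed to raise `ValueError` when the server rejects the batch. It is not the Logstash plugin's behaviour or an InfluxDB API.

```python
def write_with_bisect(points, write_batch):
    """Write points, splitting the batch on rejection to isolate bad points.

    write_batch is a hypothetical callable that raises ValueError when the
    whole batch is rejected. Returns the points that could not be written.
    """
    if not points:
        return []
    try:
        write_batch(points)
        return []
    except ValueError:
        if len(points) == 1:
            # A single point that is still rejected is the offending one.
            return list(points)
        mid = len(points) // 2
        # Retry each half separately; good halves are written as a whole.
        return (write_with_bisect(points[:mid], write_batch)
                + write_with_bisect(points[mid:], write_batch))
```

With one bad point in a batch of n, this costs on the order of log n extra write requests, and the returned list can be logged for later inspection instead of silently losing the whole batch.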

Is there any setting in InfluxDB 2.7 that would let it skip or reject only the failing metric and still write all the other metrics from the same batch?

Thanks.

Anaisdg · August 14, 2023, 4:58pm

Hello @alexkonkin,
I don’t believe that there is that type of solution in 2.x.
I would actually wait on migrating to 2.x.
OSS 3.x is coming out later this year and supports InfluxQL, and many features in 2.x aren’t currently on the roadmap for 3.x. 3.x also has significant performance enhancements.

alexkonkin · August 15, 2023, 6:42am

Sorry for asking this question again.

Is it correct that this feature is not present in InfluxDB 2.x and that we need to wait and migrate to OSS 3.x as soon as it is released?

I assume that OSS 3.x is not a cloud-based solution and that we can install it in our in-house environment.

Thank you.
