Telegraf fails to write to InfluxDB 3.0 Serverless due to retention period error

We’re currently trialling Cloud 2 / InfluxDB 3 Serverless to replace an InfluxDB 2.0 OSS instance we’re managing ourselves. We use Telegraf with the AMQP consumer input and InfluxDB 2.0 output plugins.

Our system uses data buckets with 6 months’ retention, but the data source will occasionally generate samples with timestamps well before the start of the retention period.

With InfluxDB 2.4 OSS we get the following message from Telegraf when it writes a batch with some samples outside the range. Telegraf then moves on to the next batch of points which is fine for our application.
2023-10-20T10:07:49Z E! [outputs.influxdb_v2] Failed to write metric to BUCKET_NAME_REDACTED (will be dropped: 422 Unprocessable Entity): unprocessable entity: failure writing points to database: partial write: points beyond retention policy dropped=3

On InfluxDB 3 Serverless we see the following messages instead. Telegraf then retries the same batch of points over and over again.
2023-10-20T10:07:48Z E! [agent] Error writing to outputs.influxdb_v2: failed to send metrics to any configured server(s)
2023-10-20T10:07:48Z E! [outputs.influxdb_v2] When writing to [https://eu-central-1-1.aws.cloud2.influxdata.com/]: failed to write metric to BUCKET_NAME_REDACTED (403 Forbidden): forbidden: dml handler error: data in table sensor_sample is outside of the retention period: minimum acceptable timestamp is 2023-09-20T10:07:48.930235814+00:00, but observed timestamp 2013-01-01T00:03:38.221+00:00 is older.

The HTTP response code is 422 in the first case and 403 in the second. Having reviewed the source, the Telegraf InfluxDB v2 output plugin handles these two codes differently, which would explain why the batch is dropped in one case and retried in the other.

What’s the correct behaviour from InfluxDB? Is there a simple way to make the cloud setup behave more like the OSS setup? (This would be the easiest migration path for us.)

@jpowers have you encountered this?

I have, and the difference in behavior is currently being reviewed by the InfluxDB team. It is not yet clear to me what, if anything, Telegraf may need to change.

A temporary solution would be to use the client libraries instead of telegraf so you can change how you handle each return code.
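
As a rough sketch of what per-status handling could look like, here is a version that uses only the Python standard library against the v2 write endpoint (/api/v2/write) rather than a specific client library. The function names and the drop/retry policy are assumptions made for illustration, not Telegraf's actual logic: 422 is dropped, 403 is dropped only when the body contains the retention message quoted above, and everything else is left for the caller to retry.

```python
import urllib.error
import urllib.parse
import urllib.request


def classify_response(status: int, body: str = "") -> str:
    """Map a write response to 'ok', 'drop', or 'retry'.

    This policy is an assumption for illustration, not Telegraf's behavior.
    """
    if 200 <= status < 300:
        return "ok"
    if status == 422:
        # OSS-style rejection: give up on this batch and move on.
        return "drop"
    if status == 403 and "outside of the retention period" in body:
        # Serverless-style retention rejection, matched on the message text
        # so that genuine permission errors are not silently dropped.
        return "drop"
    return "retry"


def write_batch(base_url, org, bucket, token, lines, timeout=30.0):
    """POST line protocol to the v2 write endpoint and classify the outcome.

    Connection-level failures (urllib.error.URLError) are left to propagate.
    """
    query = urllib.parse.urlencode(
        {"org": org, "bucket": bucket, "precision": "ns"}
    )
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/write?{query}",
        data="\n".join(lines).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "text/plain; charset=utf-8",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return classify_response(response.status)
    except urllib.error.HTTPError as err:
        # Non-2xx responses arrive as HTTPError; the body holds the message.
        body = err.read().decode("utf-8", errors="replace")
        return classify_response(err.code, body)
```

Matching on the message text is brittle if the wording changes, so treat it as a stopgap until the server-side behavior is settled.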

Even if we were to change the return code handling either by using the client libraries or by patching telegraf, will InfluxDB Serverless accept and write the other records in the request, or will the whole batch (probably 10,000 records) be rejected? Is this the “something getting reviewed by the influxdb team currently”?

I believe this is the current behavior, and it is something the team is looking to change in certain situations.
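
If a single out-of-range point can cause a batch to be rejected, one possible workaround when writing from your own code is to filter old points out before sending. The sketch below is an assumption-laden illustration, not an official recommendation: split_by_retention is a hypothetical helper, it assumes each line-protocol line ends with a nanosecond timestamp, and it keeps lines without a parseable timestamp on the assumption that the server will stamp them at write time.

```python
from datetime import datetime, timedelta, timezone


def split_by_retention(lines, retention, now=None):
    """Split line-protocol lines into (kept, dropped) by trailing timestamp.

    Assumes nanosecond-precision timestamps as the last space-separated token.
    """
    now = now or datetime.now(timezone.utc)
    # Whole-second resolution is enough for a retention cutoff.
    cutoff_ns = int((now - retention).timestamp()) * 1_000_000_000

    kept, dropped = [], []
    for line in lines:
        last_token = line.rsplit(" ", 1)[-1]
        try:
            timestamp_ns = int(last_token)
        except ValueError:
            # No trailing integer, so no explicit timestamp: keep the line.
            kept.append(line)
            continue
        if timestamp_ns < cutoff_ns:
            dropped.append(line)
        else:
            kept.append(line)
    return kept, dropped
```

In practice it may be worth using a retention value slightly shorter than the bucket's, since the server's cutoff moves forward between the time the batch is filtered and the time it is written.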

Thanks. I’ll keep a watch out for an update, though I’m not aware of any published release notes for InfluxDB 3.0 / Serverless…