Error writing to outputs.http: 401 Error repeatedly

arnel_lim · March 14, 2024, 12:04am

When telegraf is having trouble connecting to an output URL, we repeatedly see this in the telegraf.log. Is there any way to detect this condition and stop the retry attempts? I’m presuming that in our case, where flush_interval is set to 1 sec, we are making a new TCP connection every second and the telegraf queue is slowly filling up. Is there a backoff mechanism in place to limit these attempts, then retry after some set amount of time, rather than retrying every flush_interval? Is there a way to notify an external process when telegraf is in this condition?

2024-03-13T20:32:40Z E! [agent] Error writing to outputs.http: when writing to [https://va2.xyz.com/telemetry/v1/state/platform] received status code: 401. body: {"code":-3,"message":null}
2024-03-13T20:32:41Z E! [agent] Error writing to outputs.http: when writing to [https://va2.xyz.com/telemetry/v1/state/platform] received status code: 401. body: {"code":-3,"message":null}
2024-03-13T20:32:43Z E! [agent] Error writing to outputs.http: when writing to [https://va2.xyz.com/telemetry/v1/stats/platform] received status code: 401. body: {"code":-3,"message":null}

jpowers · March 14, 2024, 1:51pm

These are not retry attempts. This is the agent launching a flush at your flush_interval of 1 second, meaning that every second the agent tells the http output to try to send its buffered metrics.
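
For reference, the agent settings involved look roughly like the sketch below. The flush_interval value comes from the original post; the metric_buffer_limit shown is Telegraf's documented default and is an assumption about this setup, not the poster's actual config.

```toml
[agent]
  ## Value from the original post: the agent asks every output to flush once per second.
  flush_interval = "1s"
  ## Assumption: Telegraf's documented default. Each output caches up to this many
  ## unwritten metrics; once the buffer is full, the oldest metrics are dropped first.
  metric_buffer_limit = 10000
```

A longer flush_interval would make the failed write attempts less frequent, but it is a fixed interval, not a backoff.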

Is there a backoff mechanism in place to limit these attempts, then retry after some set amount of time, rather than retrying every flush_interval?

Not at this time.

Is there a way to notify an external process when telegraf is in this condition?

If you are not able to send metrics, then your buffer must be growing. There is an internal plugin that you can enable that provides metrics about the output buffers. You can then watch that metric and even alert on it.
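
A minimal sketch of enabling the internal plugin is shown below. The separate file output, its path, and the namepass filter are assumptions added for illustration, so that the internal metrics still land somewhere when the http output itself is failing.

```toml
# Collect Telegraf's own metrics, including per-output buffer statistics.
[[inputs.internal]]
  ## Skip Go runtime memory statistics; only agent and plugin metrics are needed here.
  collect_memstats = false

# Assumption: a second output so the internal metrics do not depend on the failing one.
[[outputs.file]]
  ## Hypothetical path; point this at wherever your monitoring process can read it.
  files = ["/var/log/telegraf/internal.out"]
  data_format = "influx"
  ## Only pass the internal_* measurements to this output.
  namepass = ["internal_*"]
```

The internal_write measurement is tagged by output and carries fields such as buffer_size, buffer_limit, and metrics_dropped. A buffer_size that keeps climbing toward buffer_limit for the http output is the condition to alert on.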

arnel_lim · March 18, 2024, 5:58pm

Thanks for the response and thanks for the tip on the internal plugin. I’ll play around with that.

WRT HTTP output, do we only purge the queue when data is sent successfully and we get back a 200 response? I’m wondering what HTTP error responses, if any, would cause data to not get flushed and the queue to continue to grow.

jpowers · March 18, 2024, 6:18pm

do we only purge the queue when data is sent successfully

Correct. Success is defined as:

Any 2xx return code
A return code that matches a value found in the non_retryable_statuscodes config option. This option lets you list specific return codes for which you would rather drop the metrics, lose the data, and continue on.
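
As an illustration only, the option could be set like this. The URL is taken from the log lines earlier in the thread, and listing 401 is an assumption about what you might want, not a recommendation.

```toml
[[outputs.http]]
  ## URL taken from the log lines earlier in this thread.
  url = "https://va2.xyz.com/telemetry/v1/state/platform"
  ## Assumption: 401 is listed purely as an example. A batch answered with a listed
  ## code is treated as written, so its metrics are dropped instead of being retried.
  non_retryable_statuscodes = [401]
```

With this in place the buffer would stop growing on 401 responses, at the cost of losing every metric sent while the credentials are rejected.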
