Context Deadline Error Telegraf

Hi, I am running Telegraf and InfluxDB as Docker containers deployed on two separate EC2 instances in the same subnet; the instance types are r5.large and r5a.4xlarge respectively. The input I am using is Kafka, and the frequency of data is every 5 minutes. I am running two Telegraf containers for two different Kafka brokers. One Telegraf consumes 7 Kafka topics and the other 25/26 topics.

For the second Telegraf, I am getting a “context deadline exceeded” error; however, after restarting, everything works fine until the next time. I want to know what the issue might be and how to debug it. I am using an HTTP timeout of “15s”. I will be shifting the topics to another Telegraf, but before that, I wanted to see some logs explaining why the “context deadline” message appears.

On Telegraf we are doing quite heavy processing, but the code itself is well optimized for our use case. Below are the processor plugin configurations (a rough config sketch follows the list):

  1. Starlark processor - extracts 3 different metrics from each incoming metric
  2. Converter - converts the data type of a field from string to float
  3. Regex - adds some new tags based on regular expressions
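
A minimal TOML sketch of what that chain could look like; the metric names, field names, tag keys, and patterns are placeholders, not our actual configuration:

```toml
# Hypothetical processor chain matching the three steps above.
[[processors.starlark]]
  source = '''
def apply(metric):
    # Derive 3 metrics from each incoming metric (placeholder logic).
    out = []
    for suffix in ["a", "b", "c"]:
        m = deepcopy(metric)
        m.name = metric.name + "_" + suffix
        out.append(m)
    return out
'''

[[processors.converter]]
  [processors.converter.fields]
    float = ["value"]          # convert the string field "value" to float

[[processors.regex]]
  [[processors.regex.tags]]
    key = "topic"              # placeholder tag key
    pattern = "^prod_(.*)$"    # placeholder pattern
    replacement = "${1}"
    result_key = "service"     # new tag created from the regex match
```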

I am using jitter as well to make sure both Telegraf instances are not pushing data to InfluxDB at the same time.
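
For reference, the jitter lives in the agent settings; the intervals below are illustrative, not necessarily our real values:

```toml
[agent]
  interval = "5m"         # matches the 5-minute Kafka data frequency
  flush_interval = "30s"  # illustrative
  flush_jitter = "15s"    # randomizes each flush so the two Telegrafs
                          # do not write to InfluxDB at the same moment
```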

Context Deadline exceeded error

This is generally due to a connection to your input having issues. It could be DNS, network blips, disconnects, a proxy, etc.

what the issue might be and how to debug it

Telegraf’s handling of Kafka connections is all done by sarama. There is no way for us to give insight without the sarama logs. You need to run Telegraf with --debug enabled and see what sarama is reporting.
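
If changing the container’s command line is awkward, the same thing can be set in the config file; the logfile path here is just an example:

```toml
[agent]
  debug = true                                # equivalent to the --debug flag;
                                              # surfaces sarama's client logs
  logfile = "/var/log/telegraf/telegraf.log"  # optional: persist logs across restarts
```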

This is the Telegraf Log -

2023-02-13T06:14:00Z E! [outputs.influxdb_v2] When writing to [http://influxdb_url:8086/]: Post "http://influxdb_url:8086/api/v2/write?bucket=bucket_name&org=org_name": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
2023-02-13T06:14:00Z D! [outputs.influxdb_v2] Buffer fullness: 5755 / 300000 metrics
2023-02-13T06:14:00Z E! [agent] Error writing to outputs.influxdb_v2: failed to send metrics to any configured server(s)

So, until I restarted the container, we kept seeing the same error; it went on for about 24 hours. Since restarting the container, everything has been working fine for the last 12 hours.
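
For context, here is the output section this timeout applies to; the URL, org, and bucket match the log line above, and the token is a placeholder:

```toml
[[outputs.influxdb_v2]]
  urls = ["http://influxdb_url:8086"]
  token = "${INFLUX_TOKEN}"   # placeholder
  organization = "org_name"
  bucket = "bucket_name"
  timeout = "15s"             # the write is aborted with "context deadline
                              # exceeded" if InfluxDB has not responded by then
```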

Ah, from the InfluxDB output, not Kafka. My comment above about this usually being related to networking issues still stands, though. I am surprised to see this if they are on the same subnet; are they in the same availability zone?

Yes, same availability zone. On InfluxDB, we are running some 100-110 tasks which are separated by 30s/10s offsets, taking 40-45 in total to calculate; apart from that, the load on InfluxDB is not much.

What do your InfluxDB logs show?

Telegraf log and InfluxDB logs - [attached as images]

Do you track the load of the Telegraf system? That system you are using only has 2-3 GB of memory, right?

Every time I have seen this issue it has come down to:

  1. Networking issues, whether DNS, firewall, a network blip, and/or a proxy causing issues
  2. Load-related issues on the Telegraf system (a quick way to track this is sketched below)
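
One way to track that load from Telegraf itself, assuming these inputs are not already enabled:

```toml
[[inputs.cpu]]
  totalcpu = true
  percpu = false

[[inputs.mem]]

[[inputs.internal]]
  collect_memstats = true   # also exposes Telegraf's own buffer and write stats
```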

No, actually the EC2 instance we are using for Telegraf is r5.large; the specs are 2 vCPUs and 16 GB RAM. What should I check for both of those points? I want a solution similar to the one mentioned here - Azure output plugin - properly recover after context deadline exceeded · Issue #10950 · influxdata/telegraf · GitHub. Can you help?

File an issue and we can look at making that change.

Done - InfluxDB output plugin - recover after Context Deadline · Issue #12685 · influxdata/telegraf · GitHub. Thanks. Also, I am seeing the internal metrics flowing throughout the timeline, while the metrics from Kafka are not reaching InfluxDB. What could be the reason for that?