I wanted to compare the Telegraf and InfluxDB logs around the times when the Telegraf logs show “Timeout exceeded”, but I could not find anything. What should I do? Would any additional InfluxDB configuration help? How can I find out how much load InfluxDB is under? Can anyone help me here?
Generally, the `context deadline exceeded` error means that something network-related occurred at the time. Other than the InfluxDB instance going down, I would not expect InfluxDB to have any logs that would be interesting.
Some questions to consider:
- Are telegraf and the influxdb server on the same host? If not, how far apart are they? Same region? Different clouds? Across continents?
- How often are you seeing this error? Once seen is it continuous? Do you ever recover?
- The `r5a.4xlarge` is a pretty big system; what else is running on it? Is the system where this is being run under heavy load? Have you watched the load?
- Did you make any firewall or DNS changes at the time?
Hi, we modified our pipeline to avoid these errors. We now have two separate EC2 instances: a t3.small for Telegraf and an r5a.xlarge for InfluxDB. We also had some processing on the Telegraf side that was not optimized earlier; we worked on it and now have an optimized solution. Currently I am running two Telegraf Docker containers: one handles 6 Kafka topics and the other 25/26. The second one gives a deadline exceeded error after some random amount of time, and after restarting the containers it works fine until the next time. I have not yet seen this issue with the first one. Is it behaving this way because of the larger number of topics? Is there any way to find out the load it can support? Data arrives every 5 minutes, and the processor part is still quite heavy.
I see you started a new thread and I answered over there.