Hi, I’m running a Telegraf (v1.9.1) pod alongside an application in Kubernetes. We’re sending a lot of metrics and running into a “statsd message queue full” error:
2019-12-02T00:57:24Z E! Error: statsd message queue full. We have dropped 901920000 messages so far. You may want to increase allowed_pending_messages in the config
We’ve increased allowed_pending_messages to about 80000, but the problem persists. We’re only running one Telegraf pod because we thought horizontally scaling it would affect the statsd aggregation. Would it be possible to run multiple Telegraf pods, or would that make the aggregated statsd measurements inaccurate?
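For reference, this is roughly what our statsd input section looks like; the values shown here are illustrative, not our exact production config:

```toml
[[inputs.statsd]]
  ## UDP address the statsd listener binds to
  service_address = ":8125"

  ## Number of messages allowed to queue up between flushes to the
  ## collection pipeline; packets arriving beyond this are dropped
  ## (which produces the "statsd message queue full" error above).
  allowed_pending_messages = 80000
```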
Splitting the same metrics across two Telegraf instances would affect the aggregated data, but if you shard the metrics consistently, so that a given metric name always goes to the same instance, you could send to multiple Telegraf instances.
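A minimal sketch of what consistent sharding on the client side could look like: hash the metric name to pick one of several Telegraf endpoints, so every sample for that name lands on the same instance and its statsd aggregation stays correct. The endpoint names here are hypothetical; adjust them to your deployment.

```python
import socket
import zlib

# Hypothetical Telegraf statsd endpoints, e.g. per-pod Services in Kubernetes.
TELEGRAF_ENDPOINTS = [("telegraf-0", 8125), ("telegraf-1", 8125)]


def pick_endpoint(metric_name: str) -> tuple:
    """Deterministically map a metric name to one endpoint, so all samples
    for that name are aggregated by the same Telegraf instance."""
    idx = zlib.crc32(metric_name.encode("utf-8")) % len(TELEGRAF_ENDPOINTS)
    return TELEGRAF_ENDPOINTS[idx]


def send_statsd(metric_name: str, value: float, metric_type: str = "c") -> None:
    """Send one statsd datagram (e.g. "app.requests:1|c") to the shard
    chosen for this metric name."""
    host, port = pick_endpoint(metric_name)
    payload = f"{metric_name}:{value}|{metric_type}".encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()
```

The key property is that `pick_endpoint` depends only on the metric name, so counters and timers are never split across instances.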
I’m curious how many statsd metrics you are sending; do you know your approximate rate?
I added the Telegraf internal plugin, and it reports the gather rate for statsd at about 130 metrics/sec.
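For anyone following along, enabling Telegraf’s self-monitoring is a one-line input section; this is a sketch, not our exact config:

```toml
[[inputs.internal]]
  ## Collect Telegraf's own runtime and per-plugin metrics
  ## (e.g. metrics_gathered per input), plus Go memstats.
  collect_memstats = true
```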
That doesn’t seem very high, but it occurs to me now that each of those metrics can be made up of any number of statsd messages due to the aggregation, so the number isn’t very helpful in its current form. I could add some additional counters to the plugin, though; would you be able to test a development build?
I opened an issue on GitHub (#6779). Right now I’m very busy finalizing the 1.13.0 release, but I’ll work on this later this week and will add links for testing on the issue.
Thanks. I was wondering: our agent interval was at 60s; would there have been fewer pending messages if the interval were 10s?
I wouldn’t expect it to make much of a difference.
Sorry about the delay, I added some build links to telegraf #6921 and some queries that I’m interested in to telegraf #6919.