Telegraf statsd thread control

Arnold · December 21, 2022, 3:12pm

Hey guys,
We are using Telegraf, mostly with the statsd plugin; all of our application metrics are sent over statsd.
Recently we noticed that only 2 cores are active, even on machines with 24 CPUs or more.

We saw that this limitation is somewhere in the code of the statsd plugin.
Does anyone know anything about this?

It appears that it was once configurable, but now it’s not:

It looks like it was once possible to configure max_parser_threads, but that has since been reverted.

There is also another PR:

which is a draft but also relates to the same point, this time with a number_workers_threads param.

Can you please elaborate on all of those changes? We just need to be able to utilize statsd to the fullest on big machines.

(Also, we have opened an issue on GitHub as well: #12432. I can’t post another link as I am a new user.)

Thanks a lot

Anaisdg · December 21, 2022, 8:43pm

Hello @Arnold,
I don’t, I’m sorry.
I’m tagging @Jay_Clifford, who might know something, but he’s also off, so I appreciate your patience in advance.

Jay_Clifford · January 6, 2023, 11:02am

Hi @Arnold,
Just tagging @jpowers here. It looks like a design choice made by one of our previous Telegraf developers. Hopefully Josh can shed some light on whether this is something we wouldn’t mind adding back in. I believe the decision was originally made for performance reasons.

jpowers · January 9, 2023, 11:58pm

Hi,

"recently we encountered that only 2 cores are active, even if we are using machines with 24CPUs or more."

Recall that a goroutine, which is what the settings in the PRs you mentioned control, is not the same as a thread. Before diving too far into these, are you finding that Telegraf is actually unable to keep up and is dropping metrics?
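
To make the goroutine/thread distinction concrete, here is a minimal, generic Go sketch. It is not Telegraf code, and runWorkers is a hypothetical helper written only for this illustration. It starts many goroutines and prints the runtime's GOMAXPROCS value, which per the Go documentation limits how many OS threads can execute Go code simultaneously. The number of goroutines and the number of threads running them in parallel are independent settings.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
)

// runWorkers is a hypothetical helper: it starts n goroutines, lets each
// do a trivial piece of work, and returns how many ran to completion.
func runWorkers(n int) int64 {
	var done int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			atomic.AddInt64(&done, 1) // stand-in for parsing one message
		}()
	}
	wg.Wait()
	return done
}

func main() {
	// Logical CPUs usable by this process, as seen by the Go runtime.
	fmt.Println("NumCPU:    ", runtime.NumCPU())

	// GOMAXPROCS(0) reports the current limit without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))

	// Far more goroutines than threads: the scheduler multiplexes them.
	fmt.Println("completed: ", runWorkers(1000))
}
```

Running something like this inside the same pod is one way to check what the Go runtime itself sees, which may differ from what the node offers.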

Arnold · January 10, 2023, 8:07am

Hey @jpowers

Yes, Telegraf is not able to keep up and is dropping a lot of metrics.
When looking at pod utilization, only 2 cores are being used. A team member was able to pinpoint exactly where this is hardcoded in the past, but now I can’t find it in the Telegraf repo
(it might also be in a dependency that Telegraf uses for statsd, for example).

So we are in a situation where we can’t scale Telegraf, as it will always utilize only 2 cores (we are talking about the statsd input only).

jpowers · January 10, 2023, 1:53pm

As you haven’t shared a config or logs, it is a little hard to jump to what the solution will be:

How many messages are you parsing per interval?
What is your batch size set to?
What is your metric buffer set to?
Have you tried the artifact in #12318 and found that it did in fact resolve the issue?
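
As a rough illustration of why the message rate and buffer sizes matter, here is a generic Go sketch of a fixed-size queue that drops messages once it is full. This is an assumption-laden model, not Telegraf's implementation: boundedQueue, newBoundedQueue, and offer are hypothetical names, and the limit of 3 is chosen only to keep the example small.

```go
package main

import "fmt"

// boundedQueue is a hypothetical stand-in for a fixed-size metric buffer.
// It is meant for use from a single goroutine in this sketch.
type boundedQueue struct {
	ch      chan string
	dropped int
}

// newBoundedQueue creates a queue that holds at most limit messages.
func newBoundedQueue(limit int) *boundedQueue {
	return &boundedQueue{ch: make(chan string, limit)}
}

// offer enqueues msg without blocking. If the queue is already full,
// the message is discarded and counted as a drop.
func (q *boundedQueue) offer(msg string) bool {
	select {
	case q.ch <- msg:
		return true
	default:
		q.dropped++
		return false
	}
}

func main() {
	q := newBoundedQueue(3)

	// Five messages arrive before anything drains the queue.
	for i := 0; i < 5; i++ {
		q.offer(fmt.Sprintf("requests:%d|c", i))
	}

	fmt.Printf("queued=%d dropped=%d\n", len(q.ch), q.dropped)
	// prints: queued=3 dropped=2
}
```

In a model like this, drops depend on how fast messages arrive relative to how fast they are drained and on the queue limit, which is why the numbers asked for above help narrow down whether parsing concurrency is really the bottleneck.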
