Telegraf statsd thread control

Hey guys,
We are using Telegraf, mostly with the statsd plugin; all of our application metrics are sent over statsd.
Recently we noticed that only 2 cores are active, even on machines with 24 CPUs or more.

We saw that this limitation is somewhere in the code of the statsd plugin.
Does anyone know anything about this?

It appears that it was once configurable, but now it's not:

It looks like it was once possible to configure max_parser_threads, but that change has since been reverted.

There is also another PR:

It is a draft, but it relates to the same point, this time with a number_workers_threads parameter.

Can you please elaborate on all of those changes? We just need to be able to use statsd to the fullest on big machines.

(Also, we have opened an issue on GitHub as well: #12432. I can’t post another link as I am a new user.)

Thanks a lot

Hello @Arnold,
I don’t know, I’m sorry.
I’m tagging @Jay_Clifford, who might know something, but he’s also off, so I appreciate your patience in advance.

Hi @Arnold,
Just tagging @jpowers here. It looks like a design choice made by one of our previous Telegraf developers. Hopefully, Josh can shed some light on whether this is something we wouldn’t mind adding back in. I believe the decision was originally made for performance reasons.

Hi,

Recently we noticed that only 2 cores are active, even on machines with 24 CPUs or more.

Recall that a goroutine, which is what the settings in the PRs you mentioned control, is not the same as a thread. Before diving too far into these, are you finding issues where Telegraf is not actually able to keep up and is dropping metrics?

Hey @jpowers

Yes, Telegraf is not able to keep up and is dropping a lot of metrics.
When looking at pod utilization, only 2 cores are being used. A team member was able to pinpoint exactly where this is hardcoded in the past, but now I can’t find it in the Telegraf repo
(it might also be in a dependency Telegraf uses for statsd, for example).

So we are in a situation where we can’t scale Telegraf, as it will always use only 2 cores (we are talking about the statsd input only).

As you haven’t shared a config or logs, it is a little hard to jump to what the solution will be:

  • How many messages are you parsing per interval?
  • What is your batch size set to?
  • What is your metric buffer set to?
  • Have you tried the artifact in #12318 and found that it did in fact resolve the issue?