Telegraf mqtt_consumer, how to increase consumption speed

cyril.jean · March 16, 2023, 8:42am

Hello,

I’m using Telegraf to consume messages from an MQTT server. Most of my messages are QoS 2, and I receive more than 1000 metrics per second.
My issue is that Telegraf is not consuming the messages fast enough, so they accumulate in the broker, where I often have on the order of 100,000 messages waiting for an ack.

Right now my Telegraf is set up like this:

[agent]
  metric_batch_size = 10000
  flush_interval = "30s"
  interval = "10s"
  metric_buffer_limit = 50000

[[inputs.mqtt_consumer]]
  max_undelivered_messages = 5000

jpowers · March 17, 2023, 1:14pm

The MQTT consumer reads messages as they become available, as long as it has room under the max_undelivered_messages limit. The interval option does not apply to this plugin.

As such, what your config is saying is: read up to 5,000 messages in total, then attempt to write up to 10,000 messages every 30 seconds. This means the MQTT consumer will effectively only ever read 5,000 messages every 30 seconds.

cyril.jean · March 17, 2023, 3:23pm

Thank you Josh for your answer. Are you saying I should match max_undelivered_messages with metric_batch_size?

jpowers · March 17, 2023, 4:06pm

It depends on what your goal is.

If you have no other input plugins and only want to read more metrics, then yes, increasing the max_undelivered_messages option to match the metric batch size would read more metrics. If you want to read metrics faster, another option is to reduce the flush_interval as well, so that batches are written more often.
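
As a rough sketch of the first option, the fragment below raises max_undelivered_messages to the batch size from the original post. The section names follow Telegraf's usual TOML layout; the broker address, topic, and data format are placeholders and not values from this thread.

```toml
[agent]
  metric_batch_size = 10000
  metric_buffer_limit = 50000
  flush_interval = "30s"

[[inputs.mqtt_consumer]]
  servers = ["tcp://127.0.0.1:1883"]  # placeholder broker address
  topics = ["sensors/#"]              # placeholder topic filter
  qos = 2
  # Raised from 5000 so the consumer can fill a whole batch
  # without waiting for the next flush.
  max_undelivered_messages = 10000
  data_format = "influx"              # assumption: adjust to your payloads
```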

cyril.jean · March 17, 2023, 4:21pm

Yes, MQTT is the main source of metrics here.
Well, that’s really interesting. With the configuration described above, I was stuck at 166 metrics/s both gathered and written, which is almost exactly 5000 metrics / 30 seconds, and some MQTT QoS 0 messages were simply lost.
The write_buffer size was also stuck at 4.88kB.
Now that I’ve pushed max_undelivered_messages to 10k, the written metrics fluctuate between 260 and 280/s, which makes more sense.
Also, the write_buffer size is now around 200B. This one I cannot explain!

But I believe it means I will have to keep increasing max_undelivered_messages as the received message rate increases.

srebhan · March 21, 2023, 9:33am

@cyril.jean Telegraf will collect messages until either flush_interval (±jitter) is reached or metric_batch_size metrics have arrived. So in your case, you will receive 5k messages (due to your max_undelivered_messages setting) and then Telegraf waits for the 30 seconds (flush_interval) to pass.
In other words, you are filling in the 5k messages in the first 5 seconds and then waiting 25 seconds to flush the metrics, because the metric_batch_size is never reached.

As a solution, I would increase your max_undelivered_messages to, say, twice the metric_batch_size. Make sure that metric_buffer_limit is still greater than the batch size by a margin (say, a factor of 2 or more). You can additionally reduce the flush_interval to control the maximum latency for your metrics if the rate drops for some reason.
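
A minimal sketch of what that could look like, assuming the batch size from the original post is kept. The 10s flush_interval is only an illustrative value, and the broker address, topic, and data format are placeholders rather than anything taken from this thread.

```toml
[agent]
  metric_batch_size = 10000
  # Kept well above the batch size (here a factor of 5).
  metric_buffer_limit = 50000
  # Illustrative value: caps latency if the message rate drops
  # and a batch does not fill up.
  flush_interval = "10s"

[[inputs.mqtt_consumer]]
  servers = ["tcp://127.0.0.1:1883"]  # placeholder broker address
  topics = ["sensors/#"]              # placeholder topic filter
  qos = 2
  # Twice metric_batch_size, so a full batch can always accumulate.
  max_undelivered_messages = 20000
  data_format = "influx"              # assumption: adjust to your payloads
```

With these values a flush is triggered by whichever comes first, a full batch of 10,000 metrics or the 10 second timer, so the consumer should no longer sit idle waiting for the interval to expire.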
