Telegraf Configuration Values

Hey!
I need help configuring my Telegraf values. I have provided my Telegraf config files below. Can I reduce max_undelivered_messages in this case? From my internal stats, I am getting metrics gathered = 950 metrics/min and metrics written = 1090K/min.

[[inputs.kafka_consumer]]

brokers = ["${LAV_BROKER_ONE}","${LAV_BROKER_TWO}"]
topics = ["${BROKER_TOPIC}"]
offset = "oldest"
#balance_strategy = "roundrobin"
max_message_len = 1000000
max_undelivered_messages = 500000
consumer_group = "iit_metrics_consumers_new"

data_format = "json"
name_override = "${Measurement}"
interval = "1s"
[inputs.kafka_consumer.tags]
setup = "lav1"

[[inputs.kafka_consumer]]

brokers = ["${LAV_BROKER_ONE}","${LAV_BROKER_TWO}"]
topics = ["${BROKER_TOPIC}"]
offset = "oldest"
#balance_strategy = "roundrobin"
max_message_len = 1000000
max_undelivered_messages = 500000
consumer_group = "iit_metrics_consumers_new1"

data_format = "json"
json_string_fields = ["counters_*"]
name_override = "${Measurement}"
interval = "1s"
[inputs.kafka_consumer.tags]
setup = "lav2"

[agent]
metric_batch_size = 8000
metric_buffer_limit = 100000
debug = true
omit_hostname = true

[[outputs.influxdb_v2]]

tagexclude = ["method","name","host"]
fielddrop = ["value"]
urls = ["XXXX"]
token = "XXXXXX"
organization = "${ORGANIZATION}"
bucket = "${BUCKET}"
flush_interval = "1s"

This is what the Telegraf logs look like:

I want to reduce memory consumption and also take care that no buffer overflow occurs.
Thanks

Hi,

Telegraf keeps all of the metrics it buffers in memory. To reduce this usage, you will want to send metrics faster and/or store fewer metrics. There are a couple of different directions you could go:

One option is to push more metrics with each write. Assuming your collection rate stays the same, this keeps buffer usage lower. That means increasing the agent’s metric_batch_size so that each output flush sends more metrics. This is probably a good first change to make, given that you are already setting this value.
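
As a rough sketch of that change, the agent section could look something like the following. The batch size shown is an illustrative assumption, not a tuned recommendation; the right value depends on how many metrics arrive between flushes and how large a write your InfluxDB endpoint handles comfortably.

```toml
[agent]
# Hypothetical value: larger than the original 8000 so that each
# flush drains more of the buffer in a single write.
metric_batch_size = 20000
# Unchanged from the original config: upper bound on unwritten
# metrics held per output.
metric_buffer_limit = 100000
debug = true
omit_hostname = true
```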

Another option is to adjust your interval settings, although it looks like you already set both the collection interval and the flush interval to 1 second so that metrics are sent constantly. Be careful: you may start seeing errors if any flush or collection takes longer than 1 second.
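
For reference, these are the two interval settings in question, shown here at the agent level as a minimal sketch. The original config sets them per plugin instead, which also works; the values below simply mirror it.

```toml
[agent]
# Collection interval for inputs (the original sets "1s" on each input).
interval = "1s"
# How often outputs flush (the original sets "1s" on the InfluxDB output).
flush_interval = "1s"
# Assumption for illustration: if a collection or flush regularly takes
# longer than one second, a larger value such as "5s" may avoid the errors.
```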

Thanks for replying! Yes, increasing the metric batch size does work.

Also, my Kafka max_undelivered_messages is currently set to 500,000, which is much higher than the default value of 1000. Does that mean it consumes that much memory, and should I lower the value?

Setting any of these values means Telegraf can store up to that much data; it is an upper limit, not the amount actually held. In your case you are keeping the buffer low, so it is safe to say Telegraf is not keeping 500,000 entries in memory.
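
If you do decide to lower it, one way to pick a value is to relate it to the output's metric_batch_size so that a full batch can still be collected. The following is only a sketch, and it assumes each Kafka message yields roughly one metric, which you would need to check against your own data:

```toml
[[inputs.kafka_consumer]]
brokers = ["${LAV_BROKER_ONE}","${LAV_BROKER_TWO}"]
topics = ["${BROKER_TOPIC}"]
# Assumption: about 1 metric per message and metric_batch_size = 8000,
# so 8000 undelivered messages are enough to fill one batch.
# If each message carried about 10 metrics, 800 would fill the same batch.
max_undelivered_messages = 8000
data_format = "json"
```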
