We've been stress testing our architecture, which collects data from Kafka using Telegraf and writes it to our InfluxDB Cloud server. Data transmission to Kafka from 1000 instances occurs seamlessly every 5 seconds, but Telegraf's writes to InfluxDB are significantly delayed, even though we have tried scaling from 6 to 30 Telegraf agent instances. The delay increases progressively, from 2 to 5 minutes, at regular intervals. Kindly help us resolve the issue.
Telegraf agent config:
[agent]
interval = "5s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "5s"
flush_interval = "10s"
flush_jitter = "5s"
[[inputs.kafka_consumer]]
brokers = {{ broker_url }}
topics = {{ topics }}
sasl_username = "{{ sasl_username}}"
sasl_password = "{{ sasl_password}}"
sasl_mechanism = "PLAIN"
data_format = "json"
json_name_key = "{{measurement_name}}"
tag_keys = ["{{tag_id}}"]
[[outputs.influxdb_v2]]
urls = ["{{ influx_url }}"]
token = "{{ influx_token }}"
organization = "{{ organization }}"
bucket = "{{ bucket }}"
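For reference, here is a rough sketch of the settings that seem most relevant to throughput in a setup like this. The configuration above leaves `consumer_group` and `max_undelivered_messages` at their defaults. Every value below is an illustrative assumption, not something we have tested or measured, and the fragment leaves out the broker, auth, and output settings shown above.

```toml
# Illustrative values only: assumptions for discussion, not tested recommendations.
[agent]
# Larger batches mean fewer, bigger writes to InfluxDB.
metric_batch_size = 5000
# Keep the buffer well above the batch size so bursts are not dropped.
metric_buffer_limit = 100000
flush_interval = "10s"
flush_jitter = "0s"

[[inputs.kafka_consumer]]
# All Telegraf instances need the same consumer group so that Kafka shares the
# topic's partitions among them. Instances beyond the partition count get no data.
consumer_group = "telegraf_metrics_consumers"
# Upper bound on messages read from Kafka but not yet delivered by the output.
# The plugin stops reading when this is reached, so a value far below what is
# needed to fill metric_batch_size can leave writes waiting for flush_interval.
max_undelivered_messages = 5000
```

Because `kafka_consumer` reads continuously, the fetch rate is probably governed less by `interval` and `collection_jitter` than by `max_undelivered_messages`, the batch size, and how quickly the output acknowledges writes. Adding agents can only help up to the number of partitions on the topic.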
Trends of 2 different tests