Hello! I have Telegraf running with the tail plugin configured to watch a particular file. My output plugin is influxdb. I've noticed that Telegraf pushes data as soon as the output buffer is full (by default 1000 points). I'd like to know if there's any way to rate limit the number of points that the tail plugin sends to the output plugin, or even to rate limit the output plugin itself. The use case: suppose the file being monitored via the tail plugin produces around 50,000 lines of logs in, say, 10 seconds. Telegraf would flush all of this out without dropping a single line, which could overwhelm the InfluxDB server if 1000 such Telegraf instances do the same thing. Is there a way to rate limit this, so that if the number of log lines exceeds some threshold X, Telegraf drops them until the rest is flushed? Thanks!
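For context, my config is roughly this shape (the path and parser settings below are placeholders, not my exact values):

```toml
[[inputs.tail]]
  ## Placeholder path; in practice this points at the application log.
  files = ["/var/log/myapp.log"]
  from_beginning = false
  ## Parser is illustrative; any data_format that turns lines into metrics applies.
  data_format = "grok"
  grok_patterns = ["%{GREEDYDATA:message}"]

[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "telegraf"
```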
Hello @prashanthjbabul,
Here is a list of all of the configuration options for the agent:
You could try increasing the interval and decreasing the batch size (and buffer limit? not entirely sure). I’ll share your question with the Telegraf team too to see what wisdom they can share. :))
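For example, something along these lines (the values are just illustrative, not recommendations):

```toml
[agent]
  ## Collect and flush less frequently...
  interval = "30s"
  flush_interval = "30s"
  ## ...and send smaller batches (the default is 1000).
  metric_batch_size = 500
  ## How many metrics are held while waiting for the output (default 10000).
  metric_buffer_limit = 10000
```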
Hi @Anaisdg, thanks for your response! From my understanding, Telegraf flushes output either at the flush interval or when it reaches the batch/buffer size, whichever comes sooner. I was initially under the impression that it would always flush at the interval, dropping metrics once the buffer limit was reached until everything flushed at the next interval, but that doesn't seem to be the case. I'm looking for some way in Telegraf to enforce the interval-based flush and drop metrics that exceed the buffer limit during that time period.
Hello @prashanthjbabu,
I’m not sure, but I’ve asked the Telegraf team to take a look.
Hi, have you checked out the max_undelivered_lines property of the tail plugin? It's essentially a buffer for the plugin: it will only process up to that number of metric lines before it blocks and waits for space in the buffer before adding more metrics. E.g. if you set it to 100, it would ensure at most 100 lines are en route to the output at a time, and it will only unblock for more metrics after those metrics are delivered by the output.
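Something like this (the file path is just an example):

```toml
[[inputs.tail]]
  files = ["/var/log/myapp.log"]
  ## At most 100 lines are in flight to the output at any time;
  ## tail blocks until the output acknowledges delivery, then resumes.
  max_undelivered_lines = 100
  data_format = "grok"
  grok_patterns = ["%{GREEDYDATA:message}"]
```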
Thanks @David_Bennett for your response. I just tried it out and it works really well, thanks for that information. However, I was wondering if you know of something more generic that can be done at a global level. The reason being, there are other log-oriented plugins like syslog and docker_log that don't have a config like max_undelivered_lines to perform this sort of rate limiting.
To my knowledge there isn't such a property at the global level. Generally it'll be a mix of metric_batch_size and metric_buffer_limit to get the right output. Obviously a smaller batch size results in fewer metrics sent at a time, and a higher metric buffer limit holds more metrics while waiting, without dropping them. Keep in mind, though, that Telegraf should recognize a server error when it attempts output, back off, and retry later. Another thing: you can set the flush_jitter property, which helps randomize when each Telegraf instance sends its metrics, so they don't all do it at once. This only works if the batch doesn't fill up first, though.
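For instance (values are illustrative):

```toml
[agent]
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  flush_interval = "10s"
  ## Each instance delays its flush by a random 0-5s, so 1000 agents
  ## don't all hit InfluxDB at the same moment. Note: a full batch
  ## still triggers an immediate flush regardless of jitter.
  flush_jitter = "5s"
```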
@David_Bennett Thanks for your response. Yes, I've played with metric_batch_size and metric_buffer_limit, but as you mentioned, "this only works if the batch doesn't fill up first". That's exactly the case I'm hitting: the logs are generated so fast that they fill the batch instantly, triggering an instant flush and making it impossible to rate limit at all.
Maybe another approach would be not to push so much data into InfluxDB in the first place?
Is all of this data really needed?
Otherwise you could reduce the data in Telegraf to just what's actually necessary, using a processors or aggregators plugin or with the help of measurement filtering.
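As a sketch, assuming your grok pattern extracts a log level as a tag (the pattern and tag name below are hypothetical):

```toml
[[inputs.tail]]
  files = ["/var/log/myapp.log"]
  data_format = "grok"
  ## Hypothetical pattern that captures the level as a tag.
  grok_patterns = ["%{LOGLEVEL:loglevel:tag} %{GREEDYDATA:message}"]
  ## Drop noisy levels before they ever reach the buffer/output.
  [inputs.tail.tagdrop]
    loglevel = ["debug", "trace"]
```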
@Franky1 Thanks for your response. The data comes from a log file and is unpredictable: sometimes it's a steady stream of logs, but sometimes it just explodes (which is when I would like to rate limit). I don't see how measurement filtering would help here, since log files all land in the tail measurement and I'd like to keep it that way.
Of course, my approach only makes sense if you can do without some of the data in the log files.
If you really need every line, it won't work.
But if the log files contain information that you don't need in InfluxDB, you could filter it out beforehand.
@Franky1 I'm also okay with dropping certain logs if the buffer is full. If the rate is really high, I don't mind losing a few lines, but I would still like data to be pushed only every interval seconds.