Rate limiting telegraf tail plugin log files

prashanthjbabu · March 23, 2021, 8:12am

Hello! I have Telegraf running with the tail plugin configured to watch a particular file. My output plugin is influxdb. I’ve noticed that Telegraf pushes data as soon as the output batch is full (by default 1000 points). I’d like to know if there’s any way to rate limit the number of points that go from the tail plugin to the output plugin, or even to rate limit the output plugin itself. The use case: suppose the file being monitored by the tail plugin produces around 50,000 log lines in, say, 10 seconds. Telegraf would flush all of this out without dropping a single line. This could overwhelm the InfluxDB server if 1000 such Telegraf instances do the same thing. Is there a way to rate limit this so that, if the number of log lines exceeds X, the extra lines are dropped until the rest is flushed? Thanks!

Anaisdg · March 23, 2021, 8:24pm

Hello @prashanthjbabu,
Here is a list of all of the configuration options for the agent:

https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md#agent

# Configuration

Telegraf's configuration file is written using [TOML][] and is composed of
three sections: [global tags][], [agent][] settings, and [plugins][].

View the default [telegraf.conf][] config file with all available plugins.

## Generating a Configuration File

A default config file can be generated by telegraf:

```sh
telegraf config > telegraf.conf
```

To generate a file with specific inputs and outputs, you can use the
--input-filter and --output-filter flags:

You could try increasing the interval and decreasing the batch size (and buffer limit? not entirely sure). I’ll share your question with the Telegraf team too to see what wisdom they can share. :))

prashanthjbabu · March 24, 2021, 4:19am

Hi @Anaisdg, thanks for your response! From my understanding, Telegraf flushes output either at the interval time or when it reaches the batch size, whichever comes sooner.

I was initially under the impression that it would always flush at the interval time and would drop metrics once the buffer limit was reached, until everything was flushed at the next interval, but that doesn’t seem to be the case.

I’m looking for a way to make Telegraf flush only at the interval time and drop any metrics that exceed the buffer limit during that period.

Anaisdg · March 24, 2021, 9:25pm

Hello @prashanthjbabu,
I’m not sure, but I’ve asked the Telegraf team to take a look.

David_Bennett · March 24, 2021, 10:06pm

Hi, have you checked out the max_undelivered_lines property of the tail plugin? It’s essentially a buffer for the plugin: it will only process up to that number of lines before it blocks and waits for space in the buffer to take in more. E.g., if you set it to 100, at most 100 lines are en route to the output, and the plugin only unblocks for more once those metrics have been delivered by the output.
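
For reference, a minimal sketch of what that might look like in telegraf.conf. The file path is hypothetical and the value 100 just mirrors the example above; parser settings are left out on the assumption that they stay as in the existing tail config.

```toml
[[inputs.tail]]
  # Hypothetical path; substitute the file actually being watched.
  files = ["/var/log/myapp/app.log"]

  # Upper bound on lines that have been read but not yet written by the
  # output. Once 100 lines are in flight, the plugin stops reading and
  # resumes only after the output has delivered them.
  max_undelivered_lines = 100
```

Note that this applies back-pressure rather than dropping anything: lines that have not been read yet stay in the file until the plugin catches up.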

prashanthjbabu · March 25, 2021, 4:23am

Thanks @David_Bennett for your response. I just tried it out and it works really well, so thanks for that information. However, I was wondering if you know of something generic that can be done at a global level. The reason is that there are other log-specific plugins, like syslog and docker_log, which don’t have a setting like max_undelivered_lines to perform some sort of rate limiting.

David_Bennett · March 25, 2021, 1:32pm

To my knowledge, there is no such property at the global level. Generally it’ll be a mix of metric_batch_size and metric_buffer_limit to get the right output. A smaller batch size will result in fewer metrics sent at a time, and a higher metric buffer limit will hold more metrics while waiting without dropping them. Keep in mind, though, that Telegraf should notice a server error when attempting output, take a step back, and retry later. Another thing: you can set the flush_jitter property, which will help randomize when each Telegraf instance sends out metrics, so they don’t all do it at once. This only works if the batch doesn’t fill up first, though.
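
As a rough sketch, those agent settings could be combined like this. All values are illustrative assumptions rather than recommendations, and the arithmetic in the comments assumes one metric per log line.

```toml
[agent]
  # How often inputs are collected.
  interval = "10s"

  # Metrics are written to the output in batches of at most this many.
  # With 50,000 lines in a burst, that is roughly 50,000 / 500 = 100 writes.
  metric_batch_size = 500

  # Maximum number of unwritten metrics held per output. When the buffer
  # is full, the oldest metrics are dropped first.
  metric_buffer_limit = 5000

  # Regular flush cadence, plus a random delay of up to 5s so that many
  # instances do not all write at the same moment.
  flush_interval = "10s"
  flush_jitter = "5s"
```

As mentioned, the jitter only spreads out the scheduled flushes; a batch that fills up before the interval elapses is still written right away.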

prashanthjbabu · March 26, 2021, 4:08am

@David_Bennett Thanks for your response. Yes, I’ve played with metric_batch_size and metric_buffer_limit, but as you mentioned, this only works if the batch doesn’t fill up first. That is exactly the case I’m hitting: the volume of logs is so high that the batch fills up instantly, causing an instant flush, so I’m not able to rate limit at all.

Franky1 · March 26, 2021, 9:43am

Maybe another approach would be not to push so much data into InfluxDB in the first place?
Is this data really all needed?

Otherwise, you could reduce the data in Telegraf to what is actually necessary, using a processor or aggregator plugin or with the help of measurement filtering.
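
A minimal sketch of the filtering idea, under some loud assumptions: the log lines are parsed so that each metric carries a tag named level (a hypothetical tag, it depends entirely on your parser settings), and debug or trace lines are not needed in InfluxDB. The file path is hypothetical as well.

```toml
[[inputs.tail]]
  # Hypothetical path; substitute the file actually being watched.
  files = ["/var/log/myapp/app.log"]

  # Parser settings omitted; they are assumed to produce a "level" tag.

  # Discard any metric whose "level" tag matches one of these values.
  # Filter tables like this one must come last in the plugin definition.
  [inputs.tail.tagdrop]
    level = ["debug", "trace"]
```

This reduces volume by content, not by rate, so it only helps if part of the burst is data you can do without.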

prashanthjbabu · March 26, 2021, 9:57am

@Franky1 Thanks for your response. The data is coming from a log file, which is unpredictable. Sometimes it generates a steady stream of logs, but sometimes the volume explodes (which is when I would like to rate limit).

I don’t see how measurement filtering would help here, since log files use the tail measurement and I would like to keep it that way.

Franky1 · March 26, 2021, 10:19am

Of course, my approach only makes sense if you can do without some of the data in the log files.
If you really need every line, it won’t work.
But if the log files would contain information that you don’t need in the InfluxDB, you could filter it out before.

prashanthjbabu · March 26, 2021, 10:23am

@Franky1 I’m also okay with dropping certain logs if the buffer is full. If the rate is really high, I’m okay if it drops a few lines, but I would still like data to be pushed only once every interval seconds.
