Telegraf - Most performant option to process metrics

Hello there,

I’m using Telegraf to process a lot of input metrics and I have several questions regarding the configuration that will provide the best performance.

  1. I need to drop some metrics based on an AND condition (if field_A.value == "X" and field_B.value == "Y", then drop). From a performance point of view, is it better to:
  • Use "metricpass" within the input plugin with a CEL expression.
  • Use the starlark processor plugin (see the sketch after this list).
  2. I need to rename some field keys because of invalid characters for the output, and I think the "processors.strings.replace" plugin is the best option. From a performance point of view, is it better to:
  • Use field_key = "*" to automatically replace invalid characters from all the field keys.
  • Use field_key = "NAME_OF_FIELD" for each field (let's say 10-20) that may contain invalid characters.
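For context, here is a minimal sketch of what the starlark variant could look like. The field names field_A/field_B are the placeholders from the question above, and the drop is based on the starlark processor's documented behaviour that a metric is dropped when apply() does not return it:

```toml
# Sketch only: the starlark variant of the AND-condition drop.
[[processors.starlark]]
  source = '''
def apply(metric):
    # Returning nothing (None) drops the metric; otherwise pass it through
    if metric.fields.get("field_A") == "X" and metric.fields.get("field_B") == "Y":
        return None
    return metric
'''
```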

And finally, I see some warnings in the logs saying:

Collection took longer than expected; not complete after interval of 20s

There are not many of them, maybe one every 10 minutes. I have tried to tweak the agent options (interval, metric_batch_size, metric_buffer_limit, flush_interval), but I cannot get rid of them entirely.

Do I need to do something? Is there a risk of dropped metrics? Or is it just that the metrics will be collected in the next collection run?

Thank you.

  • Use “metricpass” within the input plugin with a CEL expression.

I don't know what you mean by a "CEL" expression.

  • Use field_key = "*" to automatically replace invalid characters from all the field keys.
  • Use field_key = "NAME_OF_FIELD" for each field (let’s say 10-20) that may contain invalid characters.

Using the second option would be more performant, as the first one will walk over all existing fields and run the replace function on each of them.
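For reference, a minimal sketch of the two variants being compared, assuming "/" is one of the invalid characters (a placeholder; the thread does not say which characters are actually invalid for the output):

```toml
# Variant 1: apply the replacement to every field key
[[processors.strings]]
  [[processors.strings.replace]]
    field_key = "*"
    old = "/"
    new = "_"

# Variant 2: apply it only to the field keys known to be problematic
[[processors.strings]]
  [[processors.strings.replace]]
    field_key = "NAME_OF_FIELD"
    old = "/"
    new = "_"
  # ...repeat one replace table per affected field key (10-20 of them)
```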

The final warnings can depend on which input plugin you are using. It does indeed mean the input had not finished collecting yet at the start of the next interval, so that collection interval will be skipped and the input will be tried again at the next interval. It has nothing to do with batch size, buffer limit or flush interval, as these options are only relevant for the output plugins.
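For completeness, the knob that does matter here is the collection interval, which can also be overridden per input plugin. A sketch, where the plugin name and the 20s/60s values are placeholders rather than recommendations:

```toml
# Agent-level interval: applies to all inputs by default
[agent]
  interval = "20s"

# Placeholder input: a slow input can be given its own, longer interval so it
# has more time to finish before the next collection is scheduled
[[inputs.example_slow_input]]
  interval = "60s"
```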

Thanks for your answer!

Common Expression Language, mentioned here under "Metric Filtering" with a "metricpass" selector:
https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md

I’m using the “win_eventlog” input plugin.

OK, so a collection interval will be skipped but the current collection will be correctly processed, right? And as long as the "metric_buffer_limit" is not exceeded, no events should be dropped?

Metricpass only lets you filter based on the name of the metric, not on the value of a field.

And indeed, the currently collected metrics will not be discarded if the buffer is not full.

OK. Thanks for your help !

@h49nakxs this is not true! metricpass is new in current master (and not yet released), but it CAN run CEL-based filters on metrics. However, I do not have any performance benchmark for starlark, but metricpass requires around 1200ns per metric on my machine, which is a throughput of around 800k messages per second for that plugin.
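For reference, a sketch of what such a CEL filter could look like for the AND condition from the first post. The placement on the input plugin and the fields[...] access form are assumptions based on this thread, so double-check the "Metric Filtering" section of docs/CONFIGURATION.md linked above for the exact environment exposed to the expression:

```toml
# Sketch only: pass every metric EXCEPT the ones matching both conditions,
# which effectively drops them. field_A / field_B are the placeholder
# names from the question.
[[inputs.win_eventlog]]
  metricpass = '!("field_A" in fields && fields["field_A"] == "X" && "field_B" in fields && fields["field_B"] == "Y")'
```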

Thanks, I didn't know about the existence of metricpass and confused it with namepass.
