Telegraf - Most performant option to process metrics

Hello there,

I’m using Telegraf to process a lot of input metrics and I have several questions regarding the configuration that will provide the best performance.

  1. I need to drop some metrics based on an AND condition (if field_A.value == "X" and field_B.value == "Y", then drop). From a performance point of view, is it better to:
  • Use "metricpass" within the input plugin with a CEL expression.
  • Use the starlark processor plugin (see the sketch after this list).
  2. I need to rename some field keys because of invalid characters for the output, and I think the "processors.strings.replace" plugin is the best option. From a performance point of view, is it better to:
  • Use field_key = "*" to automatically replace invalid characters from all the field keys.
  • Use field_key = "NAME_OF_FIELD" for each field (let's say 10-20) that may contain invalid characters.
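For context, here is a minimal sketch of what the starlark variant could look like. The field names field_A/field_B are the placeholders from the question above, and the drop is based on the starlark processor's documented behaviour that a metric is dropped when apply() does not return it:

```toml
# Sketch only: the starlark variant of the AND-condition drop.
[[processors.starlark]]
  source = '''
def apply(metric):
    # Returning nothing (None) drops the metric; otherwise pass it through
    if metric.fields.get("field_A") == "X" and metric.fields.get("field_B") == "Y":
        return None
    return metric
'''
```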

And finally, I see some warnings in the logs saying:

Collection took longer than expected; not complete after interval of 20s

There are not many of them, maybe one every 10 minutes. I have tried to tweak the agent options (interval, metric_batch_size, metric_buffer_limit, flush_interval), but I cannot get rid of them entirely.

Do I need to do something? Is there a risk of dropped metrics? Or is it just that the metrics will be collected in the next collection run?

Thank you.

  • Use “metricpass” within the input plugin with a CEL expression.

I don't know what you mean by a "CEL" expression.

  • Use field_key = "*" to automatically replace invalid characters from all the field keys.
  • Use field_key = "NAME_OF_FIELD" for each field (let’s say 10-20) that may contain invalid characters.

Using the second option would be more performant, as the first one will walk over all existing fields and run the replace function on each of them.
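For reference, a minimal sketch of the two variants being compared, assuming "/" is one of the invalid characters (a placeholder; the thread does not say which characters are actually invalid for the output):

```toml
# Variant 1: apply the replacement to every field key
[[processors.strings]]
  [[processors.strings.replace]]
    field_key = "*"
    old = "/"
    new = "_"

# Variant 2: apply it only to the field keys known to be problematic
[[processors.strings]]
  [[processors.strings.replace]]
    field_key = "NAME_OF_FIELD"
    old = "/"
    new = "_"
  # ...repeat one replace table per affected field key (10-20 of them)
```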

The final warnings can depend on which input plugin you are using. It does indeed mean the input had not finished collecting yet at the start of the next interval, so that collection interval will be skipped and the input will be tried again at the next interval. It has nothing to do with batch size, buffer limit or flush interval, as these options are only relevant for the output plugins.
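For completeness, the knob that does matter here is the collection interval, which can also be overridden per input plugin. A sketch, where the plugin name and the 20s/60s values are placeholders rather than recommendations:

```toml
# Agent-level interval: applies to all inputs by default
[agent]
  interval = "20s"

# Placeholder input: a slow input can be given its own, longer interval so it
# has more time to finish before the next collection is scheduled
[[inputs.example_slow_input]]
  interval = "60s"
```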

Thanks for your answer!

Common Expression Language, mentioned here under "Metric Filtering" with a "metricpass" selector:
https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md

I’m using the “win_eventlog” input plugin.

OK, so a collection interval will be skipped but the current collection will be correctly processed, right? And as long as the "metric_buffer_limit" is not exceeded, no events should be dropped?

Metricpass only lets you filter based on the name of the metric, not on the value of a field.

And indeed, the currently collected metrics will not be discarded if the buffer is not full.

OK. Thanks for your help !

@h49nakxs this is not true! metricpass is new in current master (and not yet released), but it CAN run CEL-based filters on metrics. However, I do not have any performance benchmark for starlark, but metricpass requires around 1200ns per metric on my machine, which is a throughput of around 800k messages per second for that plugin.
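For reference, a sketch of what such a CEL filter could look like for the AND condition from the first post. The placement on the input plugin and the fields[...] access form are assumptions based on this thread, so double-check the "Metric Filtering" section of docs/CONFIGURATION.md linked above for the exact environment exposed to the expression:

```toml
# Sketch only: pass every metric EXCEPT the ones matching both conditions,
# which effectively drops them. field_A / field_B are the placeholder
# names from the question.
[[inputs.win_eventlog]]
  metricpass = '!("field_A" in fields && fields["field_A"] == "X" && "field_B" in fields && fields["field_B"] == "Y")'
```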

Thanks, I didn't know about the existence of metricpass and confused it with namepass.
