I’m using Telegraf to process a lot of input metrics and I have several questions regarding the configuration that will provide the best performance.
I need to drop some metrics based on an AND condition (if field_A.value == "X" and field_B.value == "Y", then drop). From a performance point of view, is it better to:
Use “metricpass” within the input plugin with a CEL expression.
Use the starlark processor plugin.
I need to rename some field keys because they contain characters that are invalid for the output, and I think the "processors.strings.replace" plugin is the best option. From a performance point of view, is it better to:
Use field_key = "*" to automatically replace invalid characters from all the field keys.
Use field_key = "NAME_OF_FIELD" for each field (let’s say 10-20) that may contain invalid characters.
And finally, I see some warnings in the logs saying :
Collection took longer than expected; not complete after interval of 20s
There are not a lot of them, maybe one every 10 minutes. I have tried to tweak the agent options like interval, metric_batch_size, metric_buffer_limit, flush_interval but I cannot get rid of them entirely.
Do I need to do anything? Is there a risk of dropped metrics? Or will the metrics simply be collected in the next collection run?
Use “metricpass” within the input plugin with a CEL expression.
I don't know what you mean by a "CEL" expression.
Use field_key = "*" to automatically replace invalid characters from all the field keys.
Use field_key = "NAME_OF_FIELD" for each field (let’s say 10-20) that may contain invalid characters.
Using the second option would be more performant, as the first one walks over all existing fields and runs the replace function on each of them.
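As a rough sketch of the two variants: the old/new characters and the field names below are placeholders I made up, not values from your setup, so adjust them to whatever your output rejects.

```toml
# Pick ONE of the two variants; both are shown only for comparison.

# Variant 1: glob over every field key, so the replace runs on all fields.
[[processors.strings]]
  [[processors.strings.replace]]
    field_key = "*"
    old = "."   # assumed invalid character
    new = "_"   # assumed replacement

# Variant 2: one replace entry per affected field key (hypothetical names).
[[processors.strings]]
  [[processors.strings.replace]]
    field_key = "Some.Field"
    old = "."
    new = "_"

  [[processors.strings.replace]]
    field_key = "Other.Field"
    old = "."
    new = "_"
```

With 10-20 entries the second variant is more verbose, but only the listed keys are touched.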
The final warning can depend on which input plugin you are using. It does mean that the input had not finished collecting by the start of the next interval, so that collection interval is skipped and the input is tried again at the following interval. It has nothing to do with batch size, buffer limit, or flush interval, as those options are only relevant for the output plugins.
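To illustrate the split, here is a minimal agent section with comments on which side each option belongs to. The interval matches the 20s from your warning; the other values are just the usual Telegraf defaults, used here for illustration and not as a recommendation.

```toml
[agent]
  ## Collection side: how often each input is asked to gather.
  ## The "Collection took longer than expected" warning relates to this value.
  interval = "20s"

  ## Output side: these control buffering and writing, not collection.
  metric_batch_size = 1000      # metrics sent to an output per write
  metric_buffer_limit = 10000   # metrics kept per output while writes are pending or failing
  flush_interval = "10s"        # how often outputs are flushed
```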
Common Expression Language, mentioned here under “Metric Filtering” with a “metricpass” selector : https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md
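What I have in mind is roughly the following. This is only a sketch based on my reading of that page: it assumes a Telegraf build that includes metricpass, that fields are exposed to the expression as "fields", and field_A / field_B are stand-ins for my real field names.

```toml
[[inputs.win_eventlog]]
  ## A metric passes when the expression is true, so the drop condition is negated.
  ## The 'in' checks guard against metrics that lack one of the fields.
  metricpass = "!('field_A' in fields && 'field_B' in fields && fields.field_A == 'X' && fields.field_B == 'Y')"
```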
I’m using the “win_eventlog” input plugin.
OK, so a collection interval will be skipped but the current collection will be processed correctly, right? And as long as the "metric_buffer_limit" is not exceeded, no events should be dropped?
@h49nakxs this is not true! metricpass is new in current master (and not yet released), but it CAN run CEL-based filters on metrics. I do not have any performance benchmark for starlark, but metricpass requires around 1200ns per metric on my machine, which corresponds to a throughput of around 800k metrics per second for that plugin.
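For comparison with metricpass, the same AND condition in the starlark processor might look like the sketch below. The field names and values are the placeholders from the question, and I have not benchmarked this variant.

```toml
[[processors.starlark]]
  source = '''
def apply(metric):
    # Drop the metric only when both conditions hold (AND).
    # .get() returns None for a missing field, so such metrics are kept.
    if metric.fields.get("field_A") == "X" and metric.fields.get("field_B") == "Y":
        return None  # returning None drops the metric
    return metric
'''
```

Note that a processor runs on metrics from all inputs unless you restrict it (for example with namepass), whereas metricpass set on the input only affects that input.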