Processor filter by field value

My use case is processing logs using inputs.tail. The first stage does the basic breakup of the log message, which works just fine; the remaining portion is captured by %{GREEDYDATA:message}. I now want a second pass over the newly created “message” field to enrich the metric with a number of tags (I am on Telegraf 1.26.x).

What I have tried:

Using processors.regex, but this falls down as I have to repeat the same regex over and over for each tag I want to create. Note that I am not always capturing data to create the tags; some tag values are constants.

[[processors.regex.fields]]
  # extract state
  key = "message"
  pattern = '.*old state Established.*'
  replacement = "down"
  result_key = "state"

[[processors.regex.fields]]
  # normalise message
  key = "message"
  pattern = '.*old state Established.*'
  replacement = "BGP Neighbor Down"
  result_key = "message_normalised"

[[processors.regex.fields]]
  # extract state
  key = "message"
  pattern = '.*new state Established.*'
  replacement = "up"
  result_key = "state"

[[processors.regex.fields]]
  # normalise message
  key = "message"
  pattern = '.*new state Established.*'
  replacement = "BGP Neighbor Up"
  result_key = "message_normalised"

Using processors.override. This allows creating multiple tags as I require, but I cannot trigger the processor on the contents of a specific field (something like field contents = glob pattern; I can only filter on field name using fieldpass).

[[processors.override]]
  [processors.override.tags]
    state = "down"
    message_normalised = "Interface Down"
    level = "warning"

[[processors.override]]
  [processors.override.tags]
    state = "up"
    message_normalised = "Interface Up"
    level = "normal"

Other options:

Using Starlark: per the docs it's slow.
Using the new Common Expression Language (CEL) feature: per the docs it's slow.
Using the new “Allow batch transforms using named groups” processors.regex feature (pull #13971): not going to work for creating new tags or fields, as the tag or field value would come from the capture group, not a constant.

So to rephrase, is there a way to do the following:

Trigger processor only if field X contents matches glob Y, then create a number of constant tags.

processors.override looks like the best fit for this.

I have searched for a fieldpass-like method to filter on the contents of a field rather than on the field name, but have been unsuccessful. Any suggestions on how to achieve this in the most optimal way?

Using Starlark: per the docs it's slow.
Using the new Common Expression Language (CEL) feature: per the docs it's slow.

Starlark and metricpass are the solution to what you are trying to do. We call out that they are slow because you provide an arbitrary set of expressions or actual code to run, which requires computation time plus loading of their respective environments. Unless you are trying to parse tens of thousands of lines of data at each collection interval, you may not even see an impact.

I would suggest giving them a try and see just what level of impact they have before outright dismissing them.
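For the concrete question ("trigger only if field X matches glob Y, then set constant tags"), a metricpass sketch could look like the following. Note this is an assumption-laden example: metricpass was added after 1.26.x (so it would require an upgrade), and the exact CEL syntax for field access and regex matching should be checked against the docs.

```toml
# Run this override only when the "message" field matches the pattern.
# metricpass takes a CEL expression and is evaluated per metric;
# it is not available in Telegraf 1.26.x (assumed 1.27+).
[[processors.override]]
  metricpass = 'fields.message.matches(".*old state Established.*")'
  [processors.override.tags]
    state = "down"
    message_normalised = "BGP Neighbor Down"
    level = "warning"
```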

@richarde is it possible to annotate the “class” of the event using the regexp e.g.

[[processors.regex.fields]]
# extract state
key = "message"
pattern = '.*old state Established.*'
replacement = "down"
result_key = "state"

[[processors.regex.fields]]
# extract state
key = "message"
pattern = '.*new state Established.*'
replacement = "up"
result_key = "state"

as you can then use the lookup processor to fill in all other tags from a JSON file. That processor should be available in your version…
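A sketch of that suggestion, assuming the file path and the key template (whether the template can reference fields or only tags should be verified against the processors.lookup README):

```toml
# The JSON file maps a lookup key to the tags to add, e.g.
# /etc/telegraf/bgp_states.json (assumed path) containing:
# { "down": {"message_normalised": "BGP Neighbor Down", "level": "warning"},
#   "up":   {"message_normalised": "BGP Neighbor Up",   "level": "normal"} }
[[processors.lookup]]
  files = ["/etc/telegraf/bgp_states.json"]
  format = "json"
  # Go template producing the lookup key; assumed here to read the
  # "state" field created by the regex processor above
  key = '{{.Field "state"}}'
```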

Thanks for the response. I use Starlark in another part of my pipeline to process gnmi messages. For my use case using Starlark increased my instance load by roughly 30%. This is fine for now as I was not forced into going to the next tier of AWS instance size. I don’t like handing more cash to the cloud providers if I can avoid it, and for me personally inefficient code has a climate impact - just more resource guzzling servers in some datacentre somewhere.

My Starlark use case above is very simple: drop metrics that are in the past (older than 5 minutes) to prevent duplicates on gnmi collection restart.
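For reference, that drop-old-metrics logic can be sketched roughly as below. This is not the poster's actual config; the time.star module and the unix_nano attribute are assumptions to verify against the Starlark processor docs.

```toml
[[processors.starlark]]
  source = '''
load("time.star", "time")

def apply(metric):
    # metric.time is nanoseconds since the epoch; drop anything
    # more than 5 minutes in the past
    if time.now().unix_nano - metric.time > 5 * 60 * 1000 * 1000 * 1000:
        return None  # returning None drops the metric
    return metric
'''
```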

@srebhan this solution will work for me! Thanks. Now if the key match for the lookup processor allowed a glob pattern, that would solve pretty much all my lookup needs with a single lookup table and significantly reduce my telegraf config and complexity.

Looking at the code around https://github.com/influxdata/telegraf/blob/e2c4e10650677610bb9307856975bae0bdfae07f/plugins/processors/lookup/lookup.go#L77 I can see that the lookup is a hashmap. Allowing a glob lookup would involve iterating over the map's keys looking for the first match and then breaking the loop, but that's still way more efficient for me than maintaining pages of telegraf config.

Please consider adding glob lookups to processors.lookup. This would need to be optional via a parameter to ensure backwards compatibility.

Well, you can “construct” the key using Go templates, so you might simply truncate the value. Not exactly globbing, but maybe good enough!? If not, you could use the regex processor first to “unify” the to-be-globbed metrics and give them a certain tag…
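A minimal sketch of the truncation idea, assuming the key template can reference the field (the width and file name are arbitrary; printf is a standard Go template built-in, and "%.20s" truncates a string to 20 characters):

```toml
[[processors.lookup]]
  files = ["/etc/telegraf/lookup.json"]
  format = "json"
  # Truncate the "message" field to its first 20 characters so that
  # variable suffixes all map to the same lookup key
  # (width of 20 chosen arbitrarily for illustration)
  key = '{{printf "%.20s" (.Field "message")}}'
```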