Parsing an obscure log

I’d like to parse an obscure log with Telegraf. This feels like a fairly common use case, but I can’t seem to make it work. The log in question has the following line-based format:

[SYMLINK] Rename [some src] to [some dest]

I’m only interested in what’s between the last pair of brackets, ‘some dest’ in this case. I took this blog post as a starting point and arrived at:

[[inputs.logparser]]
  ## file(s) to tail:
  files = ["/tmp/test.log"]
  from_beginning = false
  name_override = "test_metric"
  ## For parsing logstash-style "grok" patterns:
  [inputs.logparser.grok]
    patterns = ["%{CUSTOM_LOG}"]
    custom_patterns = '''
      CUSTOM_LOG SYMLINK
    '''
    #patterns = ["(?<dest>SYMLINK)"]
    #patterns = ["SYMLINK Rename .* to (?<dest>.*)"]
    #patterns = ["[SYMLINK] Rename [.*] to [(?<dest>.*)]"]

[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout"]

The line patterns = ["[SYMLINK] Rename [.*] to [(?<dest>.*)]"] is what I expected to work. The current configuration is sheer desperation. I’m now at a point where I would be happy to make anything work at all.

I feel I fundamentally misunderstand how this works. The format %{<capture syntax>[:<semantic name>][:<modifier>]} seems to apply to predefined capture syntaxes. On https://grokdebug.herokuapp.com/ I got my grok to work with \[SYMLINK\] Rename \[.*\] to \[(?<dest>.*)\] as the pattern and [SYMLINK] Rename [bla] to [foo] as the input.

What am I missing?
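
One thing that can be checked outside Telegraf is the difference between the escaped and unescaped brackets. In a regular expression, [SYMLINK] is a character class that matches a single character, not the literal text. The sketch below uses Python's re module as a stand-in. That is an assumption made for illustration: Telegraf uses Go's regexp engine, and Python spells named groups (?P<name>...), so this shows the regex behaviour only and says nothing about the Telegraf configuration itself.

```python
import re

line = "[SYMLINK] Rename [some src] to [some dest]"

# Unescaped brackets: each [...] is a character class that matches ONE character,
# so "[SYMLINK]" means "one of S, Y, M, L, I, N, K" and the named group is
# swallowed into the last character class.
unescaped = re.compile(r"[SYMLINK] Rename [.*] to [(?P<dest>.*)]")

# Escaped brackets: \[ and \] match the literal bracket characters.
escaped = re.compile(r"\[SYMLINK\] Rename \[.*\] to \[(?P<dest>.*)\]")

unescaped_match = unescaped.search(line)
escaped_match = escaped.search(line)

print(unescaped_match)               # no match object
print(escaped_match.group("dest"))   # text inside the last pair of brackets
```

This is consistent with the grok debugger result above, where only the escaped form matched.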

@haarts These issues are difficult to debug, but the [inputs.logparser.grok] section is wrong. It should look more like this:

[inputs.logparser.grok]
  patterns = '''
    \["%{SYMLINK:symlink:field} Rename %{GREEDYDATA:ip:field} to %{GREEDYDATA:ip:field}"\]
  '''
  custom_patterns = '''
  '''

We really need to improve the documentation around this plugin. There is an open issue on Telegraf for this and we are working on it this week. I’ll drop it in this thread when that’s done.

Thanks for the reply! I really appreciate it. I’ll follow the mentioned issue on GH.
The pattern you gave isn’t working out of the box, but it does give me ideas! I’ll twiddle with it in a spare moment.

@haarts Yeah we are also looking to add some more logging. Hope that gives you a general idea as to how the %{<capture syntax>[:<semantic name>][:<modifier>]} syntax works when defining patterns. Sorry I wasn’t more helpful :confused:

@haarts It would be great to have your feedback on the proposed documentation update. Do you feel like it clears things up?

Some suggestions (which might or might not be correct):
“The ‘capture_syntax’ defines the grok pattern that is used to parse the input line. A grok pattern is either a predefined constant or a regex. The ‘semantic_name’ is used to name the field or tag. The ‘extension’ modifier controls the data type that the parsed item is converted to, or other special handling.”

I’m not sure the regex needs to be in a specific form. Logstash has ‘(?<field_name>pattern)’ but that doesn’t seem necessary here. Or can it be used as an alternative to the grok pattern?

Also the config parameter ‘patterns’ is a list. Why?

" in the grok langauge, we must" -> typo

The section “TOML Escaping” is vital. And the mention of ‘flush_interval’ would have helped me too, obvious in retrospect of course but I didn’t think of it.

One last thing I would love to see added is how to do matching without using any of the predefined patterns like USER or TIME. Is that even possible?
For example, given the string “The fox jumps over the red mouse.”, can I match with a pattern “.*? fox jumps over the .*? %{DATA:jumpee}”?

Thanks for the review, this is great feedback.

I’ll update the pull request, but just to answer your last question quickly: you can parse without using any predefined patterns. At a basic level grok patterns are just regular expressions, and the best way to see this is to look at the source of the predefined patterns. If you do this you will see the regular expressions underlying them, and you can create your own using the custom_patterns or custom_pattern_file options.

If instead you were asking whether it’s possible to not have these patterns defined at all, that is not possible.
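
The idea that grok patterns are just named regular expressions can be sketched in a few lines. The code below is a toy expander written for this thread, not Telegraf's implementation: the PATTERNS table, the expand function, and the CUSTOM_LOG definition are all hypothetical, the modifier is parsed but ignored, and Python's (?P<name>...) group syntax stands in for the engine Telegraf actually uses.

```python
import re

# Hypothetical mini pattern table. Real grok ships many more definitions;
# CUSTOM_LOG plays the role of an entry added through custom_patterns.
PATTERNS = {
    "GREEDYDATA": r".*",
    "DATA": r".*?",
    "CUSTOM_LOG": r"\[SYMLINK\] Rename \[.*\] to \[%{DATA:dest}\]",
}

# Matches %{NAME}, %{NAME:field} and %{NAME:field:modifier}.
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")


def expand(grok):
    """Recursively replace %{...} references with plain regex text."""
    def repl(m):
        name, field, _modifier = m.groups()  # modifier ignored in this sketch
        body = expand(PATTERNS[name])
        # A semantic name turns the pattern into a named capture group.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(repl, grok)


line = "[SYMLINK] Rename [some src] to [some dest]"
regex = expand("%{CUSTOM_LOG}")
fields = re.search(regex, line).groupdict()
print(regex)
print(fields)
```

The top-level pattern references only the custom definition, and the custom definition mixes literal regex text with one predefined pattern that carries the field name.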

Also .*? fox jumps should match “The fox jumps”, but I think you will need to define an intermediate pattern to capture it. ‘patterns’ being a list just allows you to have several top-level patterns.

That clears it up then. I studied the source defining the patterns quite a bit but I feel you just answered the last missing pieces. Thanks a lot!

I’m glad you found the review valuable. Best of luck with the product!

For the record, I recently learned that this should work without an intermediate pattern (untested): (?<field_name>the pattern here)

I haven’t had luck trying that. My stuff ATM works so I don’t think I’ll spend time on it. But this should definitely be included in the docs.
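
For reference, this is what an inline named capture looks like as a plain regular expression, using the fox sentence from earlier in the thread. It is a sketch in Python, where the group is written (?P<name>...) rather than (?<name>...); whether Telegraf's logparser accepts the inline form directly is not confirmed anywhere in this thread, so this only demonstrates the regex idea.

```python
import re

sentence = "The fox jumps over the red mouse."

# Lazy .*? skips "The" and "red"; the named group captures the final word
# without referring to any predefined grok pattern.
pattern = re.compile(r".*? fox jumps over the .*? (?P<jumpee>\w+)")

match = pattern.search(sentence)
print(match.group("jumpee"))
```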