Parsing an obscure log

haarts · April 11, 2017, 10:20pm

I’d like to parse some obscure log with Telegraf. It feels like this is a fairly common use case, but I can’t seem to make it work. The log in question has the following line-based format:

[SYMLINK] Rename [some src] to [some dest]

I’m only interested in what’s between the last pair of brackets, ‘some dest’ in this case. I took this blog post as a starting point and arrived at:

[[inputs.logparser]]
  ## file(s) to tail:
  files = ["/tmp/test.log"]
  from_beginning = false
  name_override = "test_metric"
  ## For parsing logstash-style "grok" patterns:
  [inputs.logparser.grok]
    patterns = ["%{CUSTOM_LOG}"]
    custom_patterns = ''' 
      CUSTOM_LOG SYMLINK
    ''' 
    #patterns = ["(?<dest>SYMLINK)"]
    #patterns = ["SYMLINK Rename .* to (?<dest>.*)"]
    #patterns = ["[SYMLINK] Rename [.*] to [(?<dest>.*)]"]

[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout"]

The line patterns = ["[SYMLINK] Rename [.*] to [(?<dest>.*)]"] is what I expected to work. The current configuration is sheer desperation. I’m now at a point where I would be happy to make anything work at all.

I feel I fundamentally misunderstand how this works. The format %{<capture syntax>[:<semantic name>][:<modifier>]} seems to apply only to predefined capture syntaxes. On https://grokdebug.herokuapp.com/ I got my grok to work with \[SYMLINK\] Rename \[.*\] to \[(?<dest>.*)\] as a pattern and [SYMLINK] Rename [bla] to [foo] as input.

What am I missing?
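
For reference, a minimal sketch of how the pattern that worked in the grok debugger might be carried over into the Telegraf config. It assumes that the literal brackets are escaped with backslashes, that the pattern is written as a literal (single-quoted) TOML string so TOML does not consume those backslashes, and that the predefined DATA pattern with a semantic name stands in for the inline (?<dest>.*) group. The field names src and dest are illustrative, and this has not been run against the Telegraf version used in this thread.

```toml
[[inputs.logparser]]
  ## file(s) to tail:
  files = ["/tmp/test.log"]
  from_beginning = false
  name_override = "test_metric"

  [inputs.logparser.grok]
    ## Unescaped, [SYMLINK] is a regex character class that matches a single
    ## character, so each literal bracket needs a backslash.
    ## Single quotes make this a literal TOML string: backslashes pass through as-is.
    patterns = ['\[SYMLINK\] Rename \[%{DATA:src}\] to \[%{DATA:dest}\]']

[[outputs.file]]
  ## "stdout" is a specially handled file.
  files = ["stdout"]
```

With an input line such as [SYMLINK] Rename [bla] to [foo], the intent is that dest ends up holding foo.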

jackzampolin · April 12, 2017, 8:28pm

@haarts These issues are difficult to debug, but the [inputs.logparser.grok] section is wrong. It should look more like this:

[inputs.logparser.grok]
  patterns = '''
    \["%{SYMLINK:symlink:field} Rename %{GREEDYDATA:ip:field} to %{GREEDYDATA:ip:field}"\]
  '''
  custom_patterns = '''
  '''

We really need to improve documentation around this plugin. There is an open issue on telegraf for this and we are working on it this week. I’ll drop it in this thread when that’s done.

haarts · April 13, 2017, 5:58am

Thanks for the reply! I really appreciate it. I’ll follow the mentioned issue on GH.
The pattern you gave isn’t working out of the box, but it does give me ideas! I’ll twiddle with it in a spare moment.

jackzampolin · April 13, 2017, 5:10pm

@haarts Yeah, we are also looking to add some more logging. Hope that gives you a general idea as to how the %{<capture syntax>[:<semantic name>][:<modifier>]} syntax works when defining patterns. Sorry I wasn’t more helpful.

daniel · April 13, 2017, 6:11pm

@haarts It would be great to have your feedback on the proposed documentation update. Do you feel like it clears things up?

haarts · April 14, 2017, 5:30pm

Some suggestions (which might or might not be correct):
“The ‘capture_syntax’ defines the grok pattern that is used to parse the input line. A grok pattern is either a predefined constant or a regex. The ‘semantic_name’ is used to name the field or tag. The ‘extension’ modifier controls the data type that the parsed item is converted to or other special handling.”

I’m not sure the regex needs to be in a specific form. Logstash has ‘(?<field_name>pattern)’ but that doesn’t seem necessary here. Or can it be used as an alternative to the grok pattern?

Also the config parameter ‘patterns’ is a list. Why?

" in the grok langauge, we must" -> typo

The section “TOML Escaping” is vital. And the mention of ‘flush_interval’ would have helped me too, obvious in retrospect of course but I didn’t think of it.
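
To illustrate both points, here is a sketch of the escaping variants for the bracketed log format from the first post, plus the agent setting. It assumes standard TOML string rules: in a basic (double-quoted) string a backslash must itself be escaped, while a literal (single-quoted) string passes backslashes through untouched. The 1s flush interval and the field names src and dest are illustrative values only.

```toml
[agent]
  ## Metrics are normally flushed to the outputs once per flush_interval,
  ## so a short value makes test output show up sooner (illustrative value).
  flush_interval = "1s"

[[inputs.logparser]]
  files = ["/tmp/test.log"]

  [inputs.logparser.grok]
    ## Basic string: every regex backslash is doubled for TOML.
    patterns = ["\\[SYMLINK\\] Rename \\[%{DATA:src}\\] to \\[%{DATA:dest}\\]"]

    ## Equivalent literal string: no doubling needed.
    # patterns = ['\[SYMLINK\] Rename \[%{DATA:src}\] to \[%{DATA:dest}\]']
```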

One last thing I would love added is how to do matching without using any of the predefined patterns like USER or TIME. Is that even possible?
For example, given the string: “The fox jumps over the red mouse.” Can I match with a pattern “.*? fox jumps over the .*? %{DATA:jumpee}”?

daniel · April 14, 2017, 6:24pm

Thanks for the review, this is great feedback.

I’ll update the pull request, but just to answer your last question quickly: you can parse without using any predefined patterns. At a basic level grok patterns are just regular expressions; the best way to see this is to look at the source of the predefined patterns. There you will see the regular expressions underlying them, and you can create your own using the custom_patterns or custom_pattern_files options.

If instead you were asking whether it’s possible to not have these patterns defined at all, that is not possible.

daniel · April 14, 2017, 6:35pm

Also .*? fox jumps should match “The fox jumps”, but I think you will need to define an intermediate pattern to capture it. Patterns being a list just allows you to have several top-level patterns.
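
A sketch of what such an intermediate pattern might look like for the fox example. The pattern name ANIMAL and its \w+ regex are hypothetical, chosen only for illustration, and the file path is the one from the first post.

```toml
[[inputs.logparser]]
  files = ["/tmp/test.log"]

  [inputs.logparser.grok]
    ## A top-level pattern: plain regex mixed with a %{NAME:semantic_name} reference.
    patterns = ['.*? fox jumps over the .*? %{ANIMAL:jumpee}']

    ## One definition per line: the pattern name, a space, then the regex.
    ## ANIMAL is a hypothetical custom pattern, not a predefined one.
    custom_patterns = '''
      ANIMAL \w+
    '''
```

For the line “The fox jumps over the red mouse.” this is meant to capture mouse under the name jumpee, since the lazy .*? only has to consume “red” before the final word.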

haarts · April 15, 2017, 8:56am

That clears it up then. I studied the source defining the patterns quite a bit but I feel you just answered the last missing pieces. Thanks a lot!

I’m glad you found the review valuable. Best of luck with the product!

daniel · April 17, 2017, 9:27pm

For the record, I recently learned that this should work without an intermediate pattern (untested): (?<field_name>the pattern here)

haarts · April 19, 2017, 9:50am

I haven’t had luck trying that. My stuff ATM works so I don’t think I’ll spend time on it. But this should definitely be included in the docs.
