I’d like to parse some obscure log with Telegraf. It feels like this is a fairly common use case, but I can’t seem to make it work. The log in question has the following line-based format:
[SYMLINK] Rename [some src] to [some dest]
I’m only interested in what’s between the last pair of brackets, ‘some dest’ in this case. I took this blog post as a starting point and arrived at:
[[inputs.logparser]]
## file(s) to tail:
files = ["/tmp/test.log"]
from_beginning = false
name_override = "test_metric"
## For parsing logstash-style "grok" patterns:
[inputs.logparser.grok]
patterns = ["%{CUSTOM_LOG}"]
custom_patterns = '''
CUSTOM_LOG SYMLINK
'''
#patterns = ["(?<dest>SYMLINK)"]
#patterns = ["SYMLINK Rename .* to (?<dest>.*)"]
#patterns = ["[SYMLINK] Rename [.*] to [(?<dest>.*)]"]
[[outputs.file]]
## Files to write to, "stdout" is a specially handled file.
files = ["stdout"]
The line patterns = ["[SYMLINK] Rename [.*] to [(?<dest>.*)]"] is what I expected to work. The current configuration is sheer desperation. I’m now at a point where I would be happy to make anything work at all.
I feel I fundamentally misunderstand how this works. The format %{<capture syntax>[:<semantic name>][:<modifier>]} seems to apply to predefined capture syntaxes. On https://grokdebug.herokuapp.com/ I got my grok to work with \[SYMLINK\] Rename \[.*\] to \[(?<dest>.*)\] as the pattern and [SYMLINK] Rename [bla] to [foo] as the input.
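To see why the unescaped attempt in the config can never match, here is a quick check with Python's `re` module, used as a stand-in for whatever regex engine Telegraf uses (an assumption). Python spells named groups `(?P<dest>...)` rather than the `(?<dest>...)` form accepted on grokdebug, so the patterns are translated. Unescaped, `[SYMLINK]` is a character class that matches one character, not the literal text; escaping the brackets fixes that. The `[^\]]*` in the capture and the `extract_dest` helper are my own additions.

```python
import re

LINE = "[SYMLINK] Rename [bla] to [foo]"

# The attempt from the config, with (?<dest>...) rewritten as Python's
# (?P<dest>...). Unescaped "[...]" is a character class: "[SYMLINK]" matches
# ONE character from {S, Y, M, L, I, N, K}, and "[(?P<dest>.*)]" is also
# just a set of characters, so no "dest" group exists at all.
UNESCAPED = re.compile(r"[SYMLINK] Rename [.*] to [(?P<dest>.*)]")

# Escaped brackets are literal, as in the pattern that worked on grokdebug.
# [^\]]* (instead of .*) stops the capture at the first closing bracket.
ESCAPED = re.compile(r"\[SYMLINK\] Rename \[.*\] to \[(?P<dest>[^\]]*)\]")


def extract_dest(pattern, line):
    """Return the 'dest' capture, or None if the line does not match."""
    m = pattern.search(line)
    return m.groupdict().get("dest") if m else None


print(extract_dest(UNESCAPED, LINE))  # None
print(extract_dest(ESCAPED, LINE))    # foo
```

Telegraf itself is written in Go; if its grok parser hands custom regex to Go's `regexp`, note that older Go releases only accept the `(?P<name>...)` spelling, so that form may be the safer bet in the config.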
We really need to improve the documentation around this plugin. There is an open issue on Telegraf for this, and we are working on it this week. I’ll post it in this thread when that’s done.
Thanks for the reply! I really appreciate it. I’ll follow the mentioned issue on GH.
The pattern you gave isn’t working out of the box, but it does give me ideas! I’ll twiddle with it in a spare moment.
@haarts Yeah, we are also looking to add some more logging. Hope that gives you a general idea of how the %{<capture syntax>[:<semantic name>][:<modifier>]} syntax works when defining patterns. Sorry I wasn’t more helpful.
Some suggestions (which might or might not be correct):
“The ‘capture_syntax’ defines the grok pattern that is used to parse the input line. A grok pattern is either a predefined constant or a regex. The ‘semantic_name’ is used to name the field or tag. The ‘modifier’ controls the data type that the parsed item is converted to, or other special handling.”
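The suggested wording can be made concrete with a toy expander. This is a hypothetical Python sketch, not Telegraf's implementation: the `LIBRARY` entries are simplified stand-ins for predefined patterns, and `expand`, `parse` and the `int`/`float` converters are made-up names. Each `%{SYNTAX:name:modifier}` reference is replaced by its regex, a semantic name turns it into a named group, and the modifier picks a type conversion. Plain regex with named groups passes through untouched.

```python
import re

# Hypothetical pattern library. Real grok ships many predefined patterns;
# these are simplified stand-ins for illustration.
LIBRARY = {
    "WORD": r"\b\w+\b",
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "DATA": r".*?",
}

# Matches %{SYNTAX}, %{SYNTAX:name} and %{SYNTAX:name:modifier}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?(?::(\w+))?\}")

# Hypothetical modifiers mapped to type conversions; no modifier means str.
CONVERTERS = {"int": int, "float": float}


def expand(pattern, library):
    """Replace %{...} references with regex; return (regex, {name: modifier})."""
    modifiers = {}

    def substitute(m):
        syntax, name, modifier = m.groups()
        body, _ = expand(library[syntax], library)  # entries may nest references
        if name is None:
            return f"(?:{body})"  # matched but not stored as a field
        modifiers[name] = modifier
        return f"(?P<{name}>{body})"  # the semantic name becomes a group name

    return REFERENCE.sub(substitute, pattern), modifiers


def parse(pattern, line, library=LIBRARY):
    """Match one line; return {field: converted value} or None."""
    regex, modifiers = expand(pattern, library)
    m = re.search(regex, line)
    if m is None:
        return None
    fields = {}
    for name, value in m.groupdict().items():
        if value is not None:
            convert = CONVERTERS.get(modifiers.get(name), str)
            fields[name] = convert(value)
    return fields


print(parse("%{WORD:user} took %{NUMBER:ms:int} ms", "alice took 42 ms"))
# {'user': 'alice', 'ms': 42}
print(parse(r"took (?P<ms>\d+)", "alice took 42 ms"))
# {'ms': '42'}
```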
I’m not sure whether the regex needs to be in a specific form. Logstash has ‘(?<field_name>pattern)’, but that doesn’t seem necessary here. Or can it be used as an alternative to the grok pattern?
Also the config parameter ‘patterns’ is a list. Why?
Typo: " in the grok langauge, we must" (‘langauge’ should be ‘language’).
The section “TOML Escaping” is vital. The mention of ‘flush_interval’ would have helped me too; obvious in retrospect, of course, but I didn’t think of it.
One last thing I would love to see added is how to do matching without using any of the predefined patterns like USER or TIME. Is that even possible?
For example, given the string “The fox jumps over the red mouse.”, can I match it with the pattern “.*? fox jumps over the .*? %{DATA:jumpee}”?
I’ll update the pull request, but to answer your last question quickly: you can parse without using any predefined patterns. At a basic level, grok patterns are just regular expressions; the best way to see this is to look at the source of the predefined patterns. There you will see the regular expressions underlying them, and you can create your own using the custom_patterns or custom_pattern_file options.
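Here is a rough sketch of what “create your own using custom_patterns” can look like for the fox example, with no predefined pattern involved. Assumptions: the one-per-line `NAME regex` layout mirrors the custom_patterns block earlier in the thread, `JUMPEE` and `FOX_LINE` are hypothetical names, and the Python helpers only imitate grok expansion. `JUMPEE` is the intermediate pattern that captures the interesting part.

```python
import re

# Hypothetical custom_patterns text: one "NAME regex" definition per line.
# Nothing here relies on predefined patterns; JUMPEE is the intermediate
# pattern and FOX_LINE is the top-level pattern that references it.
CUSTOM_PATTERNS = r"""
JUMPEE [\w ]+
FOX_LINE .*? fox jumps over the %{JUMPEE:jumpee}\.
"""

# Matches %{NAME} and %{NAME:field}.
REFERENCE = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def load_patterns(text):
    """Parse 'NAME regex' lines into a dict, skipping blanks and comments."""
    library = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, _, regex = line.partition(" ")
        library[name] = regex
    return library


def expand(pattern, library):
    """Recursively replace %{NAME[:field]} references with their regex."""
    def substitute(m):
        name, field = m.groups()
        body = expand(library[name], library)
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return REFERENCE.sub(substitute, pattern)


def parse(top_level, line, library):
    """Expand the top-level pattern and return its named captures, or None."""
    m = re.search(expand(top_level, library), line)
    return m.groupdict() if m else None


patterns = load_patterns(CUSTOM_PATTERNS)
print(parse("%{FOX_LINE}", "The fox jumps over the red mouse.", patterns))
# {'jumpee': 'red mouse'}
```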
If instead you were asking whether it’s possible not to have these patterns defined at all, that is not possible.
Also, .*? fox jumps should match “The fox jumps”, but I think you will need to define an intermediate pattern to capture it. Patterns being a list just allows you to have several top-level patterns.
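To illustrate why patterns is a list: each entry is a separate top-level pattern for a different line shape. This sketch assumes the parser tries the entries in order and keeps the first one that matches; that is my reading of the plugin, not something stated in this thread. The `Delete` line format and `parse_line` are invented purely for the example.

```python
import re

# Hypothetical equivalent of patterns = [...]: one top-level pattern per
# line shape. The "Delete" format is invented for this example.
PATTERNS = [
    re.compile(r"\[SYMLINK\] Rename \[[^\]]*\] to \[(?P<dest>[^\]]*)\]"),
    re.compile(r"\[SYMLINK\] Delete \[(?P<target>[^\]]*)\]"),
]


def parse_line(line, patterns=PATTERNS):
    """Try the top-level patterns in order; the first match wins."""
    for pattern in patterns:
        m = pattern.search(line)
        if m:
            return m.groupdict()
    return None  # no pattern matched (assumed: the line produces no metric)


for line in ["[SYMLINK] Rename [bla] to [foo]", "[SYMLINK] Delete [old]", "noise"]:
    print(parse_line(line))
# {'dest': 'foo'}
# {'target': 'old'}
# None
```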