Telegraf now uses inputs.tail instead of inputs.logparser
This post shares an example of the new grok_custom_patterns parameter.
Most of the examples I found online dated back to 2017, so I hope this post catches the eye of people looking for examples in 2020. I struggled to get Telegraf inputs.tail to work because of the changes in config style and the lack of clarity in the InfluxDB/Telegraf/Tail/Logparser/GROK docs.
The docs reference the following example but never give a complete one, and they don't explain how grok_custom_patterns relates to grok_patterns.
Textbook example
[[inputs.tail]]
files = ["/var/log/apache/access.log"]
from_beginning = false
grok_patterns = ["%{COMBINED_LOG_FORMAT}"]
name_override = "apache_access_log"
grok_custom_pattern_files = []
grok_custom_patterns = '''
'''
grok_timezone = "Canada/Eastern"
data_format = "grok"
Here is my explanation of the latest config format:
Example Log Line
RESULT,2020-06-25 15:34:34,UNXPRD01,PROD_INSTANCE01,Running
telegraf.conf
[[inputs.tail]]
## file(s) to tail:
files = ["c:\\magicparser\\customlogs\\host01.log"]
from_beginning = false
#name of the "Metric" (which I want to see in Grafana eventually)
name_override = "magicparser"
grok_patterns = ["%{CUSTOM_LOG}"]
grok_custom_patterns = '''
MAGICDATE %{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}
CUSTOM_LOG %{MAGICDATE:date},%{WORD:log_entry_hostname:tag},%{WORD:log_entry_service:tag},%{WORD:log_entry_state}
'''
data_format = "grok"
Explanations
- “files” - should be clear: the file(s) you want to monitor. The value is a list, so you can tail several files or use glob patterns, though here I tail one file per input plugin.
- “from_beginning” - only set this to true if you are sure you want to process your whole log file from the start. I typically only used it while testing.
- “name_override” - sets the metric (measurement) name. For my use case, this is the name I will query in Grafana. InfluxDB stores measurements, tags and fields: the measurement is the primary name, tags are indexed for fast filtering, and fields hold the frequently changing values. See the expected output after this list.
- “grok_patterns” - the required parameter (according to my tests) that inputs.tail cannot work without. You define your main pattern here; in this case I indicate that I will use a CUSTOM_LOG pattern.
- “grok_custom_patterns” - this is the tricky part that the docs do not explain well. The triple-quote syntax lets this parameter hold several lines of custom patterns, and those lines define the details of the pattern I referenced above in “grok_patterns”; the two work together. What is nice about GROK patterns is that you can use one pattern inside another, which lets you break the problem into manageable pieces. See how I defined “MAGICDATE” as my own custom pattern and then used it in my “CUSTOM_LOG” pattern.
- data_format = “grok” - tells the tail plugin that we are using the GROK data format.
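To make the measurement/tag/field split concrete, this is roughly the InfluxDB line protocol I would expect the above config to produce for the example log line. It is an illustration rather than captured output; Telegraf also adds a host tag by default (omitted here), and <timestamp> is a placeholder:
magicparser,log_entry_hostname=UNXPRD01,log_entry_service=PROD_INSTANCE01 date="2020-06-25 15:34:34",log_entry_state="Running" <timestamp>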
Tips
- A note on data types - You will see that I didn't just leave GROK to decide data types for me. I override the string default by appending the “tag” keyword to my element definition. This forces InfluxDB to store the field as a tag, which means it will be indexed. In my use case it also means the field now pops up in the Grafana query builder as a WHERE parameter. Note that you should not simply make all fields tags; see this post about the performance impact of tags.
%{WORD:log_entry_service:tag}
- Use a timestamp to force InfluxDB to use your log's time - I didn't override the Telegraf timestamp with my date; I just store the date as a string field. If I had used a timestamp pattern like the following, I could have told Telegraf to use my log entry's time as the metric timestamp instead of the time the log file was read:
%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"}
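If I wanted that behaviour, a minimal sketch of the change (assuming the same log format as my example above) would be to capture the date with the ts modifier instead of my MAGICDATE string field:
grok_custom_patterns = '''
CUSTOM_LOG %{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"},%{WORD:log_entry_hostname:tag},%{WORD:log_entry_service:tag},%{WORD:log_entry_state}
'''
Combined with grok_timezone, Telegraf would then stamp each metric with the time parsed from the log line.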
- Use command line output for testing - Don't waste time trying to send data to InfluxDB immediately; first try out your work locally on standard output using the file output plugin. Telegraf can run more than one output plugin at a time, but while testing it is simplest to comment out your existing output plugin before adding this:
[[outputs.file]]
files = ["stdout"]
data_format = "influx"
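If you prefer a one-off check from the command line, Telegraf also has a test mode that gathers once and prints the parsed metrics to stdout. A minimal sketch follows; the config path is an assumption, and on recent Telegraf versions you may need --test-wait to give a service-style input like tail time to read lines:
telegraf --config c:\magicparser\telegraf.conf --test --test-wait 10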
- Build and test one field at a time - Pattern matching is an incredibly tricky and painful exercise. It's best to start with just one field against a test log file and then add your other fields one by one. Trust me. I read that advice on another blog I can't find now, and it was a great tip.
For example, with the above log file, just start with a test.log file containing only:
RESULT
Get that to match and make sure everything is working before tackling the rest (a minimal starting config is sketched below).
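As a concrete sketch of that incremental approach (the test.log path is my own assumption), the grok part of the config could start out this small and grow one field at a time:
[[inputs.tail]]
files = ["c:\\magicparser\\customlogs\\test.log"]
from_beginning = true
name_override = "magicparser"
grok_patterns = ["%{CUSTOM_LOG}"]
grok_custom_patterns = '''
CUSTOM_LOG %{WORD:log_entry_state}
'''
data_format = "grok"
Once RESULT shows up as log_entry_state in the output, add the next field to CUSTOM_LOG and repeat.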
Good luck with your pattern matching.