Custom log parsing with the latest Tail plugin, GROK and InfluxDB (Grafana-ready)

Telegraf now uses inputs.tail instead of inputs.logparser.

This post shares examples of the new grok_custom_patterns parameter.

I mostly found old examples from 2017; I hope this catches the eye of people looking for examples in 2020. I struggled to get Telegraf inputs.tail working due to the changes in config style and a lack of clarity in the InfluxDB/Telegraf/Tail/Logparser/GROK docs.

The docs reference the following example, but without giving a full, working one. They also don't explain how grok_custom_patterns relates to grok_patterns.

Textbook example

files = ["/var/log/apache/access.log"]
from_beginning = false
grok_patterns = ["%{COMBINED_LOG_FORMAT}"]
name_override = "apache_access_log"
grok_custom_pattern_files = []
grok_custom_patterns = '''
'''
grok_timezone = "Canada/Eastern"
data_format = "grok"

Here is my explanation of the latest config format:

Example Log Line

RESULT,2020-06-25 15:34:34,UNXPRD01,PROD_INSTANCE01,Running


  ## file(s) to tail:
  files = ["c:\\magicparser\\customlogs\\host01.log"]
  from_beginning = false

  #name of the "Metric" (which I want to see in Grafana eventually)
  name_override = "magicparser"
  grok_patterns = ["%{CUSTOM_LOG}"]
  grok_custom_patterns = '''
CUSTOM_LOG %{MAGICDATE:date},%{WORD:log_entry_hostname:tag},%{WORD:log_entry_service:tag},%{WORD:log_entry_state}
'''
  data_format = "grok"


  1. "files" - should be clear: the file you want to monitor. Only one file per input plugin instance.
  2. "from_beginning" - only enable this if you are sure you want to process your whole log file. I typically only used it while testing.
  3. "name_override" - sets the metric name. For my use case, this is the name of the metric I will use in Grafana. InfluxDB has metrics (measurements), tags and fields: the metric is the primary name, tags are indexed for high performance, and fields are for frequently changing data.
  4. "grok_patterns" - the required parameter (according to my tests) that inputs.tail cannot work without. You define your main pattern here; in this case I indicate that I will use a CUSTOM_LOG pattern.
  5. "grok_custom_patterns" - this is the tricky part that the docs did not explain well. This parameter lets you use the triple-quote approach to define a series of custom pattern lines. It spells out the details of the pattern I listed above in "grok_patterns"; the two work together. What is nice about GROK patterns is that you can use one pattern inside another, breaking the problem into manageable pieces. See how I defined "MAGICDATE" as my own custom pattern and then used it inside my "CUSTOM_LOG" pattern.
  6. "data_format = "grok"" - tells the tail plugin that we are using the GROK data format.


  7. A note on data types - You will see that I didn't just leave GROK to decide data types for me. I overrode the string default by appending the "tag" keyword to my element definition. This forces InfluxDB to store the field as a tag, which means it will be indexed. In my use case it also means the field now shows up in the Grafana query builder as a WHERE parameter. Note that you should not simply make all fields tags; see this post about the performance impact of tags.
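To illustrate, here is roughly the InfluxDB line-protocol point that the example log line above should produce (hand-written for illustration, not captured from Telegraf): the two ":tag" captures land in the indexed tag set, while the untagged captures become fields.

```
magicparser,log_entry_hostname=UNXPRD01,log_entry_service=PROD_INSTANCE01 date="2020-06-25 15:34:34",log_entry_state="Running"
```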


  8. Use a timestamp to force InfluxDB to use your log's time - I didn't override the Telegraf timestamp with my date; I just store my date as a string field. If I had used a timestamp pattern, like the following, I could have told Telegraf to use my log entry's time as the timestamp instead of the time the log file was read:

%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"}
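For example, a variant of the config above could look like this (an untested sketch; note that I am guessing the literal "RESULT," prefix from the sample line and matching the date directly instead of via the MAGICDATE pattern):

```toml
grok_custom_patterns = '''
CUSTOM_LOG RESULT,%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"},%{WORD:log_entry_hostname:tag},%{WORD:log_entry_service:tag},%{WORD:log_entry_state}
'''
```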

  9. Use command line output for testing - Don't waste time trying to send data to InfluxDB immediately; first try your work locally against standard output using the file output plugin. Remember you can only have one output at a time (as far as I know), so comment out your existing output plugin before adding this:

files = ["stdout"]
data_format = "influx"
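Alternatively, Telegraf's --test flag gives you the same one-shot check without editing your outputs: it gathers once from the inputs, prints the resulting line protocol to stdout, and exits:

```shell
# gather once from all configured inputs, print metrics to stdout, then exit
telegraf --config telegraf.conf --test
```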

  10. Build and test one field at a time - Pattern matching is an incredibly tricky and painful exercise. It's recommended to start with just one parameter in a test log file and then add your other fields one by one. Trust me. I read that advice on another blog I can't find now, and it was a great tip.

For example, with the above log file, just start with a test.log file simply containing:


And get that to match, making sure everything is working, before tackling the rest.
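As a sketch of that incremental approach (my own illustration; these field names are hypothetical, not from my final config), you might begin by capturing only the first column and lumping the rest together:

```toml
grok_patterns = ["%{FIRST_FIELD}"]
grok_custom_patterns = '''
FIRST_FIELD %{WORD:record_type},%{GREEDYDATA:rest}
'''
data_format = "grok"
```

Once this matches, replace %{GREEDYDATA:rest} with the next real field and repeat until the whole line is parsed.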

Good luck with your pattern matching.


Hey @eclements,

I am using the following input config to transfer data from a log file to InfluxDB, but it is not working:


files = ["/var/log/testlog.log"]
from_beginning = true
grok_patterns =  ["%{ERROR_LOG}", "%{CUSTOM_LOG}"]
grok_custom_patterns = '''
ERROR_LOG  level=%{LOGLEVEL:severity:tag} msg="%{GREEDYDATA:value:tag}"
'''

data_format = "grok"

but if I remove the tag modifiers, it works and I can see the data in my InfluxDB, something like this:


files = ["/var/log/testlog.log"]
from_beginning = true
grok_patterns =  ["%{ERROR_LOG}", "%{CUSTOM_LOG}"]
grok_custom_patterns = '''
ERROR_LOG  level=%{LOGLEVEL:severity} msg="%{GREEDYDATA:value}"
'''

data_format = "grok"

Do you have any idea why this is happening?
I need those tags in my InfluxDB.

At least one value must be captured as a field.


@daniel would it be possible to store the original log content as well as the grok fields? Like in the above case, I'd like to store level and msg, and also store "value" (which is the original log content emitted by tail).

@prashanthjbabu You can, but I know about a bug you will probably run into. Let’s say you have:

grok_patterns =  ["%{A:a}"]
grok_custom_patterns = '''
    A %{NUMBER:x} %{NUMBER:y}
    '''

And a document:

1 2
3 4

The results would be:

> tail a="1 2",x="1",y="2"
> tail a="3 4",x="3",y="4"

The issue you might run into is that nested patterns with named captures at multiple levels like this don't save the child "modifiers": you won't be able to specify whether x or y are tags or integers unless you remove the a capture. You can work around this for now using the converter processor.
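A sketch of that converter workaround might look like this (assuming the x and y captures from the example above; see the processors.converter plugin docs for the full option list):

```toml
[[processors.converter]]
  [processors.converter.fields]
    # promote the string field "x" to an indexed tag
    tag = ["x"]
    # re-parse the string field "y" as an integer field
    integer = ["y"]
```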

@daniel Thanks for your reply… I also found out this method which seems to be working as well

    grok_patterns =  ["%{ERROR_LOG}"]
    grok_custom_patterns = '''
    ERROR_LOG %{SYSLOGTIMESTAMP:syslog_timestamp:drop} %{SYSLOGHOST:syslog_hostname:drop} %{DATA:syslog_program:string}(?:\[%{POSINT:syslog_pid:drop}\])?: level=%{LOGLEVEL:severity:tag} msg="%{GREEDYDATA:value:drop}"
    '''
    data_format = "grok"

The processors.parser plugin takes the "value" field generated by tail and runs grok on it, appending the extracted fields while retaining the original "value". This should work as well, right?
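For reference, a minimal sketch of that processors.parser setup (the grok pattern here is illustrative; adjust it to your log format):

```toml
[[processors.parser]]
  # run a second parse pass over the "value" field produced by inputs.tail
  parse_fields = ["value"]
  # merge the extracted fields back into the original metric
  merge = "override"
  data_format = "grok"
  grok_patterns = ["level=%{LOGLEVEL:severity} msg=\"%{GREEDYDATA:msg}\""]
```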


Hi @mrAbhishek, I found the same thing @daniel answered: if I made all my values tags, I would not see the data in InfluxDB. I had to make at least one capture not a tag; in other words, you must have at least one field value.
