Why is data_format (grok) not happy?

So this is what I am trying to do.

[root@test telegraf]# cat telegraf.conf 
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  debug = true
  quiet = false

[[inputs.tail]]
  files = ["/var/log/messages"]
  from_beginning = true
  watch_method = "inotify"
  data_format = "grok"
[[outputs.file]]
  files = ["stdout"]
  data_format = "prometheus"

I am trying to output the logging data from the input in Prometheus format; however, I am seeing the following error:

Aug  1 16:31:32 test telegraf[1079]: 2024-08-01T06:31:32Z D! [parsers.grok::tail]  Grok no match found for or no data extracted from: "Aug  1 16:31:11 test rsyslogd[1071]: message too long (14623) with configured size 8096, begin of message is: 2024-08-01T06:31:11Z D! [parsers.grok::tail]  Grok no match found for or no data [v8.2102.0-15.el8 try https://www.rsyslog.com/e/2445 ]"

What does this mean?

The log line I want to send to stdout is the following:

Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk

Can someone please help me through this?

After a little crying, I managed to come up with a grok pattern for it:

%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{WORD:ceph}-%{WORD:instance}\[%{POSINT:pid}\]: %{GREEDYDATA:message}

This seems to work in an online grok debugger, which gives the following output:

[
  {
    "timestamp": "Jul 29 19:18:31",
    "hostname": "apollo3",
    "ceph": "ceph",
    "instance": "osd",
    "pid": 1770536,
    "message": "_get_class not permitted to load sdk"
  }
]

for the following log line:

Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk
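The same check can be run locally with a plain Python regular expression. This is a sketch under assumptions: the sub-patterns below are simplified stand-ins for the grok definitions of SYSLOGTIMESTAMP, HOSTNAME, WORD, POSINT and GREEDYDATA, not their exact expansions. Python returns every capture as a string, so pid comes back as "1770536", which is also how Telegraf later emits it (pid="1770536").

```python
import re

# Simplified stand-ins for the grok building blocks (approximations only):
#   SYSLOGTIMESTAMP -> "Mon dd hh:mm:ss" (the day may be space-padded)
#   HOSTNAME        -> letters, digits, dots and hyphens
#   WORD            -> \w+
#   POSINT          -> positive integer without a leading zero
#   GREEDYDATA      -> .*
CEPH_PATTERN = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>[A-Za-z0-9.-]+) "
    r"(?P<ceph>\w+)-(?P<instance>\w+)"
    r"\[(?P<pid>[1-9][0-9]*)\]: "  # the literal brackets must be escaped
    r"(?P<message>.*)"
)

line = "Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk"
match = CEPH_PATTERN.match(line)
fields = match.groupdict() if match else None
print(fields)
```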

However, when I use the same grok pattern in my telegraf.conf, like this:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  debug = true
  quiet = false
[[inputs.tail]]
  files = ["/var/log/messages"]
  from_beginning = true
  watch_method = "inotify"
  data_format = "grok"
  grok_patterns = ["%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{WORD:ceph}-%{WORD:instance}\[%{POSINT:pid}\]: %{GREEDYDATA:message}"]
[[outputs.file]]
  files = ["stdout"]
  data_format = "prometheus"

I get a syntax error:

2024-08-01T07:26:16Z E! error loading config file /etc/telegraf/telegraf.conf: error parsing data: line 17: invalid TOML syntax

If I replace each backslash \ with \\, the syntax error goes away, but the log line I want is still not read, as shown below:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  debug = true
  quiet = false
[[inputs.tail]]
  files = ["/var/log/messages"]
  from_beginning = true
  watch_method = "inotify"
  data_format = "grok"
  grok_patterns = ["%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{WORD:ceph}-%{WORD:instance}\\[%{POSINT:pid}\\]: %{GREEDYDATA:message}"]
[[outputs.file]]
  files = ["stdout"]
  data_format = "prometheus"

The output is:

[root@test telegraf]# telegraf --config /etc/telegraf/telegraf.conf -test
2024-08-01T07:35:35Z I! Loading config: /etc/telegraf/telegraf.conf
2024-08-01T07:35:35Z I! Starting Telegraf 1.31.2 brought to you by InfluxData the makers of InfluxDB
2024-08-01T07:35:35Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 60 outputs, 6 secret-stores
2024-08-01T07:35:35Z I! Loaded inputs: tail
2024-08-01T07:35:35Z I! Loaded aggregators: 
2024-08-01T07:35:35Z I! Loaded processors: 
2024-08-01T07:35:35Z I! Loaded secretstores: 
2024-08-01T07:35:35Z W! Outputs are not used in testing mode!
2024-08-01T07:35:35Z I! Tags enabled: host=apollo3.procan.local
2024-08-01T07:35:35Z D! [agent] Initializing plugins
2024-08-01T07:35:35Z D! [agent] Starting service inputs
2024-08-01T07:35:35Z D! [inputs.tail]  Tail added for "/var/log/messages"
2024-08-01T07:35:35Z D! [agent] Stopping service inputs
2024-08-01T07:35:35Z D! [inputs.tail]  Tail removed for "/var/log/messages"
> tail,host=apollo.local,path=/var/log/messages ceph="ceph",hostname="apollo",instance="mon",message="mon.apollo@3(peon) e10 handle_command mon_command({\"prefix\": \"status\"} v 0) v1",pid="2343148",timestamp="Jul 28 03:33:01" 1722497735536110962
2024-08-01T07:35:35Z D! [agent] Input channel closed
2024-08-01T07:35:35Z D! [agent] Stopped Successfully
> tail,host=apollo.local,path=/var/log/messages ceph="ceph",hostname="apollo",instance="mon",message="log_channel(audit) log [DBG] : from='client.? 192.168.x.x:0/2107390132' entity='client.admin' cmd=[{\"prefix\": \"status\"}]: dispatch",pid="2343148",timestamp="Jul 28 03:33:01" 1722497735536155572

As you can see, the output does not include the line I actually want. What am I missing?

You are taking a grok pattern and dumping it into a TOML file. TOML has its own syntax: inside a basic (double-quoted) string, a backslash starts an escape sequence, and \[ and \] are not valid escapes, so a TOML validator will reject that line.

What you want is a literal string, i.e. a string enclosed in single quotes, in which backslashes are kept as-is:

grok_patterns = ['%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{WORD:ceph}-%{WORD:instance}\[%{POSINT:pid}\]: %{GREEDYDATA:message}']

Hi @jpowers, thanks for clearing that up. My TOML syntax seems to be correct now. However, the log line I actually want, which is:

Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk

is not in the output when I run Telegraf with the single-quoted grok pattern.

What I do see in the output is:

2024-08-01T23:37:37Z D! [inputs.tail]  Tail added for "/var/log/messages"
2024-08-01T23:37:37Z D! [agent] Stopping service inputs
2024-08-01T23:37:37Z D! [inputs.tail]  Tail removed for "/var/log/messages"
2024-08-01T23:37:37Z D! [agent] Input channel closed
2024-08-01T23:37:37Z D! [agent] Stopped Successfully
> tail,host=apollo3.procan.local,path=/var/log/messages ceph="ceph",hostname="apollo",instance="mon",message="mon.apollo@3(peon) e10 handle_command mon_command({\"prefix\": \"status\"} v 0) v1",pid="2343148",timestamp="Jul 28 03:33:01" 1722555457561892425

The output above shows a log entry from Jul 28 03:33:01. The entry I want is from Jul 29 19:18:31, as shown below:

[root@apollo telegraf]# cat /var/log/messages | grep "get_class"
Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk

Any ‘tail’ command tails a file from the time it is run, so how come the entry from Jul 28 03:33:01 gets read but not the one from Jul 29 19:18:31?

Additionally, when I restart the telegraf service, I see lots of messages like this on stdout:

Aug 02 10:28:36 apollo.local telegraf[4006157]: 2024-08-02T00:28:36Z D! [parsers.grok::tail]  Grok no match found for or no data extracted from: "Aug  1 18:09:14 apollo telegraf[1915857]: 2024-08-01T08:09:14Z D! [parsers.grok::tail]

If the tail plugin filters its output based on the grok pattern we provide, why do I see these messages for log lines that do not match the pattern?

Note that the PID you are getting (2343148) does not match the one in that message (1770536), so it appears to be parsing a different message.

If I use your example line:

Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk

With this config:

[agent]
  debug = true
  omit_hostname = true

[[inputs.file]]
  files = ["messages"]
  data_format = "grok"
  grok_patterns = ['%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{WORD:ceph}-%{WORD:instance}\[%{POSINT:pid}\]: %{GREEDYDATA:message}']

[[outputs.file]]

I get:

❯ ./telegraf --config config.toml --once
2024-08-02T12:49:28Z I! Loading config: config.toml
2024-08-02T12:49:28Z I! Starting Telegraf 1.32.0-094eff6a brought to you by InfluxData the makers of InfluxDB
2024-08-02T12:49:28Z I! Available plugins: 234 inputs, 9 aggregators, 32 processors, 26 parsers, 62 outputs, 6 secret-stores
2024-08-02T12:49:28Z I! Loaded inputs: file
2024-08-02T12:49:28Z I! Loaded aggregators:
2024-08-02T12:49:28Z I! Loaded processors:
2024-08-02T12:49:28Z I! Loaded secretstores:
2024-08-02T12:49:28Z I! Loaded outputs: file
2024-08-02T12:49:28Z I! Tags enabled:
2024-08-02T12:49:28Z D! [agent] Initializing plugins
2024-08-02T12:49:28Z D! [agent] Connecting outputs
2024-08-02T12:49:28Z D! [agent] Attempting connection to [outputs.file]
2024-08-02T12:49:28Z D! [agent] Successfully connected to outputs.file
2024-08-02T12:49:28Z D! [agent] Starting service inputs
2024-08-02T12:49:28Z D! [agent] Stopping service inputs
2024-08-02T12:49:28Z D! [agent] Input channel closed
2024-08-02T12:49:28Z I! [agent] Hang on, flushing any cached metrics before shutdown
file message="_get_class not permitted to load sdk",timestamp="Jul 29 19:18:31",hostname="apollo",ceph="ceph",instance="osd",pid="1770536" 1722602969000000000
2024-08-02T12:49:28Z D! [outputs.file] Wrote batch of 1 metrics in 39.07µs
2024-08-02T12:49:28Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2024-08-02T12:49:28Z I! [agent] Stopping running outputs
2024-08-02T12:49:28Z D! [agent] Stopped Successfully

That does look correct, no?

In your config, you have:

from_beginning = true

Are there other entries at the beginning that are getting read in from a previous date? Maybe entries from July 28?
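The difference is easy to miss: a plain tail -f shows only the last few lines and then follows new ones, while from_beginning = true makes Telegraf's tail input start at the first line of the file, so older entries such as the Jul 28 ones come first. The sketch below only imitates that idea with Python file offsets; it is not Telegraf's implementation, and the file and lines are samples (the Jul 28 line is shortened and reconstructed from the --test output, and the Aug 2 line is hypothetical).

```python
import os
import tempfile

EXISTING = [
    "Jul 28 03:33:01 apollo ceph-mon[2343148]: mon.apollo@3(peon) e10 handle_command",
    "Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk",
]
# Hypothetical line logged after the reader has started.
NEW_LINE = "Aug  2 10:00:00 apollo ceph-osd[1770536]: hypothetical line written later"


def read_like_tail(from_beginning):
    """Open a sample log, optionally jump to its end, append one line, return what is read."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "messages")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(EXISTING) + "\n")

        with open(path, "r", encoding="utf-8") as reader:
            if not from_beginning:
                # Skip existing content, roughly what from_beginning = false does.
                reader.seek(0, os.SEEK_END)
            with open(path, "a", encoding="utf-8") as writer:
                writer.write(NEW_LINE + "\n")
            return [line.rstrip("\n") for line in reader]


print(read_like_tail(from_beginning=True))   # existing lines (oldest first), then the new one
print(read_like_tail(from_beginning=False))  # only the new line
```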

Additionally when I restarted the telegraf service I see lots of stdout messages as follows:

Aug 02 10:28:36 apollo.local telegraf[4006157]: 2024-08-02T00:28:36Z D! [parsers.grok::tail] Grok no match found for or no data extracted from: "Aug 1 18:09:14 apollo telegraf[1915857]: 2024-08-01T08:09:14Z D! [parsers.grok::tail]

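That line does not match your pattern. Your pattern looks for %{WORD:ceph}-%{WORD:instance}, and there is no hyphenated program name in that message (it is telegraf[1915857]), hence no match. These are D! (debug) messages, which are printed because debug = true is set in the [agent] section.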
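A quick way to confirm this is to run both lines through a regex approximation of the pattern. This is a sketch: the sub-patterns are simplified assumptions standing in for SYSLOGTIMESTAMP, HOSTNAME, WORD and POSINT, not grok's exact definitions. The second line is the (truncated) Telegraf log line quoted above.

```python
import re

# Simplified regex stand-in for
#   %{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{WORD:ceph}-%{WORD:instance}\[%{POSINT:pid}\]: %{GREEDYDATA:message}
CEPH_PATTERN = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>[A-Za-z0-9.-]+) "
    r"(?P<ceph>\w+)-(?P<instance>\w+)\[(?P<pid>[1-9][0-9]*)\]: "
    r"(?P<message>.*)"
)

lines = [
    "Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk",
    "Aug  1 18:09:14 apollo telegraf[1915857]: 2024-08-01T08:09:14Z D! [parsers.grok::tail]",
]

# True where the line matches; "telegraf[...]" has no "-" so the second fails.
results = [CEPH_PATTERN.match(line) is not None for line in lines]
for line, ok in zip(lines, results):
    print("match   " if ok else "no match", line)
```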

The error you’re seeing, Grok no match found for or no data extracted from, indicates that the Grok parser is unable to find a match for the log lines in your /var/log/messages file. This means the Grok patterns you have configured (or the default ones if none are specified) do not match the log format.

Here are the steps to resolve this issue:

Define a Grok Pattern: You need to define a Grok pattern that matches the log format you want to parse. For the example log you provided:
Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk
A corresponding Grok pattern could be:
%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:message}
This pattern will capture the timestamp, hostname, program name, process ID, and the message from your logs.

Update Telegraf Configuration: Update the inputs.tail section of your telegraf.conf to include the custom Grok pattern.

Here’s an updated version of your telegraf.conf:
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  debug = true
  quiet = false

[[inputs.tail]]
  files = ["/var/log/messages"]
  from_beginning = true
  watch_method = "inotify"
  data_format = "grok"
  grok_patterns = ['%{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:message}']

[[outputs.file]]
  files = ["stdout"]
  data_format = "prometheus"

In this configuration, grok_patterns is set to the custom pattern that matches your log format.

Verify Grok Pattern: Before running Telegraf, you can use an online Grok debugger (such as Grok Debugger) to test your pattern against your logs and make sure it matches correctly.
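The pattern can also be tried locally instead of in an online debugger. The sketch below approximates it with a Python regex; the sub-patterns are simplified assumptions rather than grok's exact definitions, with DATA rendered as the non-greedy .*? and the [pid] part optional.

```python
import re

# Rough regex stand-in for
#   %{SYSLOGTIMESTAMP:timestamp} %{HOSTNAME:hostname} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:message}
SYSLOG_PATTERN = re.compile(
    r"(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<hostname>[A-Za-z0-9.-]+) "
    r"(?P<program>.*?)(?:\[(?P<pid>[1-9][0-9]*)\])?: "
    r"(?P<message>.*)"
)

samples = [
    "Jul 29 19:18:31 apollo ceph-osd[1770536]: _get_class not permitted to load sdk",
    "Aug  1 18:09:14 apollo telegraf[1915857]: 2024-08-01T08:09:14Z D! [parsers.grok::tail]",
]

# Collect (program, pid) for each sample, or None when the line does not match.
parsed = []
for line in samples:
    m = SYSLOG_PATTERN.match(line)
    parsed.append((m.group("program"), m.group("pid")) if m else None)
print(parsed)
```

Because DATA accepts any program name, this pattern also matches Telegraf's own log lines, so it parses more of the file rather than selecting only the ceph entries.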
After making these changes, restart Telegraf and check the logs again. If the pattern matches, Telegraf should be able to parse your logs and output them in Prometheus format as you desire. If you still face issues, consider seeking help from experts in custom software development; maybe they can help you: https://tech-stack.com/

@Slaughterhaus Thank you for being an active member of our community forum. I noticed your recent post advertising your tech service. While we appreciate the expertise you bring to the community, we have guidelines in place to ensure that our forum remains a helpful and enjoyable space for everyone. Please keep your posts limited to relevant and helpful contributions. Direct promotions or advertisements for services or products aren’t allowed in the general discussion sections. Thank you for your understanding!