Telegraf syslog metrics are missing

Hello,
I am trying to configure telegraf(v1.20.4) to collect syslog messages from a centOS server and see partial(best_effort=true) or no metrics(best_effort=false) collected in the output file. The OS has syslog-ng configured and was restricted to install rsyslog (so I cannot have rsyslog installed here, which would have made my job easy). I am using telegraf syslog input plugin to collect syslog messages via udp @ port 6514, summarizing below the my system setup and environement details.

OS
$ uname -r
5.10.74-200.1644.tis.rt.el7.x86_64
$ cat /etc/*release
CentOS Linux release 7.6.1810 (Core)

Telegraf 1.20.4
Plugin Configuration

[[inputs.syslog]]

server = “udp://127.0.0.1:6514”
keep_alive_period = “5m”
read_timeout = “0”
trailer = “LF”
best_effort = false
syslog_standard = “RFC3164”

syslog-ng
$ sudo syslog-ng --version
syslog-ng 3.5.6

syslog-ng.conf
destination remote_log_server {udp(“127.0.0.1” port(6514));};

log { source(s_src); destination(remote_log_server); };

metrics.out
With best_effort=true, the following metrics with limited tags are available
syslog,facility=daemon,facility_code=3,severity=info,severity_code=6 Nb=1i 1663848572734957383
With best_effort=false, there are no metrics and error logs seen.

packet capture:
To debug I post a message to syslog and the comand and correpsonfing packetcapture of message as below;
$ logger -p local3.crit “This is a test Critical syslog generated by the simulator” -t simulator

<154>Sep 23 11:18:58.000 controller-0 simulator: This is a test Critical syslog generated by the simulator

I am not sure of the problem may be telegraf syslog parser is not able to parse the message completely.
Please let me know If I am missing something in the configuration or any other suggestion.

Note that initially I tried this with a TCP based syslog forwarding, but was facing below error, so switched to udp based on the suggestion from this issue#4482

2022-09-21T06:37:38Z E! [inputs.syslog] Error in plugin: found ILLEGAL(<), expecting a MSGLEN

Thanks,
Saravanan G

Hi,

Metrics are suppose to be in RFC 5424 format by default. I beelive this means they should look like:

<13>1 2018-11-14T04:08:13.076005-05:00 ASTitan - - - - Message from debian Syslog.

RFC 3164 is also supported, but you have to specify that as a config option. This looks closer to what you are using:

<34>Jan 12 06:30:00 1.2.3.4 apache_server: 1.2.3.4 - - [12/Jan/2011:06:29:59 +0100] "GET /foo/bar.html HTTP/1.1" 301 96 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2.12) Gecko/20101026 Firefox/3.6.12 ( .NET CLR 3.5.30729)" PID 18904 Time Taken 0

Check out the plugin readme for the syslog_standard option.

Hi, Thanks for the response.
I have the syslog_standard configruation set to RFC3164.
Myy bad, i accidentally removed that configuration while cleaning up the comments.
My sylog plugin configuration as below;
[[inputs.syslog]]

## Specify an ip or hostname with port - eg., tcp://localhost:6514, tcp://10.0.0.1:6514

## Protocol, address and port to host the syslog receiver.

## If no host is specified, then localhost is used.

## If no port is specified, 6514 is used (RFC5425#section-4.1).

server = “udp://127.0.0.1:6514”

## TLS Config

# tls_allowed_cacerts = [“/etc/telegraf/ca.pem”]

# tls_cert = “/etc/telegraf/cert.pem”

# tls_key = “/etc/telegraf/key.pem”

## Period between keep alive probes.

## 0 disables keep alive probes.

## Defaults to the OS configuration.

## Only applies to stream sockets (e.g. TCP).

keep_alive_period = “5m”

## Maximum number of concurrent connections (default = 0).

## 0 means unlimited.

## Only applies to stream sockets (e.g. TCP).

# max_connections = 1024

## Read timeout is the maximum time allowed for reading a single message (default = 5s).

## 0 means unlimited.

read_timeout = “0”

## The framing technique with which it is expected that messages are transported (default = “octet-counting”).

## Whether the messages come using the octect-counting (RFC5425#section-4.3.1, RFC6587#section-3.4.1),

## or the non-transparent framing technique (RFC6587#section-3.4.2).

## Must be one of “octet-counting”, “non-transparent”.

# framing = “octet-counting”

## The trailer to be expected in case of non-transparent framing (default = “LF”).

## Must be one of “LF”, or “NUL”.

trailer = “LF”

## Whether to parse in best effort mode or not (default = false).

## By default best effort parsing is off.

best_effort = false

## The RFC standard to use for message parsing

## By default RFC5424 is used. RFC3164 only supports UDP transport (no streaming support)

## Must be one of “RFC5424”, or “RFC3164”.

syslog_standard = “RFC3164”

## Character to prepend to SD-PARAMs (default = “_”).

## A syslog message can contain multiple parameters and multiple identifiers within structured data section.

## Eg., [id1 name1=“val1” name2=“val2”][id2 name1=“val1” nameA=“valA”]

## For each combination a field is created.

## Its name is created concatenating identifier, sdparam_separator, and parameter name.

# sdparam_separator = “_”

11:18:58.000

It is very strange to see sub-second info in an RFC3164 timestamp. I was under the impresison it should only be HH:MM:SS.

Let’s go back to this comment then. Can you explain what your goal is then? What tags are missing.

Hi, My goal is to collect all the syslog from the host machine and push them to a kafka bus (for debugging purpose using output file plugin). The metrics data should contain tags such as appname, message, timestamp for the system on the other side of Kafka bus to process it to a meaningful information.The host machine run CentOS 7 based Wind River cloud platform which comes preinstalled with syslog-ng. Here are the various combinations of configuration properties value that I tried out to acheive the goal;

Initially tried with TCP based listener having following configuration,
server = “tcp://127.0.0.1:6514”, best_effort = false, syslog_standard = “RFC5424
No metrics outputted but noticed the below error,

E! [inputs.syslog] Error in plugin: found ILLEGAL(<), expecting a MSGLEN

So I switched the listener to udp based on the response for this error in telegraf community
server = “udp://127.0.0.1:6514”, best_effort = false, syslog_standard = “RFC3164
The result was NO error in telegraf.log and NO output in metrics.out.

with below configuration
server = “udp://127.0.0.1:6514”, best_effort = true, syslog_standard = “RFC3164
The result was NO error in telegraf.log and metrics output displayed in metrics.out was missing tags like appname, message, timestamp

syslog,facility=daemon,facility_code=3,severity=info,severity_code=6 Nb=1i 1663848572734957383

Please note that the tags and fields are not in similar order and they are manipulated as per our application needs.

I would like to see the output like the sample below collected from CentOS machine running rsyslog;

syslog,appname=rsyslogd,facility=syslog,facility_code=5,hostname=localhost,message=action\ ‘action\ 0’\ resumed\ (module\ ‘builtin:omfwd’)\ [v8.24.0-55.el7\ try\ [http://www.rsyslog.com/e/2359\](http://www.rsyslog.com/e/2359/) ],severity=info,severity_code=6,timestamp=1654932672856930000,version=1 Nb=1i 1654932672857160498

Is that the timestamp format is failing the telegraf parser to parse the other parts of the log message?

A sample UDP Packet of syslog from the host machine

<30>Sep 23 11:18:57.617 controller-0 collectd[98464]: platform cpu usage plugin Usage: 11.6% (avg per cpu); cpus: 4, Platform: 12.9% (Base: 11.0, k8s-system: 2.0), k8s-addon: 0.0

Thanks for the additional details.

Please note that the tags and fields are not in similar order and they are manipulated as per our application needs.

Are you saying want the tags in a specific order? Tags are generally not order specific. I do not believe we guarantee what order they are produced in.

syslog,appname=rsyslogd,facility=syslog,facility_code=5,hostname=localhost,message=action\ ‘action\ 0’\ resumed\ (module\ ‘builtin:omfwd’)\ [v8.24.0-55.el7\ try\ [[http://www.rsyslog.com/e/2359\](http://www.rsyslog.com/e/2359%5C)](http://www.rsyslog.com/e/2359/) ],severity=info,severity_code=6,timestamp=1654932672856930000,version=1 Nb=1i 1654932672857160498

Having the message show up as a tag is not a best practice. Tags are index and given the wide variety of strings that could come in there, your cardinality could grow without any real bound.

appname=rsyslogd

Since rsyslogd is not in your message at all, you can set up a map of tags to include with your data.

Nb=1i

I’m not sure where this value is coming from

Is that the timestamp format is failing the telegraf parser to parse the other parts of the log message?

I do think that it is causing an issue. Let me give you two ideas to further work on this:

  1. In the syslog readme there is troubleshooting section where you can play with sending example syslog messages using nc to telegraf. You could use this to play with different timestamps.

  2. In that same area, there is a message about how RFC3164 messages are not always valid. I am not sure how much control you have over your application but either trying RFC a different RFC or using rsyslog to translate may help.

Hi, Thanks for the suggestions and I will review reducing the tags by leveraging the fields. We have added few additional tags and fields, please ignore them.

On the syslog plugin problem of getting the metrics I tried out the syslog debugging 2

Below are the output of various timestamp formats supplied through netcat with syslog_standard = “RFC3164” set in plugin configuration

$ echo “<13>1 2018-10-01T12:00:00.0Z example.org root - - - test” | nc -u 127.0.0.1 6514

2022-09-30T10:09:18Z E! [inputs.syslog] Error in plugin: expecting a Stamp timestamp [col 4]

$ echo “<13>1 Oct 11 22:14:15 example.org root - - - test” | nc -u 127.0.0.1 6514
2022-09-30T10:12:23Z E! [inputs.syslog] Error in plugin: expecting a Stamp timestamp [col 4]

$ echo “<13>Sep 30 06:14:15 example.org root - - - test” | nc -u 127.0.0.1 6514
<No error and no output>

I did try the nc message by setting syslog_standard = “RFC5424” and passed the below message, this time the metric data was outputted and no error log.
$ echo “<13>1 2018-10-01T12:00:00.0Z example.org root - - - test” | nc -u 127.0.0.1 6514
syslog,appname=root,facility=user,facility_code=1,hoster=na,hostname=example.org,label=root_notice,message=test,severity=notice,severity_code=5,source=127.0.0.1,timestamp=1538395200000000000,version=1 Nb=1i 1664533543165555253

Based on the debugging results I suppose the problem might be with RFC3164 parser. Any thoughts?

Please note that I have upgraded the telegraf version to 1.24.1 and the above tests are conducted in this version of binary.

Hi,

I have been using the RFC Wiki and the RFC3164 itself to ensure messages look right. To my knowledge your most valid test case is this one:

$ echo “<13>Sep 30 06:14:15 example.org root - - - test” | nc -u 127.0.0.1 6514
<No error and no output>

The <13> is the pri, the timestamp and hostname correctly form the header.

After the header is the message, which is made up of a tag and content. A tag is usually the application and the name of the tag is suppose to be ended by non-alphanumeric character. This is commonly seen as app: my message here. Where “app” is the tag and “my message here” is the content.

However, when I try this, it also doesn’t work :frowning: Feel free to file an issue and reference this thread.

edit: for what it is worth this is what I tried:

[[inputs.syslog]]
  server = "udp://:6514"
  syslog_standard = "RFC3164"
  framing = "non-transparent"
  best_effort = false
$ echo "<13>Sep 30 06:14:15 example.org root: test" | nc -v -u 127.0.0.1 6514
<snip>
2022-09-30T15:04:07Z E! [inputs.syslog] Error in plugin: expecting a priority value within angle brackets [col 0]

I think part of the problem might be with using nc with udp and having it send it multiple times, but I did get this working:

echo -n "<13>Sep 30 06:14:15 example.org root: test" >/dev/udp/localhost/6514

which produced:

syslog,appname=root,facility=user,hostname=example.org,severity=notice,source=::1 facility_code=1i,severity_code=5i,timestamp=1664518455000000000i,message="test" 1664555790384328612

So give that echo command a try!

Hi, First of all thanks and sorry for the late reply.

The echo command with and without netcat are working in my setup having

framing = non-transparent

In order to narrow down the issue I tried sending out the syslog messages with timestamp carrying milliseconds, in this case the metrics are not outputted. My guess is that the additional milliseconds precision in the timestamp was blocking the complete syslog message getting parsed.
I understand that RFC3164 timestamp format do not mention about milliseconds;

The TIMESTAMP field is the local time and is in the format of “Mmm dd
hh:mm:ss” (without the quote marks) where:

Can the syslog plugin ignore the additional precision in timestamp format to parse the message?
The plugin can emit a warning log in case of additional precision and wrong timestamp received in the syslog message.

Please suggest If I can still file a bug to address the issue.
Meantime I will also explore the cause for timestamp format with milliseconds and options to tune it on the syslog-ng configuration.

The test results as below;
[[inputs.syslog]]
server = “udp://:6514”
syslog_standard = “RFC3164”
framing = “non-transparent”
best_effort = false

Success Tests
~$ echo -n “<13>Sep 30 06:14:15 example.org root: test” >/dev/udp/localhost/6514
syslog,appname=root,facility=user,facility_code=1,hostname=example.org,message=test,severity=notice,severity_code=5,source=127.0.0.1timestamp=1664518455000000000 Nb=1i 1664860302611147213

~$ echo -n “<13>Sep 30 06:14:15 example.org root: test” | nc -v -u 127.0.0.1 6514
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 127.0.0.1:6514.
Ncat: 42 bytes sent, 0 bytes received in 0.01 seconds.
syslog,appname=root,facility=user,facility_code=1,hostname=example.org,message=test,severity=notice,severity_code=5,source=127.0.0.1,timestamp=1664518455000000000 Nb=1i 1664860436819412812
Failure Tests
~$ echo -n “<13>Sep 30 06:14:15.000 example.org root: test” | nc -v -u 127.0.0.1 6514
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 127.0.0.1:6514.
Ncat: 46 bytes sent, 0 bytes received in 0.01 seconds.
No metrics output and error log
~$ echo -n “<13>Sep 30 06:14:15.000 example.org root: test” >/dev/udp/localhost/6514
No metrics output and error log

The actual parsing of the messages is handled by GitHub - influxdata/go-syslog: Blazing fast syslog parser. That is where I would open the request. There is an existing issue Can rfc3164 (BSD syslog) parser support custom timestamp format? · Issue #36 · influxdata/go-syslog · GitHub to add support for custom timestamps, but it doesn’t seem to have gotten much traction in a couple years.

I do not think a warning would be wise, because then what would you use for the timestamp if you failed to parse in the first place?

Agree, I mean a warning/error message to indicate parse failure or not parseable log data.Without any log indication, I though the link between the syslog and telefraf listener was broken.

Meantime I was working on other option of correcting the timestamp in the syslog source, I managed to tweak the timeformat of the syslog message in the system and could see the timestamp format matching to the RFC3164 with that additional milliseconds ommitted. I now see more problematic behaviour;

  1. Even with correct timestamp format, syslog metrics and error logs are not outputted.

<30>Oct 10 08:58:57 controller-0 OCF_drbd(drbd-dockerdistribution:0)[2444493]: INFO: drbd-dockerdistribution Primary/UpToDate/DUnknown WFConnection

  1. When I supply the same syslog from packetcapture through netcat, the syslog metrics are outputted.

echo -n “<30>Oct 10 08:58:57 controller-0 OCF_drbd(drbd-dockerdistribution:0)[2444493]: INFO: drbd-dockerdistribution Primary/UpToDate/DUnknown WFConnection” | nc -v -u 127.0.0.1 6514

  1. All the tags for syslog metrics are displayed, but the data parsing seems to be not proper especially with the tags appname and message. procid tag was missing already

My test configuration as below;

server = udp://:6514
framing = “octet-counting”
best_effort = false
syslog_standard = “RFC3164”

echo -n “<30>Oct 10 08:58:57 controller-0 OCF_drbd(drbd-dockerdistribution:0)[2444493]: INFO: drbd-dockerdistribution Primary/UpToDate/DUnknown WFConnection” | nc -v -u 127.0.0.1 6514

syslog,appname=OCF_drbddrbd-dockerdistribution,facility=daemon,facility_code=3,hostname=controller-0,message=OCF_drbd(drbd-dockerdistribution:0)[2444493]:\ INFO:\ drbd-dockerdistribution\ Primary/UpToDate/DUnknown\ WFConnection,severity=info,severity_code=6,source=127.0.0.1,timestamp=1665392337000000000 Nb=1i 1665400121342796377

packet capture

Ah, right the library does not provide the best messages, feel free to file an issue against the rfc library with some examples.

  1. Even with correct timestamp format, syslog metrics and error logs are not outputted.

<30>Oct 10 08:58:57 controller-0 OCF_drbd(drbd-dockerdistribution:0)[2444493]: INFO: drbd-dockerdistribution Primary/UpToDate/DUnknown WFConnection

I would really suggest reading through the RFC 3164 itself. The application is parsed up to the “:”. Everything after that is the message.