Telegraf file plugin with utf-16 le bom and grok not working

astrakid · October 4, 2021, 4:28pm

hi,
i configured telegraf to parse some files with grok. it works fine, but when the files are encoded in utf16-le-bom i am not able to parse them:

2021-10-04T16:10:06Z I! Starting Telegraf 1.19.2
2021-10-04T16:10:06Z D! [agent] Initializing plugins
2021-10-04T16:10:06Z D! [agent] Starting service inputs
2021-10-04T16:10:06Z D! Grok no match found for: "T\x00B\x00a\x00t\x00c\x00h\x00 \x00B\x00a\x00t\x00c\x00h\x00\r\x00"
2021-10-04T16:10:06Z D! Grok no match found for: "\x00{\x00\r\x00"
2021-10-04T16:10:06Z D! Grok no match found for: "\x00 \x00 \x00C\x00r\x00e\x00a\x00t\x00i\x00o\x00n\x00T\x00i\x00m\x00e\x00 \x00=\x00 \x00$\x000\x001\x00D\x007\x00B\x004\x00B\x00E\x00A\x00D\x007\x002\x00E\x00B\x00C\x00F\x00\r\x00"
2021-10-04T16:10:06Z D! Grok no match found for: "\x00 \x00 \x00G\x00U\x00I\x00D\x00 \x00=\x00 \x00\\\x007\x00B\x004\x002\x00f\x005\x00b\x000\x002\x004\x00-\x002\x00a\x00d\x00c\x00-\x004\x00b\x007\x007\x00-\x009\x006\x004\x00a\x00-\x003\x00f\x001\x00a\x003\x009\x00d\x004\x008\x001\x005\x006\x00\\\x007\x00D\x00\r\x00"
[...]

when i convert the file to utf8 it can be parsed without any issues.
any idea how to solve it? the files are parsed on windows.

kind regards,
andre

Mya · October 5, 2021, 2:58pm

Hi astrakid, could you please upload your config file and your logs? This will make it easier to see what you are doing. Thank you!

astrakid · October 6, 2021, 2:25pm

this is the config:

[[inputs.file]]
  files = ["C:/temp/btch.txt"]
  data_format = "grok"

  grok_patterns = [
    "\\sPageCount\\s=\\s%{NUMBER:pageCount}",
    "\\sDocumentCount\\s=\\s%{NUMBER:documentCount}",
    "\\sDeclinedPageCount\\s=\\s%{NUMBER:declinedPageCount}",
    "\\sDeclinedDocumentCount\\s=\\s%{NUMBER:declinedDocumentCount}",
    "\\sDisplayName\\s=\\s%{GREEDYDATA:displayName}",
    "\\sBatchClass\\s=\\s%{GREEDYDATA:batchClass}",
    "\\sPosition\\s=\\s%{GREEDYDATA:position}",
    "\\sState\\s=\\s%{NUMBER:state}",
    "\\sStamp_Created\\s=\\s%{DATE:date_stampCreated}",
    "\\sLogging\\s=\\s%{GREEDYDATA:logging}",
  ]

this config works once the file being parsed is converted to utf8, but i need to parse the file in its original encoding:

with that encoding the debug-output looks like this:

with changed encoding it works and looks like this:

Mya · October 6, 2021, 3:48pm

According to this thread, grok treats UTF-16 as UTF-8, which is known to cause some issues. I think this is a grok issue rather than a Telegraf issue that can be fixed on our end.

astrakid · October 6, 2021, 4:18pm

according to the mentioned thread, logstash handles it by converting the files to utf8. is something similar available for telegraf as well?

edit: or any way to cat the file to memory, and then parse the lines by grok?

Mya · October 6, 2021, 5:03pm

Yes! You should be able to use the inputs.file plugin. It has a character_encoding config option. Here is a link to the documentation.
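
As a rough sketch, applied to the config posted above it might look like the following. The \x00 bytes in the debug log are the high bytes of the UTF-16 code units, which is why the patterns never match until the content is decoded. The value "utf-16le" is an assumption based on the file being UTF-16 little-endian, and whether the BOM needs any extra handling is not covered here, so please check the plugin documentation for the supported values.

```toml
[[inputs.file]]
  files = ["C:/temp/btch.txt"]
  data_format = "grok"

  ## Decode the file contents before they reach the grok parser.
  ## Assumption: "utf-16le" matches the encoding of these files.
  character_encoding = "utf-16le"

  grok_patterns = [
    "\\sPageCount\\s=\\s%{NUMBER:pageCount}",
    "\\sDocumentCount\\s=\\s%{NUMBER:documentCount}",
    # remaining patterns unchanged from the original config
  ]
```

With the option set, the debug log should show readable text instead of the \x00-padded strings, and the existing grok patterns can stay as they are.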

astrakid · October 6, 2021, 5:06pm

yes, thanks, found it already! solved the issue! thx a lot!
