Telegraf logfmt timestamp

I spun up a configuration that uses the tail input plugin with the “logfmt” input data format. The input plugin appears to be working as expected. However, I have not been able to figure out how to specify which field contains the timestamp and what format it is in. Can this be specified with the logfmt input data format? Or is there a way to use a processor to handle the timestamp?
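For reference, the input side of the configuration looks roughly like this (a sketch only; the file path and tail options are mirrored from the grok variant shown later in this thread):

[[inputs.tail]]
  files = ["/motescan/scannet03/log/ag_history.log"]
  from_beginning = false
  pipe = false
  watch_method = "inotify"
  data_format = "logfmt"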

Here is a sample of the logfmt data:

ts=2019-03-18T19:09:56.018 id=agg level=info uid=00010000161994C4 addr=31482 msg="request log success"
ts=2019-03-18T19:09:56.022 id=agg level=info uid=00010000161994C4 addr=31482 msg="device aggregation complete"
ts=2019-03-18T19:09:56.027 id=network level=info addr=0 msg="request status"
ts=2019-03-18T19:09:56.811 id=zigbee level=info uid=0001000013B6AECF addr=0 addr_list="[34422, 4777, 31482]" msg="coordinator children"
ts=2019-03-18T19:13:57.340 id=zigbee level=info uid=0001000013B6AECF addr=0 addr_list="[34422, 4777, 31482]" msg="coordinator children"
ts=2019-03-18T19:13:57.352 id=agg level=info uid_list="['0001000016198841', '00010000167A201B', '00010000161994C4']" msg="devices processed" motes_processed="3"
ts=2019-03-18T19:13:57.355 id=cmd level=info msg="network discovery done"

There currently isn’t anything in the logfmt parser that lets you specify which field contains the timestamp; it seems to always use the time at processing. If that’s not ideal, you could open a feature request to add an option like the json parser’s json_time_key. Otherwise, you should at least be able to drop the ts tag.
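For comparison, this is roughly what the json parser’s timestamp options look like (a sketch only; the file input and path are placeholders, and the logfmt parser currently has no equivalent options):

[[inputs.file]]
  files = ["/path/to/sample.json"]
  data_format = "json"
  # json_time_key names the field that holds the timestamp and json_time_format
  # gives its layout; a hypothetical logfmt equivalent is what the feature request would ask for.
  json_time_key = "ts"
  json_time_format = "2006-01-02T15:04:05.999"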

Just to verify, is ts being output as a tag or a field?

I attempted to use a parser processor plugin to do this as follows:

[[processors.parser]]
  namepass = [ "test2" ]
  order = 1
  # run the grok parser over the value of the "ts" field
  parse_fields = [ "ts" ]
  drop_original = false
  merge = "override"
  data_format = "grok"
  # the ts-"..." modifier should parse the capture with this Go time layout and use it as the metric timestamp
  grok_patterns = ['^%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02T15:04:05.999"}']

However, the timestamp is not being used for the metric. Here is the output:

test2,application_name=logfmt,city=edmonton,country=ca,device_type=gateway,device_version=2v0,host=ip-xxx,hostname=scannet03,id=coord_reset,level=info,organization=scanimetrics,path=/motescan/scannet03/log/ag_history.log,project=P100 msg="Coordinator reset",delay=1i,ts="2019-03-18T20:05:26.229" 1552941714177679876

Hi @glinton, ts is a field. I thought using a parser processor would let me extract the timestamp, but it doesn’t seem to do the trick.

@glinton There is a workaround for this, but it is kind of ugly. If I reverse the order of what I tried earlier, I can almost make it work. That is, I use a tail input plugin configured with the grok input data format: the timestamp is captured from the log line, and the remainder of the line is captured into a field (I called it “message”) as follows:

[[inputs.tail]]
  files = ["/motescan/scannet03/log/ag_history.log"]
  name_override = "test3" 
  from_beginning = false
  pipe = false
  watch_method = "inotify"
  data_format = "grok"
  # capture the leading timestamp as the metric time; the rest of the line goes into the "message" field
  grok_patterns = ['^ts=%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02T15:04:05.999"} %{GREEDYDATA:message:string}']
  [inputs.tail.tags]
    application_name = "grok"
    device_type = "gateway"
    device_version = "2v0"
    sensor_type = ""
    country = "ca"
    city = "edmonton"
    organization = "scanimetrics"
    project = "P100"
    hostname = "scannet03"

Then, I used a parser processor configured to parse the logfmt data captured in the “message” field as follows:

[[processors.parser]]
  namepass = [ "test3" ]
  order = 1
  parse_fields = [ "message" ]
  drop_original = false
  merge = "override"
  data_format = "logfmt"

Here is the output:

test3,application_name=grok,city=edmonton,country=ca,device_type=gateway,device_version=2v0,host=ip-xxx,hostname=scannet03,id=coord_reset,level=info,organization=scanimetrics,path=/motescan/scannet03/log/ag_history.log,project=P100 message="level=info id=coord_reset msg=\"Coordinator reset\" delay=1",msg="Coordinator reset",delay=1i 1552939526229000000

The unfortunate consequence is that the “message” field passes through to the output. I had a look at the “fielddrop” filter, but it appears you cannot use it inside an output. And if I try to use it in the parser processor, nothing passes to the output, suggesting the drop happens before the parser processes the field.

Is there another way I can drop the unwanted field?

I may be incorrect, but I think if you set drop_original = true, the original message field will not get passed through to the output. If that doesn’t work, setting fielddrop = ["message"] should do the trick; see the metric filtering section of the Telegraf configuration docs.
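In other words, something like this (a sketch of that suggestion applied to the processor from the previous post):

[[processors.parser]]
  namepass = [ "test3" ]
  order = 1
  parse_fields = [ "message" ]
  # intent: emit only the parsed metric so the raw "message" field does not pass through
  drop_original = true
  merge = "override"
  data_format = "logfmt"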

I also noted many of the following error messages in telegraf.log:

`2019-03-18T23:05:25Z E! [processors.parser] could not parse field message: logfmt syntax error at pos 44 on line 1: unterminated quoted value`

I did not encounter these errors when I configured the tail plugin to use the logfmt input data format.

There are a few lines similar to the following in the log. Could these lines be the ones causing the problem with the parser?

`ts=2019-03-18T23:05:29.914 level=info id=zigbee uid=0001000013B6AECF addr=0 msg="config liveMode=0"`

Note the liveMode=0 inside the quoted string.

Hi @glinton

Unfortunately, the drop_original = true parameter drops everything, including the tags that I added with the input plugin’s [inputs.tail.tags] table.

Regarding fielddrop, the documentation indicates that you cannot use it in the output plugin. Is that correct?

Regarding the error message in the telegraf log, the equals sign inside the quoted string is not the problem. I think I have found a bug in the grok input data format parser. Here is an example:

input:
ts=2019-03-19T13:05:27.679 level=info id=cmdopts nwkwait=480 msg="ok"

output:
test3,application_name=grok,city=edmonton,country=ca,device_type=gateway,device_version=2v0,host=ip-xxx,hostname=scannet03,organization=scanimetrics,path=/motescan/scannet03/log/ag_history.log,project=P100 message="level=info id=cmdopts nwkwait=480 msg=\"ok" 1553000727679000000

Note the quoted string value for the “msg” field in the input line. Also note the position of this field in the input line and particularly the double quote at the end of the line.

In the output line there are two issues. First, the “message” field is missing the second escaped double quote that goes with the “msg” field value. Second, the “message” field has not been parsed by the logfmt parser. I now believe this is what is throwing the logfmt syntax error message in the telegraf log (unterminated quoted value). Accordingly, the parser processor ignores the line which explains why the parsed fields and tags do not appear in the output.

I believe I have tracked this problem down and isolated it to log lines that end with a double quote. That is, lines ending with msg="some text" are mangled and are missing the second escaped double quote. I have confirmed that any input line that does not end with a double quote is parsed correctly by the grok data format parser and passed along to the logfmt parser processor. This appears to be a bug, and I think it is in the grok input data format.

This problem appears to be the same as the one reported in issue 4998.