Telegraf file-input custom grok pattern timestamp and line breaks

hi,
i am trying to work with grok, but i am failing.
my need:
periodically parse a file which is rewritten each time (so no "tail" needed).
within this file i use grok patterns to extract information.

the file is structured like this:

TBatch Batch
{
  CreationTime = $01D1231234543210
  GUID = \7B123a45b6-c78d-90e1-2fab-c345de6fa789\7D
  Version = 3.4.5
  PageCount = 12
  DocumentCount = 3
  DeclinedPageCount = 0
  DeclinedDocumentCount = 0
  Documents = 1,2,3
  DisplayName = NAME-OF-TYPE-TIMESTAMP
  Priority = 2
  BatchClass = ID-OF-BATCH
  Position = ID-POSITION
  State = 2
  Stamp_Created = 24.09.2021|14:59:35|ID_BATCH|HOSTNAME|PROCUSER
}

i have two questions:

  1. how would it be possible to remove the line breaks so that all of the information ends up in one data point? with several grok patterns i get several data points, which are inserted into the database (influxdb).
  2. i have issues getting the timestamp converted into a unix timestamp. i tried using this custom grok pattern:
MYTS %{DATE_EU}.%{TIME}

and this grok_pattern:

"\\sStamp_Created\\s=\\s%{MYTS:mytimestamp}.ID.*"

output:
mytimestamp="04.10.2021|17:33:35"

when i try to convert that to another timestamp, i get an error message:

Error parsing timestamp [04.10.2021|17:33:35], could not find any suitable time layouts.

help appreciated,
kind regards,
andre

Hello @astrakid,
Your timestamp needs to use one of the following timestamp modifiers:

  • ts (This will auto-learn the timestamp format)
  • ts-ansic (“Mon Jan _2 15:04:05 2006”)
  • ts-unix (“Mon Jan _2 15:04:05 MST 2006”)
  • ts-ruby (“Mon Jan 02 15:04:05 -0700 2006”)
  • ts-rfc822 (“02 Jan 06 15:04 MST”)
  • ts-rfc822z (“02 Jan 06 15:04 -0700”)
  • ts-rfc850 (“Monday, 02-Jan-06 15:04:05 MST”)
  • ts-rfc1123 (“Mon, 02 Jan 2006 15:04:05 MST”)
  • ts-rfc1123z (“Mon, 02 Jan 2006 15:04:05 -0700”)
  • ts-rfc3339 (“2006-01-02T15:04:05Z07:00”)
  • ts-rfc3339nano (“2006-01-02T15:04:05.999999999Z07:00”)
  • ts-httpd (“02/Jan/2006:15:04:05 -0700”)
  • ts-epoch (seconds since unix epoch, may contain decimal)
  • ts-epochnano (nanoseconds since unix epoch)
  • ts-epochmilli (milliseconds since unix epoch)
  • ts-syslog (“Jan 02 15:04:05”, parsed time is set to the current year)
  • ts-“CUSTOM”
See telegraf/plugins/parsers/grok at master · influxdata/telegraf · GitHub for the full list.

I don't think that timestamp format will work as-is. You could use an execd processor plugin to do the conversion; that might also help with your newline problem.
Tagging a Telegraf expert for advice: @Mya, thank you!
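(For illustration only, not from the thread: a minimal, untested sketch of that execd processor idea could look like the snippet below. The helper script name and path are hypothetical; by default the external program receives each metric as influx line protocol on stdin and must write the converted metric back to stdout.)

[[processors.execd]]
  ## hypothetical helper that rewrites the extracted timestamp string
  ## (e.g. the mytimestamp field) into Unix epoch seconds
  command = ["python3", "/opt/telegraf-test/convert_timestamp.py"]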

thanks for involving @Mya. i am a little bit lost here.


Hi @astrakid, I am sorry to hear you are having a hard time with this. Will you please post your config file so I can run it and mess with it on my end? Have you tried taking a look at this thread?


hi @Mya,

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "1s"
  flush_interval = "10s"
  flush_jitter = "2s"
  precision = "s"
  hostname = ""
  omit_hostname = false

[[outputs.influxdb]]
  urls = ["***:8086"] # masked
  database = "test"
  timeout = "5s"

[[inputs.filecount]]
  interval = "60s"
  directories = ["/opt/telegraf-test/test"]
  name = "file.txt"

[[inputs.file]]
  interval = "60s"
  files = ["/opt/telegraf-test/test/*/file.txt"]
  data_format = "grok"
  character_encoding = "utf-16le"

  grok_patterns = [
    "\\sPageCount\\s=\\s%{NUMBER:pageCount:int}\\sDocumentCount\\s=\\s%{NUMBER:documentCount:int}",
    "\\sDeclinedPageCount\\s=\\s%{NUMBER:declinedPageCount:int}",
    "\\sDeclinedDocumentCount\\s=\\s%{NUMBER:declinedDocumentCount:int}",
    "\\sDisplayName\\s=\\s%{DATA:displayName}",
    "\\sBatchClass\\s=\\s%{GREEDYDATA:batchClass}",
    "\\sPosition\\s=\\s%{GREEDYDATA:position}",
    "\\sState\\s=\\s%{NUMBER:state}",
    "\\sStamp_Created\\s=\\s%{AGROTIMESTAMP:agrotimestamp:ts}.ID.*",
    "\\sLogging\\s=\\s%{GREEDYDATA:logging}",
  ]

  grok_custom_patterns = '''
AGROTIMESTAMP %{DATE_EU}.%{TIME}
'''

Hi @astrakid, please try changing your grok_patterns to this:

grok_patterns = [
  'PageCount = %{NUMBER:pageCount:int}',
  'DocumentCount = %{NUMBER:documentCount:int}',
  'DeclinedPageCount = %{NUMBER:declinedPageCount:int}',
  'DeclinedDocumentCount = %{NUMBER:declinedDocumentCount:int}',
  'DisplayName = %{DATA:displayName}',
  'BatchClass = %{GREEDYDATA:batchClass}',
  'Position = %{GREEDYDATA:position}',
  'State = %{NUMBER:state}',
  'Stamp_Created = %{AGROTIMESTAMP:agrotimestamp:ts-"02.01.2006|15:04:05"}.ID.*',
  'Logging = %{GREEDYDATA:logging}',
]

Let me know if this works for you 🙂


unfortunately not. I neither get a "no match found" error nor a hit; the pattern seems to be ignored. Might that be due to the "|" in the string? Is it necessary to escape the pipe?

regards,
andre

@astrakid This is a grok formatting question at this point. I would suggest trying out different patterns using something like this online grok tool until you get what you are after.

For example if your timestamp field looked like:

Stamp_Created = 2017-03-11T19:23:34.000+00:00

the grok pattern would be:

Stamp_Created = %{TIMESTAMP_ISO8601:timestamp}

In your case, you have an entirely custom format. Therefore, you need a custom pattern. Here is the line from your original example:

Stamp_Created = 24.09.2021|14:59:35|ID_BATCH|HOSTNAME|PROCUSER

I found the following grok pattern parses the fields:

Stamp_Created = %{MONTHDAY}.%{MONTHNUM}.%{YEAR:year}\|%{TIME:time}

and returns:

{
  "MONTHNUM": "09",
  "year": "2021",
  "HOUR": "14",
  "MINUTE": "59",
  "SECOND": "35",
  "time": "14:59:35",
  "MONTHDAY": "24"
}
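(An untested sketch, for illustration: if the goal is to use Stamp_Created as the metric's timestamp, that pattern can be wrapped in a custom pattern and given a ts-"CUSTOM" modifier written in Go reference-time notation. The pattern and field names below are just illustrative; the literal dots and the pipe are escaped in the pattern.)

grok_custom_patterns = '''
AGROTIMESTAMP %{MONTHDAY}\.%{MONTHNUM}\.%{YEAR}\|%{TIME}
'''

grok_patterns = [
  'Stamp_Created = %{AGROTIMESTAMP:agrotimestamp:ts-"02.01.2006|15:04:05"}',
]

Note that a ts modifier sets the metric's time instead of adding a field, so this variant only helps if Stamp_Created should replace the collection time.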

yes, that goes in the right direction.
i only get the keys for the explicitly named variables (in this case "year" and "time").
when i add variable names to MONTHDAY and MONTHNUM, i get those as well. ok, i can handle that.

maybe you can guide me how to convert this information into a simple timestamp for influxdb now? i want to get the oldest and youngest information. let me explain:

at the current timestamp 29.10.2021 07:47, telegraf is getting information from the files, e.g.:
agrotimestamp=27.10.2021 21:10

i want to have this information in an influxdb data point with the current timestamp, and the "agrotimestamp" as additional information. when there are different agrotimestamps at the same current timestamp in influxdb, i want to show the youngest and oldest agrotimestamps for this point.

kind regards and thanks for solving the main issue here!
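(A hedged sketch of that last step. Assumptions: agrotimestamp has already been converted into an integer field holding epoch seconds, for example with the execd processor mentioned earlier, because InfluxQL's MIN()/MAX() only work on numeric fields; and the measurement name is the file input's default "file". The data point keeps Telegraf's collection time, and the oldest and youngest agrotimestamp per interval can then be queried roughly like this:)

SELECT MIN("agrotimestamp"), MAX("agrotimestamp")
FROM "file"
WHERE time > now() - 1d
GROUP BY time(10m)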