Big trouble with grok patterns

telegraf

#1

Hello everybody,
I've been trying for three days to collect data from log files, but I can't get my grok patterns to match.
My log file looks like this, comma-separated:
2018-01-19 07:00:17;vol_name;VPD83T3:6000144000000010607d2411e815bb92;0,1;0,4;0;148;0

I would like to parse each line and send every value to InfluxDB under its own field name (or tag?), e.g.:
date 2018-01-19 07:00:17
vol_name my_vol_name
ID VPD83T3
etc, etc.

I would be very happy if someone could help me or give me a hand.

Just for the date format, I've tried different syntaxes and custom patterns, but it never matches…

Thanks in advance for your help.


#2

Here is the line I want to parse:
2018-01-19 15:29:17;VV_DPM50001D0_HLU05;VPD83T3:6000144000000010607d2411e815c393;0,0167;0;0,0167;0;608

Here is the pattern I'm trying (as a first step, just extracting the date and the first two values):
patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"},%{WORD:lun},%{WORD:ID}']

I expect to extract these values:
2018-01-19 15:29:17;VV_DPM50001D0_HLU05;VPD83T3

Here is the error I get:
2018-11-30T13:11:31Z D! Grok no match found for: "2018-01-19 15:29:17;VV_DPM50001D0_HLU05;VPD83T3:6000144000000010607d2411e815c393;0,0167;0;0,0167;0;608"
Thanks a lot in advance for your help.


#3

You’re getting errors because you’re trying to match commas where there are semicolons. Check out https://grokdebug.herokuapp.com/ to test your pattern.

Input: 2018-01-19 15:29:17;VV_DPM50001D0_HLU05;VPD83T3:6000144000000010607d2411e815c393;0,0167;0;0,0167;0;608
Pattern: %{TIMESTAMP_ISO8601:timestamp};%{WORD:lun};%{DATA:id};

Check the “Named Captures Only” option
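For anyone who wants to sanity-check the fix offline, here is a rough plain-regex equivalent of the corrected pattern in Python. The grok names TIMESTAMP_ISO8601, WORD and DATA are hand-simplified here; this is a sketch, not what Telegraf actually compiles:

```python
import re

# Rough plain-regex equivalent of the corrected grok pattern:
#   %{TIMESTAMP_ISO8601:timestamp};%{WORD:lun};%{DATA:id};
# TIMESTAMP_ISO8601, WORD and DATA are simplified by hand.
pattern = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2});"
    r"(?P<lun>\w+);"
    r"(?P<id>[^;]+);"
)

line = ("2018-01-19 15:29:17;VV_DPM50001D0_HLU05;"
        "VPD83T3:6000144000000010607d2411e815c393;0,0167;0;0,0167;0;608")

m = pattern.match(line)
print(m.group("timestamp"))  # 2018-01-19 15:29:17
print(m.group("lun"))        # VV_DPM50001D0_HLU05
print(m.group("id"))         # VPD83T3:6000144000000010607d2411e815c393
```

With semicolons in the pattern, all three captures come out as expected.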


#4

Thanks a lot for your fast answer!
I noticed my mistake and was able to start gathering data.
I used the link you provided to test my patterns and could extract the data.
I'm going to keep working on gathering the rest of the data.

Thanks for your help.


#5

OK, it's getting better, but now I've run into new trouble…
Each of my log lines contains this data:
2018-01-19 07:00:17;VV_CLUSTER_PTC_SHP_HLU00;VPD83T3:6000144000000010607d2411e815bc6b;0,1;0,667;0,133;131;843

And some of the values are decimals written with a comma (0,1 / 0,667 / 0,133).

I can extract the beginning with this pattern:
patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"};%{WORD:VirtualVolume};%{VPID:vpid}']
custom_patterns = '''
VPID %{HOSTNAME:vpd}:%{BASE10NUM:id}
'''
And I can see the data in Grafana.

But for the decimals, I have tried every number format and nothing works.

I think I may need a custom pattern with a regex, or some other method, but in that case I'm totally lost.
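For what it's worth, the stock BASE10NUM pattern only knows the dot as a decimal separator, which is why every standard number pattern fails on values like 0,1. A quick Python check, using a simplified version of BASE10NUM's regex (the real one uses an atomic group, dropped here for portability), shows the match stopping at the comma:

```python
import re

# Simplified version of grok's stock BASE10NUM (dot as decimal separator);
# the real pattern wraps this in an atomic group (?>...), dropped here so
# it also runs on Python versions before 3.11.
base10num = re.compile(r"(?<![0-9.+-])[+-]?(?:[0-9]+(?:\.[0-9]+)?|\.[0-9]+)")

m = base10num.match("0,1")
print(m.group(0))  # '0' -- the match stops at the comma, ',1' is left behind
```

So BASE10NUM matches only the leading "0" and the rest of the field breaks the overall pattern.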

Thanks in advance for your help.

@glinton
I checked the named_captures_only option as you suggested, but I can't find it in telegraf.conf; the only reference is for Elasticsearch, which I don't use.


#6

I've changed my pattern to this:
%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"};%{WORD:VirtualVolume};%{VPID:vpid};%{GREEDYDATA:FeLuOpsCountByS};%{GREEDYDATA:FeLuReadKbs};%{GREEDYDATA:FeluReadWriteKbs};%{GREEDYDATA:FeLuReadLatRecentAverage};%{GREEDYDATA:FeLuWritedLatRecentAverage}

It seems to work…


#7

I keep trying to extract values, but I always run into trouble…

For example, I have this log line:

2018-01-19 07:00:17;VV_CLUSTER_PTC_SHP_HLU00;VPD83T3:6000144000000010607d2411e815bc6b;0,1;0,667;0,133;131;843

I parse it with this pattern:

patterns = ['%{DATEFORM:date};%{DATA:VirtualVolume:tag};%{VPID:vpid:string};%{DATA:FeLuOpsCountByS};%{DATA:FeLuReadKbs};%{DATA:FeluReadWriteKbs};%{INT:FeLuReadLatRecentAverage};%{INT:FeLuWritedLatRecentAverage}']
custom_patterns = '''
DATEFORM %{TIMESTAMP_ISO8601:date:ts-"2006-01-02 15:04:05"}
VPID %{HOSTNAME:vpd}:%{WORD:id}
'''

I get this result:

{
  "time": [
    [
      "2018-01-19 07:00:17"
    ]
  ],
  "timestamp": [
    [
      "2018-01-19 07:00:17"
    ]
  ],
  "YEAR": [
    [
      "2018"
    ]
  ],
  "MONTHNUM": [
    [
      "01"
    ]
  ],
  "MONTHDAY": [
    [
      "19"
    ]
  ],
  "HOUR": [
    [
      "07",
      null
    ]
  ],
  "MINUTE": [
    [
      "00",
      null
    ]
  ],
  "SECOND": [
    [
      "17"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      null
    ]
  ],
  "VirtualVolume": [
    [
      "VV_CLUSTER_PTC_SHP_HLU00"
    ]
  ],
  "vpid": [
    [
      "VPD83T3:6000144000000010607d2411e815bc6b"
    ]
  ],
  "vpd": [
    [
      "VPD83T3"
    ]
  ],
  "id": [
    [
      "6000144000000010607d2411e815bc6b"
    ]
  ],
  "FeLuOpsCountByS": [
    [
      "0,1"
    ]
  ],
  "FeLuReadKbs": [
    [
      "0,667"
    ]
  ],
  "FeluReadWriteKbs": [
    [
      "0,133"
    ]
  ],
  "FeLuReadLatRecentAverage": [
    [
      "131"
    ]
  ],
  "FeLuWritedLatRecentAverage": [
    [
      "843"
    ]
  ]
}

So I can see the data in Grafana, but I think something is strange and wrong, and I can't create a graph. I use Grafana a lot with MySQL and PostgreSQL to graph NetApp and Avamar without trouble, but this logparser -> InfluxDB -> Grafana chain isn't clear to me.

I'm totally lost.

If someone could help me or give me some advice…

Thanks in advance


#8

I'm coming back to report my progress and share my experience.

The real problem was the "," (comma) used as the decimal separator in the log file.
It is possible to extract those values with this custom pattern:
BASE10NUMCOMMA (?<![0-9,+-])(?>[+-]?(?:(?:[0-9]+(?:\,[0-9]+)?)|(?:\,[0-9]+)))

I didn't create this custom pattern from scratch: I took the existing BASE10NUM pattern and replaced the "." (dot) with a "," (comma) :sweat_smile:
And I could extract the data from this log line:
2018-01-19 07:00:17;VV_TU_VENUS_P11_HLU00;VPD83T3:6000144000000010607d2411e815bb92;0,1;0,4;0;148;0
With this pattern:

%{DATEFORM:date};%{DATA:VirtualVolume:tag};%{VPID:vpid:string};%{BASE10NUMCOMMA :FeLuOpsCountByS};%{BASE10NUMCOMMA:FeLuReadKbs};%{BASE10NUMCOMMA:FeluReadWriteKbs};%{BASE10NUMCOMMA:FeLuReadLatRecentAverage};%{BASE10NUMCOMMA:FeLuWritedLatRecentAverage}

And the custom patterns associated:

DATEFORM %{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05"}
VPID %{DATA:vpd}:%{DATA:id}
BASE10NUMCOMMA (?<![0-9,+-])(?>[+-]?(?:(?:[0-9]+(?:\,[0-9]+)?)|(?:\,[0-9]+)))
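The comma variant can be checked against the sample values with plain Python. This uses the BASE10NUMCOMMA regex from above with the atomic group (?>…) removed, so it also runs on Python versions before 3.11:

```python
import re

# BASE10NUMCOMMA from the custom patterns above, minus the atomic group
# (?>...), which Python's re module only supports from 3.11 onward.
base10numcomma = re.compile(r"(?<![0-9,+-])[+-]?(?:[0-9]+(?:,[0-9]+)?|,[0-9]+)")

for value in ["0,1", "0,4", "0", "148"]:
    print(value, "->", base10numcomma.fullmatch(value) is not None)  # all True
```

Both the comma decimals and the plain integers match in full.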

I could get the right result:

{
  "date": [
    [
      "2018-01-19 07:00:17"
    ]
  ],
  "timestamp": [
    [
      "2018-01-19 07:00:17"
    ]
  ],
  "YEAR": [
    [
      "2018"
    ]
  ],
  "MONTHNUM": [
    [
      "01"
    ]
  ],
  "MONTHDAY": [
    [
      "19"
    ]
  ],
  "HOUR": [
    [
      "07",
      null
    ]
  ],
  "MINUTE": [
    [
      "00",
      null
    ]
  ],
  "SECOND": [
    [
      "17"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      null
    ]
  ],
  "VirtualVolume": [
    [
      "VV_TU_VENUS_P11_HLU00"
    ]
  ],
  "vpid": [
    [
      "VPD83T3:6000144000000010607d2411e815bb92"
    ]
  ],
  "vpd": [
    [
      "VPD83T3"
    ]
  ],
  "id": [
    [
      "6000144000000010607d2411e815bb92"
    ]
  ],
  "BASE10NUMCOMMA": [
    [
      "0,1"
    ]
  ],
  "FeLuReadKbs": [
    [
      "0,4"
    ]
  ],
  "FeluReadWriteKbs": [
    [
      "0"
    ]
  ],
  "FeLuReadLatRecentAverage": [
    [
      "148"
    ]
  ],
  "FeLuWritedLatRecentAverage": [
    [
      "0"
    ]
  ]
}

With the patterns as they are, if I don't use modifiers, all the data is stored as "string" in InfluxDB.

Check it in the influxdb shell:

#> show field keys on vplex
name: virtual_volume_log
fieldKey                   fieldType
--------                   ---------
FeLuOpsCountByS            string
FeLuReadKbs                string
FeLuReadLatRecentAverage   string
FeLuWritedLatRecentAverage string
FeluReadWriteKbs           string
VirtualVolume              string
date                       string
id                         string
vpd                        string
vpid                       string

So all the data is unusable in Grafana: impossible to graph, convert, etc. All I can do is display the contents of the InfluxDB database as a table.

When I try to use modifiers to convert the data to floats while injecting into InfluxDB, I get parsing and conversion errors; it seems InfluxDB doesn't understand the "," (comma) decimal separator.

So I replaced the "," (comma) with "." (dot) in my log file to get lines like this:
2018-01-19 07:00:17;VV_TU_VENUS_P11_HLU00;VPD83T3:6000144000000010607d2411e815bb92;0.1;0.4;0;148;0
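The comma-to-dot rewrite is a one-line preprocessing step. Replacing every comma is only safe here because the fields themselves are separated by semicolons, never by commas. A minimal Python sketch:

```python
# Turn decimal commas into dots before Telegraf reads the file.
# Replacing every comma is safe only because the field separator
# in these logs is the semicolon, never the comma.
line = ("2018-01-19 07:00:17;VV_TU_VENUS_P11_HLU00;"
        "VPD83T3:6000144000000010607d2411e815bb92;0,1;0,4;0;148;0")

fixed = line.replace(",", ".")
print(fixed)  # ...;0.1;0.4;0;148;0
```

The same substitution can of course be done with sed or any similar tool over the whole file.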

And I customized my pattern around the BASE10NUM core pattern, adding the "float" modifier to convert the values to floats when injecting into InfluxDB:

patterns = ['%{DATEFORM:date};%{DATA:VirtualVolume:tag};%{VPID:vpid:string};%{BASE10NUM:FeLuOpsCountByS:float};%{BASE10NUM:FeLuReadKbs:float};%{BASE10NUM:FeluReadWriteKbs:float};%{BASE10NUM:FeLuReadLatRecentAverage:float};%{BASE10NUM:FeLuWritedLatRecentAverage:float}']
custom_patterns = '''
DATEFORM %{TIMESTAMP_ISO8601:date:ts-"2006-01-02 15:04:05"}
VPID %{HOSTNAME:vpd}:%{WORD:id}
'''

And I got float fields in InfluxDB:

#> show field keys on vplexdoted
name: virtual_volume_log
fieldKey                   fieldType
--------                   ---------
FeLuOpsCountByS            float
FeLuReadKbs                float
FeLuReadLatRecentAverage   float
FeLuWritedLatRecentAverage float
FeluReadWriteKbs           float
date                       string
id                         string
vpd                        string
vpid                       string

But with this configuration, graphs in Grafana don't seem to work…
I can display the data in Grafana as a table, but as soon as I try to plot it with a graph panel, nothing appears.
I can still find the tags, the measurements, etc.

Thanks in advance


#9

I'm continuing my monologue, since nobody seems to be following this thread.

There is another thing I'm trying to understand.
When the data is collected and inserted into InfluxDB, I also extract the date.

Everything seems to work, but Grafana ends up with two time columns:

  1. time column: the automatic time column created when the data is inserted into InfluxDB
  2. timestamp column: my extracted value, the real time from the log file

The problem is that when I try to do time-series queries in Grafana, it only lets me use the time column, and the graph doesn't work because all the lines have the same time: the moment the data was inserted.

What am I doing wrong?

Thanks in advance for your help and advice


#10

Remove the date from your custom patterns and use it in your patterns, e.g.:

patterns = ['%{TIMESTAMP_ISO8601:date:ts-"2006-01-02 15:04:05"};%{DATA:VirtualVolume:tag};%{VPID:vpid:string};%{BASE10NUM:FeLuOpsCountByS:float};%{BASE10NUM:FeLuReadKbs:float};%{BASE10NUM:FeluReadWriteKbs:float};%{BASE10NUM:FeLuReadLatRecentAverage:float};%{BASE10NUM:FeLuWritedLatRecentAverage:float}']
custom_patterns = '''
VPID %{HOSTNAME:vpd}:%{WORD:id}
'''
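For completeness, here is a sketch of how that pattern might sit in a full logparser section. The file path is an assumption, and the measurement name is taken from the field-key listing earlier in the thread; adjust both for your setup:

```toml
[[inputs.logparser]]
  ## File path is an assumption -- point this at your real log file
  files = ["/var/log/vplex/virtual_volume.log"]
  from_beginning = true

  [inputs.logparser.grok]
    ## Measurement name taken from the "show field keys" output above
    measurement = "virtual_volume_log"
    patterns = ['%{TIMESTAMP_ISO8601:date:ts-"2006-01-02 15:04:05"};%{DATA:VirtualVolume:tag};%{VPID:vpid:string};%{BASE10NUM:FeLuOpsCountByS:float};%{BASE10NUM:FeLuReadKbs:float};%{BASE10NUM:FeluReadWriteKbs:float};%{BASE10NUM:FeLuReadLatRecentAverage:float};%{BASE10NUM:FeLuWritedLatRecentAverage:float}']
    custom_patterns = '''
VPID %{HOSTNAME:vpd}:%{WORD:id}
'''
```

With the ts-"…" modifier on the top-level timestamp capture, Telegraf uses the log line's own date as the metric time instead of the insertion time.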

#11

Hello Glinton, and thanks a lot for your answer!
What I understand is that time is InfluxDB metadata recorded when the data is injected, so this can work if I parse a log file that is continuously updated: I have to set from_beginning to "false" so only newly written lines are picked up, and their time is recorded as they are injected.
I don't know if I'm wrong on this point.

So if I follow your advice, you think the date will become a value that I can use in Grafana instead of the indexed time metadata?

Thanks a lot for your help
Thanks a lot for your help