Error when parsing a url with [[processors.regex]]

Hello

I use the inputs.http plugin to retrieve data in JSON format, as shown below:

[[inputs.http]]
  urls = [
    "https://www.mon_url.fr:4848/monitoring/domain/instance-production-1/applications/DistributeurWS/server/DistributeurWSService.json",
    "https://www.mon_url.fr:4848/monitoring/domain/instance-production-2/applications/DistributeurWS/server/DistributeurWSService.json",
    "https://www.mon_url.fr:4848/monitoring/domain/instance-production-1/applications/DistributeurWS/server/EspeceWSService.json",
    "https://www.mon_url.fr:4848/monitoring/domain/instance-production-2/applications/DistributeurWS/server/EspeceWSService.json",
    "https://www.mon_url.fr:4848/monitoring/domain/instance-recette-1/applications/DistributeurWS/server/DistributeurWSService.json",
    "https://www.mon_url.fr:4848/monitoring/domain/instance-recette-1/applications/DistributeurWS/server/EspeceWSService.json",
  ]
  data_format = "json"
  method = "GET"
  fieldpass = [
    "extraProperties_entity_requestcount_count",
    "extraProperties_entity_errorcount_count"
  ]
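
The two fieldpass names above are flattened JSON paths: as far as I understand it, Telegraf's JSON parser joins nested object keys with an underscore. The following is only a minimal Python sketch of that naming scheme, not Telegraf's code; the flatten helper and the sample payload (including its counter values) are made up for illustration and merely mirror the structure that the field names imply.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict whose keys are joined with "_"."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined key as prefix.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical payload: only its shape is inferred from the fieldpass names,
# the numbers are invented.
SAMPLE = {
    "extraProperties": {
        "entity": {
            "requestcount": {"count": 12},
            "errorcount": {"count": 0},
        }
    }
}

if __name__ == "__main__":
    for field, value in flatten(SAMPLE).items():
        print(field, value)
```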

The URL of each data stream is stored in the url tag. I have been trying for some time to split this URL into two additional tags: one containing the instance name (e.g. instance-production-1) and the other containing the service name (e.g. EspeceWSService).

I use the processors.regex plugin to parse the URL, as shown below:

[[processors.regex]]
  order = 1
  [[processors.regex.tags]]
    key = "url"
    pattern = "^https?\:\/\/([A-Za-z0-9\-\_\.\:]*\/)*(instance-[A-Za-z0-9\-]*)([A-Za-z0-9\-\_\.\:]*\/)*([A-Za-z\-]*.json)$"
    replacement = "${2}"
    result_key = "instance"

[[processors.regex]]
  order = 2
  [[processors.regex.tags]]
    key = "url"
    pattern = "^https?\:\/\/([A-Za-z0-9\-\_\.\:]*\/)*(instance-[A-Za-z0-9\-]*)([A-Za-z0-9\-\_\.\:]*\/)*([A-Za-z\-]*.json)$"
    replacement = "${4}"
    result_key = "webservice"
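
One way to check the expression on its own, outside Telegraf, is to run it against one of the URLs above. The following is only a minimal Python sketch: Telegraf uses Go's regexp engine, not Python's, and the helper name split_url is made up for illustration. In Python the pattern does match, with the instance in group 2 and the service file in group 4 (the .json extension is included, because it sits inside the fourth group). If the same holds in Go, the quoting in the config file may be worth a look: TOML double-quoted strings treat the backslash as an escape character and reject sequences such as \: or \/, whereas single-quoted literal strings pass backslashes through unchanged.

```python
import re

# Pattern copied from the [[processors.regex]] blocks, written as raw strings
# so that Python hands the backslashes to the regex engine untouched.
PATTERN = re.compile(
    r"^https?\:\/\/([A-Za-z0-9\-\_\.\:]*\/)*(instance-[A-Za-z0-9\-]*)"
    r"([A-Za-z0-9\-\_\.\:]*\/)*([A-Za-z\-]*.json)$"
)


def split_url(url):
    """Return (group 2, group 4) for a monitoring URL, or None if no match.

    split_url is a hypothetical helper used only for this check.
    """
    match = PATTERN.match(url)
    if match is None:
        return None
    # Group 2 corresponds to replacement = "${2}", group 4 to "${4}".
    return match.group(2), match.group(4)


if __name__ == "__main__":
    url = (
        "https://www.mon_url.fr:4848/monitoring/domain/instance-production-1"
        "/applications/DistributeurWS/server/EspeceWSService.json"
    )
    print(split_url(url))
```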

With this configuration, not only are the instance and webservice tags not created, but the telegraf service also fails to start, for a reason I don’t understand:

● telegraf.service - The plugin-driven server agent for reporting metrics into InfluxDB
   Loaded: loaded (/usr/lib/systemd/system/telegraf.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since mer. 2020-11-18 13:49:57 CET; 3s ago
     Docs: https://github.com/influxdata/telegraf
  Process: 1699 ExecStart=/usr/bin/telegraf -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d $TELEGRAF_OPTS (code=exited, status=1/FAILURE)
 Main PID: 1699 (code=exited, status=1/FAILURE)

Nov. 18 13:49:57 www.mon_url.fr systemd[1]: Unit telegraf.service entered failed state.
Nov. 18 13:49:57 www.mon_url.fr systemd[1]: telegraf.service failed.
Nov. 18 13:49:57 www.mon_url.fr systemd[1]: telegraf.service holdoff time over, scheduling restart.
Nov. 18 13:49:57 www.mon_url.fr systemd[1]: Stopped The plugin-driven server agent for reporting metrics into InfluxDB.
Nov. 18 13:49:57 www.mon_url.fr systemd[1]: start request repeated too quickly for telegraf.service
Nov. 18 13:49:57 www.mon_url.fr systemd[1]: Failed to start The plugin-driven server agent for reporting metrics into InfluxDB.
Nov. 18 13:49:57 www.mon_url.fr systemd[1]: Unit telegraf.service entered failed state.
Nov. 18 13:49:57 www.mon_url.fr systemd[1]: telegraf.service failed.

What do you think I’m doing wrong?

Thanks in advance,
Thierry

(Warning, this is only a temporary solution until the [[processors.regex]] plugin works)


Hello everyone,
I managed to work around the regular expression problem in the [[processors.regex]] plugin by chaining several other plugins, but I would still like to know why it doesn’t work, because I won’t always be able to work around it.

Here is how I temporarily worked around the problem:

# processes the url of a webservice
# eg: https://www.mon_url.fr:4848/monitoring/domain/instance-recette-1/applications/EspeceWS/server/EspeceWSService.json
[[processors.filepath]]
  order = 1
  # retrieves the name of the ws without extension
  # eg: EspeceWSService
  [[processors.filepath.stem]]
    tag = "url"
    dest = "webservice"

[[processors.filepath]]
  order = 2
  # get the url without the service name
  # eg: https://www.mon_url.fr:4848/monitoring/domain/instance-recette-1/applications/EspeceWS/server
  [[processors.filepath.dirname]]
    tag = "url"
    dest = "url"
  # retrieves the parent folder
  # eg: https://www.mon_url.fr:4848/monitoring/domain/instance-recette-1/applications/EspeceWS
  [[processors.filepath.dirname]]
    tag = "url"
    dest = "url"
  # retrieves the parent folder
  # eg: https://www.mon_url.fr:4848/monitoring/domain/instance-recette-1/applications
  [[processors.filepath.dirname]]
    tag = "url"
    dest = "url"
  # retrieves the parent folder
  # eg: https://www.mon_url.fr:4848/monitoring/domain/instance-recette-1
  [[processors.filepath.dirname]]
    tag = "url"
    dest = "url"

# transforms the folder url into a file url
# eg: https://www.mon_url.fr:4848/monitoring/domain/instance-recette-1.json
[[processors.template]]
  order = 3
  tag = "url"
  template = '{{ .Tag "url" }}.json'

# retrieves the name of the instance without extension
# eg: instance-recette-1
[[processors.filepath]]
  order = 4
  [[processors.filepath.stem]]
    tag = "url"
    dest = "url"

# renames the url tag to "instance"
[[processors.rename]]
  order = 5
  [[processors.rename.replace]]
    tag = "url"
    dest = "instance"
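
To make the chain above easier to follow, here is a rough Python emulation of the five steps. It is only a sketch under assumptions: Python's posixpath stands in for the path handling that the filepath processor performs in Go, and the helper names stem and split_with_path_ops are made up for illustration. It also makes the main limitation visible: the approach relies on exactly four path levels between the instance name and the .json file.

```python
import posixpath


def stem(path):
    """Base name without its last extension (similar in spirit to filepath.stem)."""
    return posixpath.splitext(posixpath.basename(path))[0]


def split_with_path_ops(url):
    """Mimic the processor chain above and return (instance, webservice)."""
    # order 1: stem of the url tag becomes the webservice tag
    webservice = stem(url)

    # order 2: four dirname steps climb from the file up to the instance folder
    parent = url
    for _ in range(4):
        parent = posixpath.dirname(parent)

    # order 3: the template step appends ".json" to the folder url
    parent = parent + ".json"

    # order 4 and 5: stem again, then the result is renamed to "instance"
    instance = stem(parent)
    return instance, webservice


if __name__ == "__main__":
    url = (
        "https://www.mon_url.fr:4848/monitoring/domain/instance-recette-1"
        "/applications/EspeceWS/server/EspeceWSService.json"
    )
    print(split_with_path_ops(url))
```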

I remain open to any help in solving the problem of parsing my URLs with the [[processors.regex]] plugin and regular expressions.

Thank you,
Thierry
