Allow Telegraf to index IP addresses from syslog messages?

I’m using the syslog input plugin with Telegraf, and it receives roughly 300-1000 syslog messages per second.

Most of my InfluxDB queries are simple IP address searches (firewall logs). Could I somehow extract the IP addresses with a regex directly in Telegraf and add them as some kind of metadata to the syslog messages that are then sent to InfluxDB v2?

Searching 7 days back for an IP address in _value uses around 15 GB+ of RAM and takes maybe 5 minutes.

Yes, this is possible with a processor plugin. In principle, any of the following processor plugins should work:

  • processors.regex
  • processors.starlark
  • processors.execd

https://docs.influxdata.com/telegraf/v1.18/plugins/#regex

That’s excellent! Thanks 🙂

So, now I’m trying to find out whether I can get all the IP addresses matched by the regex into a single tag. For instance, a tag could look like: 172.25.32.2,172.25.37.8
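As far as I can tell from its source, processors.regex does one regex replacement per `[[processors.regex.fields]]` entry, so a single tag holding every IP needs a find-all-and-join step instead, for example in a small program run behind processors.execd. A minimal sketch of that step in Go, assuming a comma-separated tag value; `joinIPs` and the sample message are hypothetical, and the `\b` word boundaries are an addition to the pattern used in this thread:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// ipv4 matches a dotted-quad IPv4 address with each octet in 0-255.
// The \b word boundaries (an addition to the thread's pattern) stop
// partial matches inside longer digit runs such as "1234.5.6.7".
var ipv4 = regexp.MustCompile(`\b(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}\b`)

// joinIPs (hypothetical helper) returns every IPv4 address found in msg,
// in order of appearance, joined with commas ("" if there are none).
func joinIPs(msg string) string {
	return strings.Join(ipv4.FindAllString(msg, -1), ",")
}

func main() {
	msg := "denied tcp src/172.25.32.2(443) dst/172.25.37.8(80)"
	fmt.Println(joinIPs(msg)) // 172.25.32.2,172.25.37.8
}
```

`FindAllString(msg, -1)` returns every non-overlapping match, so lines with two, three or more addresses all end up in the same value.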

Maybe I can just reference a fixed number of capture groups and hope they catch all the IPs in the message field.

I’ve spent some hours on this and find it really hard; the documentation and examples are quite sparse on this.

So far I have a regex that matches IPv4 addresses:

pattern = '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})'
replacement = "${1}"

I’m feeding it mock data using logger:

logger -n localhost -d -P 6514 "access-list randomx denied tcp blabla/34.23.43.10(51315) → bleh/172.16.18.15(80) hit-cnt 1 first hit [0x18449730, 0x0]"

The extracted value should go into the ipinfo field, but it seems I’m doing something wrong with the matching:

message="access-list randomx denied tcp blabla/34.23.43.10(51315) → bleh/172.16.18.15(80) hit-cnt 1 first hit [0x18449730, 0x0]",
ipinfo="access-list randomx denied tcp blabla/34.23.43.10(51315) → bleh/172.16.18.15(80) hit-cnt 1 first hit [0x18449730, 0x0]"

It seems to return the whole line. I’m only looking for the IP addresses: every line will have two IP addresses that I need to extract, but the surrounding text may differ depending on the input format.
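A likely explanation, assuming processors.regex applies the pattern with Go’s `Regexp.ReplaceAllString` (which is how I read the plugin’s source): only the matched text is replaced and everything else is copied through unchanged. The pattern matches each IP on its own, and `${1}` puts that same IP back, so the output equals the input. A quick check in Go, using the first pattern with its dots escaped:

```go
package main

import (
	"fmt"
	"regexp"
)

// First attempt: the bare IPv4 pattern, dots escaped.
var firstAttempt = regexp.MustCompile(`((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})`)

func main() {
	msg := "access-list randomx denied tcp blabla/34.23.43.10(51315) → bleh/172.16.18.15(80) hit-cnt 1 first hit [0x18449730, 0x0]"
	// Each match (each IP) is replaced by group 1, i.e. by itself, and the
	// unmatched text around it is copied through, so nothing changes.
	out := firstAttempt.ReplaceAllString(msg, "${1}")
	fmt.Println(out == msg) // true
}
```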

pattern = '((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}).*'

gives…

ipinfo="access-list randomx denied tcp blabla/34.23.43.10"

but I can’t get rid of the “access-list randomx denied tcp blabla/” part.
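The same reasoning, still assuming `ReplaceAllString` semantics, covers this result: the trailing `.*` makes the match run from the first IP to the end of the line, but the text before that IP is not part of any match, so it is copied through. The pattern has to match the prefix as well before it can be dropped. A sketch in Go:

```go
package main

import (
	"fmt"
	"regexp"
)

// Second attempt: the IPv4 pattern followed by ".*" (dots escaped).
var secondAttempt = regexp.MustCompile(`((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}).*`)

func main() {
	msg := "access-list randomx denied tcp blabla/34.23.43.10(51315) → bleh/172.16.18.15(80) hit-cnt 1 first hit [0x18449730, 0x0]"
	// The match starts at the first IP and runs to the end of the line;
	// the prefix before it lies outside the match and survives.
	fmt.Println(secondAttempt.ReplaceAllString(msg, "${1}"))
	// access-list randomx denied tcp blabla/34.23.43.10
}
```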

😦

UPDATE:
Amazing, regex works in mysterious ways… it looks like named groups work much better:

pattern = '(.*?)(?P<ip1>(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})(.*?)(?P<ip2>(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}).*'
replacement = "${ip1} - ${ip2}"

gives…

ipinfo="34.23.43.10 - 172.16.18.15"

Now we are getting somewhere!
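To confirm the update, here is the two-group pattern (with the `<ip1>`/`<ip2>` names and escaped dots, as in the final config at the end of this thread) run through Go’s regexp. As far as I can tell, what fixes the prefix problem is the leading `(.*?)`: together with the trailing `.*`, the match now spans the whole line, so `ReplaceAllString` leaves nothing but the expanded replacement. The named groups mainly make the replacement easier to read.

```go
package main

import (
	"fmt"
	"regexp"
)

// octet matches one IPv4 octet (0-255), as in the thread's pattern.
const octet = `(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)`

// twoIPs: lazy prefix, ip1, lazy gap, ip2, then the rest of the line.
var twoIPs = regexp.MustCompile(`(.*?)(?P<ip1>` + octet + `(\.` + octet + `){3})(.*?)(?P<ip2>` + octet + `(\.` + octet + `){3}).*`)

func main() {
	msg := "access-list randomx denied tcp blabla/34.23.43.10(51315) → bleh/172.16.18.15(80) hit-cnt 1 first hit [0x18449730, 0x0]"
	// ${ip1} and ${ip2} are expanded from the named groups.
	fmt.Println(twoIPs.ReplaceAllString(msg, "${ip1} - ${ip2}"))
	// 34.23.43.10 - 172.16.18.15
}
```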

Do you know this website:

I took your message and put it in a log file to test the regex processor.
This works; however, the regex is quite simple and may fail if the log lines look different.

[[inputs.file]]  # only for debugging
  files = ["regex.log"]
  name_override = "logger"
  data_format = "value"
  data_type = "string"

[[processors.regex]]
  [[processors.regex.fields]]
    key = "value"
    pattern = '.*/([0-9.]+)\(.*/([0-9.]+)\(.*'
    replacement = "${1}"
    result_key = "ip1"
  [[processors.regex.fields]]
    key = "value"
    pattern = '.*/([0-9.]+)\(.*/([0-9.]+)\(.*'
    replacement = "${2}"
    result_key = "ip2"

[[outputs.file]]  # only for debugging
  files = ["regex.out"]
  influx_sort_fields = true
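A small Go check of the pattern from the config above, assuming the plugin only writes `result_key` when the pattern matches (my reading of its source). The second and third lines are hypothetical: one uses a layout without `/` and `(` around the addresses and does not match at all; the other shows that `[0-9.]+` does not validate octets.

```go
package main

import (
	"fmt"
	"regexp"
)

// The pattern from both [[processors.regex.fields]] entries above; it relies
// on each address sitting between a "/" and a "(".
var simple = regexp.MustCompile(`.*/([0-9.]+)\(.*/([0-9.]+)\(.*`)

func main() {
	lines := []string{
		"access-list randomx denied tcp blabla/34.23.43.10(51315) → bleh/172.16.18.15(80) hit-cnt 1 first hit [0x18449730, 0x0]",
		// Hypothetical line in a different layout: no "/" or "(" around the IPs.
		"denied tcp src=34.23.43.10 dst=172.16.18.15",
		// Hypothetical line: [0-9.]+ accepts anything made of digits and dots.
		"x/999.1.2.3(1) y/4.5.6.7(2)",
	}
	for _, l := range lines {
		if !simple.MatchString(l) {
			fmt.Println("no match, ip1/ip2 not added")
			continue
		}
		fmt.Println(simple.ReplaceAllString(l, "${1}"), simple.ReplaceAllString(l, "${2}"))
	}
}
```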

Thanks for your clarification!
Yes, I poked around with the regex on that website, but even though it matched the IPs correctly, Telegraf handled it a bit differently.
Anyhow, I think the bulk of it works now; it just needs battle testing. It’s gonna be interesting to see how much cardinality the IP addresses will generate.

It also took some time to figure out how the field → tag converter works; there’s a lot of confusing information here and there on the web.

Here is the whole solution, placed between the syslog input and the InfluxDB output:

[processors]
[[processors.printer]]
  order = 3

[[processors.regex]]
  # order is a plugin-level setting, so it goes here and not inside a
  # [[processors.regex.fields]] entry
  order = 1
  fieldpass = ["message"]
  [[processors.regex.fields]]
    key = "message"
    pattern = '(.*?)(?P<ip1>(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})(.*?)(?P<ip2>(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}).*'
    replacement = "${ip1}"
    result_key = "ip1"

  [[processors.regex.fields]]
    key = "message"
    pattern = '(.*?)(?P<ip1>(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})(.*?)(?P<ip2>(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}).*'
    replacement = "${ip2}"
    result_key = "ip2"

# One converter for both fields: a second converter without "order" may run
# before the regex processor has created ip2
[[processors.converter]]
  order = 2
  [processors.converter.fields]
    tag = ["ip1", "ip2"]
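Two edge cases of the final pattern worth knowing before battle testing, checked in Go below: a line with only one IP does not match, so (assuming the plugin skips `result_key` on a non-match, as its source suggests) neither `ip1` nor `ip2` is added and the converter has nothing to turn into tags; and a line with more than two IPs only yields the first two. `extract` is a hypothetical helper that mimics one `[[processors.regex.fields]]` entry, and the sample lines other than the thread’s own are made up.

```go
package main

import (
	"fmt"
	"regexp"
)

// octet matches one IPv4 octet (0-255); the full pattern is the one used in
// both [[processors.regex.fields]] entries of the final config.
const octet = `(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)`

var final = regexp.MustCompile(`(.*?)(?P<ip1>` + octet + `(\.` + octet + `){3})(.*?)(?P<ip2>` + octet + `(\.` + octet + `){3}).*`)

// extract (hypothetical) mimics one fields entry with result_key set: the
// replacement is only produced when the pattern matches.
func extract(msg, repl string) (string, bool) {
	if !final.MatchString(msg) {
		return "", false
	}
	return final.ReplaceAllString(msg, repl), true
}

func main() {
	msgs := []string{
		"access-list randomx denied tcp blabla/34.23.43.10(51315) → bleh/172.16.18.15(80) hit-cnt 1 first hit [0x18449730, 0x0]",
		// Hypothetical: only one IP in the line.
		"denied tcp blabla/34.23.43.10(51315)",
		// Hypothetical: three IPs in the line.
		"a/10.0.0.1(1) b/10.0.0.2(2) c/10.0.0.3(3)",
	}
	for _, msg := range msgs {
		ip1, ok1 := extract(msg, "${ip1}")
		ip2, ok2 := extract(msg, "${ip2}")
		fmt.Printf("ip1=%q (%v) ip2=%q (%v)\n", ip1, ok1, ip2, ok2)
	}
}
```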