Logstash to InfluxDB 2: aggregated datapoints issue

Hi,
I have many log lines with exactly the same timestamp. When I use Logstash to parse them and send them to InfluxDB 2, Influx or Logstash aggregates some of the lines!
For example, in the sample below the lines with I[847676] get aggregated:

2023-11-20 14:05:49:787 INFO T[SHR1I7783] APP R[GW] I[675940] U[461301]
2023-11-20 14:05:49:787 INFO T[SHR5I7787] APP R[GW] I[847676] U[432408]
2023-11-20 14:05:49:787 INFO T[SHR5I7787] APP R[GW] I[847676] U[935351]
2023-11-20 14:05:49:787 INFO T[SHR5I7787] APP R[GW] I[847676] U[533755]
2023-11-20 14:05:49:787 INFO T[SHR0I7782] APP R[GW] I[50888] U[600770]
2023-11-20 14:05:49:787 INFO T[SHR5I7787] APP S[GW] I[847676] U[432409]

1- I tried to set I as a field, but the points are still aggregated.
2- I tried to set I as a tag; that works, but it causes a performance issue due to high cardinality.
3- I tried to use uid for each event, but the points are still aggregated.
4- U is a mostly unique field; I tried to use it, but the datapoints are still aggregated.
5- If I were to use Telegraf instead of Logstash, what would be the correct configuration for the pattern I mentioned?

FYI 1: I don't want to use a metric because it is important to group by T, I, R, S.
FYI 2: R means received, S means sent.

Any idea?
Thanks

Can you share the config you have at present?

@FixTestRepeat sure, here it is:

input {
  file {
    path => "/home/app/logs/2023.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    exclude => ["*.gz", "*.bz2", "*.slice"]
    codec => plain { charset => "UTF-8" }
  }
}

filter {

mutate
{
    replace => { "host" => "${HOSTNAME}"}

    replace => { "IP_INFLUX" => "192.168.1.1"}
    replace => { "BUCKET_INFLUX" => "mybucket"}
    replace => { "TOKEN_INFLUX" =>  "mytoken"}
    replace => { "ORG_INFLUX" => "myorg"}
}

mutate
{
    replace => { "URL" => "http://%{[IP_INFLUX]}:8087/api/v2/write?bucket=%{[BUCKET_INFLUX]}&precision=s&org=%{[ORG_INFLUX]}"}
}

grok {
  match => { "message" => "%{NOTSPACE} %{NOTSPACE} %{NOTSPACE} %{WORD:module} %{DATA:direction} %{GREEDYDATA}" }
}

if [direction] == "R[GW]" {

grok {
  match => { "message" => "^%{TIMESTAMP_ISO8601:timestamp} INFO T\[%{DATA:Thread}\] APP R\[%{DATA:R}\] I\[%{NONNEGINT:I}\] U\[%{NONNEGINT:uid}\]" }
}

  mutate {
    add_tag => ["GW_In"]

  }
}
if [direction] == "S[GW]" {

grok {
  match => { "message" => "^%{TIMESTAMP_ISO8601:timestamp} INFO T\[%{DATA:Thread}\] APP S\[%{DATA:S}\] I\[%{NONNEGINT:I}\] U\[%{NONNEGINT:uid}\]" }
}

  mutate {
    add_tag => ["GW_Out"]
  }
}

}

output
{
if "GW_In" in [tags] {

http {
url => "%{[URL]}"
http_method => "post"
format => message
message => 'APP_In,Thread=%{[Thread]},host=%{[host]},R=%{[R]},I=%{[I]} uid="%{[uid]}"'

headers => [
  'Authorization', 'Token %{[TOKEN_INFLUX]}'
]

}

}

if "GW_Out" in [tags]

{

http {
url => "%{[URL]}"
http_method => "post"
format => message
message => 'APP_Out,Thread=%{[Thread]},host=%{[host]},S=%{[S]},I=%{[I]} uid="%{[uid]}"'

headers => [
  'Authorization', 'Token %{[TOKEN_INFLUX]}'
]

}

}
}
Hmm, if you are open to using Telegraf and bypassing Logstash completely, that might be the best way to go. (I can't see any Logstash-to-InfluxDB plugins that appear to be maintained currently.)
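As a side note, the "aggregation" you're seeing is most likely InfluxDB overwriting points: Influx keeps only one point per measurement + tag set + timestamp. Your output writes with precision=s and the line protocol message carries no timestamp at all, so every line posted in the same second with the same Thread/host/R/I tags collapses into a single point. A rough sketch of one possible fix in your existing config (the epoch_ns field name is mine, and this uses @timestamp, i.e. Logstash ingest time, unless you parse the log time with a date filter first):

```
filter {
  ruby {
    # compute an epoch-nanosecond value from the event time
    code => "event.set('epoch_ns', (event.get('@timestamp').to_f * 1_000_000_000).to_i)"
  }
}

output {
  http {
    # write with nanosecond precision and append the timestamp to the line protocol
    url => "http://%{[IP_INFLUX]}:8087/api/v2/write?bucket=%{[BUCKET_INFLUX]}&precision=ns&org=%{[ORG_INFLUX]}"
    http_method => "post"
    format => message
    message => 'APP_In,Thread=%{[Thread]},host=%{[host]},R=%{[R]},I=%{[I]} uid="%{[uid]}" %{[epoch_ns]}'
  }
}
```

Note that lines sharing the same millisecond timestamp *and* the same tag set would still collide, so you'd also need something unique (e.g. uid) in the tag set, or an artificial sub-millisecond offset.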

It looks like the Telegraf tail plugin + grok parser is the current best method in the Influx ecosystem. Take a look here.
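A minimal sketch of what that Telegraf config might look like for the pattern in the original post — the tag/field split and the timestamp layout are my assumptions, so adjust to taste (your timestamps use a colon before the milliseconds, hence the custom Go layout):

```
[[inputs.tail]]
  files = ["/home/app/logs/2023.log"]
  from_beginning = true
  data_format = "grok"
  grok_patterns = ['%{TIMESTAMP_ISO8601:timestamp:ts-"2006-01-02 15:04:05:000"} INFO T\[%{DATA:thread:tag}\] APP %{WORD:direction:tag}\[GW\] I\[%{NONNEGINT:i}\] U\[%{NONNEGINT:uid}\]']

[[outputs.influxdb_v2]]
  urls = ["http://192.168.1.1:8087"]
  token = "mytoken"
  organization = "myorg"
  bucket = "mybucket"
```

The :tag suffix in Telegraf's grok parser marks a capture as an InfluxDB tag rather than a field. Since your timestamps repeat, the same overwrite caveat applies here: points still need either a distinct timestamp or a unique tag set.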


Thank you @FixTestRepeat