Telegraf logparser -> InfluxDB: duplicate values


#1

I use the Telegraf logparser plugin to push metrics to InfluxDB,
but in InfluxDB I see duplicate metrics/values. Please help me find a solution to this problem. Thanks


#2

Hey @alvianno,

I’d love to help, but I’m going to need a little bit more information please.

Can you tell us what metrics you are pushing to InfluxDB and how they’re getting there?

Are you using Telegraf with a plugin or the client libraries within code? Can you share examples?

The more information you provide, the more we can help.

Thanks


#3

Config for the Telegraf logparser plugin:

Example log (.csv):
MOBILE,Payment,response_time,1547805960,4624
MOBILE,Payment,count,1547805960,181
MOBILE,Payment,error,1547805960,14
WEB,Payment,response_time,1547805960,67
WEB,Payment,count,1547805960,1
WEB,Payment,error,1547805960,0
WEB,Login,response_time,1547805960,1295
WEB,Login,count,1547805960,82
WEB,Login,error,1547805960,1
WEB,Emoney,response_time,1547805960,0

Telegraf config:


#4

Hey,

In the sample CSV provided and in the screenshot from your original post, there are no duplicate points:

MOBILE,Payment,response_time,1547805960,4624
MOBILE,Payment,count,1547805960,181
MOBILE,Payment,error,1547805960,14
WEB,Payment,response_time,1547805960,67
WEB,Payment,count,1547805960,1
WEB,Payment,error,1547805960,0
WEB,Login,response_time,1547805960,1295
WEB,Login,count,1547805960,82
WEB,Login,error,1547805960,1
WEB,Emoney,response_time,1547805960,0

A point is only a duplicate if its measurement, tag values, and timestamp are all the same as those of another point.

In your screenshot, each of the timestamps is unique.

In the sample CSV, although several rows share a timestamp, each row has a unique combination of tags (MOBILE/WEB, Payment/Login, and the metric type), so none of them collide.
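To make that rule concrete, here is a minimal Python sketch (a toy model, not InfluxDB's storage engine) that keys each row by measurement, tag set, and timestamp, and lets a later write with the same key overwrite the earlier one. The measurement name RAW_APPD_TRX and the tag names category, trx_name, and metric_type are taken from the query and config used elsewhere in this thread; store() is a hypothetical helper.

```python
# Toy model of InfluxDB's duplicate-point rule: a point is identified by
# measurement + tag set + timestamp, and a later write with the same
# identity overwrites the earlier one (last write wins).
SAMPLE_CSV = """\
MOBILE,Payment,response_time,1547805960,4624
MOBILE,Payment,count,1547805960,181
MOBILE,Payment,error,1547805960,14
WEB,Payment,response_time,1547805960,67
WEB,Payment,count,1547805960,1
WEB,Payment,error,1547805960,0
WEB,Login,response_time,1547805960,1295
WEB,Login,count,1547805960,82
WEB,Login,error,1547805960,1
WEB,Emoney,response_time,1547805960,0
"""


def store(lines, measurement="RAW_APPD_TRX"):
    """Return a dict mapping (measurement, tags, timestamp) -> metric_value."""
    db = {}
    for line in lines:
        category, trx_name, metric_type, ts, value = line.split(",")
        # Tag names follow the logparser config used later in the thread.
        tags = (("category", category), ("metric_type", metric_type), ("trx_name", trx_name))
        db[(measurement, tags, int(ts))] = float(value)  # same key => overwrite
    return db


points = store(SAMPLE_CSV.splitlines())
print(len(points))  # 10: every row has a distinct tag set, so nothing collides

# Two rows with the same tags and timestamp collapse into one point.
dupes = store(["WEB,Emoney,response_time,1547805965,1",
               "WEB,Emoney,response_time,1547805965,2"])
print(len(dupes), list(dupes.values()))  # 1 [2.0]
```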


#5

Hey rawkode, thanks for helping us.

Yes, there are no duplicate points in the source CSV, but when we query InfluxDB, for example with:

SELECT "metric_value" FROM "mandiri"."RAW_OneDay"."RAW_APPD_TRX" WHERE time > now() - 1h AND "category"='MOBILE' AND "metric_type"='response_time' AND "trx_name"='Payment'

it returns this result:

There are duplicate points whose timestamps have a seemingly random millisecond offset that is not in the source data. The source timestamps are in seconds, so why do some points have a millisecond offset?

I first noticed the problem when using the sum() aggregate on the 'count' or 'error' metric types: the result was 4 to 5 times the expected value.
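A minimal Python sketch of why sum() comes out too high, assuming the same count row ended up stored several times under slightly different timestamps. The four millisecond offsets are made up for illustration, and collapsing timestamps to whole seconds here is a client-side illustration only, not an InfluxDB feature.

```python
SECOND_NS = 1_000_000_000  # nanoseconds per second

base_ns = 1547805960 * SECOND_NS
# Hypothetical: the row "MOBILE,Payment,count,1547805960,181" stored four
# times with different millisecond offsets.
stored = {
    base_ns: 181.0,
    base_ns + 1_000_000: 181.0,
    base_ns + 2_000_000: 181.0,
    base_ns + 3_000_000: 181.0,
}

# Four distinct timestamps means four points, and sum() adds all of them.
print(sum(stored.values()))  # 724.0, i.e. 4x the expected 181

# Truncating back to whole seconds (last value wins) leaves one point.
deduped = {}
for ts in sorted(stored):
    deduped[ts // SECOND_NS] = stored[ts]
print(sum(deduped.values()))  # 181.0
```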


#6

Hi @Richard_Anthony,

Sadly, I am unable to replicate this behaviour.

Config:

[[inputs.logparser]]
  files = ["/etc/telegraf/example.csv"]
  from_beginning = false
  watch_method = "poll"

  [inputs.logparser.grok]
    patterns =["%{GREEDYDATA:category:tag},%{GREEDYDATA:trx_name:tag},%{GREEDYDATA:metric_type:tag},%{NUMBER:timestamp:ts-epoch},%{GREEDYDATA:metric_value:float}"]
    measurement = "RAW_APPD_TRX"
    custom_patterns = '''
    '''
    timezone = "Local"
[[outputs.influxdb]]
  urls = ["http://influxdb:8086"]
  database = "telegraf"
  username = ""
  password = ""
  retention_policy = ""
  write_consistency = "any"
  timeout = "5s"

With the following CSV:

MOBILE,Payment,response_time,1547805960,4624
MOBILE,Payment,count,1547805960,181
MOBILE,Payment,error,1547805960,14
WEB,Payment,response_time,1547805960,67
WEB,Payment,count,1547805960,1
WEB,Payment,error,1547805960,0
WEB,Login,response_time,1547805960,1295
WEB,Login,count,1547805960,82
WEB,Login,error,1547805960,1
WEB,Emoney,response_time,1547805960,0

and adding lines, one by one:

WEB,Emoney,response_time,1547805961,1
WEB,Emoney,response_time,1547805962,2
WEB,Emoney,response_time,1547805963,3
WEB,Emoney,response_time,1547805964,4
WEB,Emoney,response_time,1547805963,3
WEB,Emoney,response_time,1547805965,5

Results:

> SELECT "metric_value" FROM "telegraf"."autogen"."RAW_APPD_TRX" WHERE "category"='WEB' AND "metric_type"='response_time' AND "trx_name"='Emoney'

name: RAW_APPD_TRX
time                metric_value
----                ------------
1547805962000000000 2

> SELECT "metric_value" FROM "telegraf"."autogen"."RAW_APPD_TRX" WHERE "category"='WEB' AND "metric_type"='response_time' AND "trx_name"='Emoney'

name: RAW_APPD_TRX
time                metric_value
----                ------------
1547805962000000000 2
1547805963000000000 3
1547805964000000000 4
1547805965000000000 5
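For reference, here is a rough Python equivalent of what the grok pattern in the config above produces for one CSV row: three tags, one float field, and the epoch-seconds timestamp scaled to nanoseconds, written as InfluxDB line protocol. The regex is an approximation for illustration only (GREEDYDATA treated as .*, the timestamp as a plain integer), and to_line_protocol() is a hypothetical helper, not part of Telegraf.

```python
import re

# Approximation of the grok pattern: GREEDYDATA ~ ".*", and the timestamp
# is assumed to be an integer count of epoch seconds.
LINE_RE = re.compile(
    r"^(?P<category>.*),(?P<trx_name>.*),(?P<metric_type>.*),"
    r"(?P<timestamp>\d+),(?P<metric_value>.*)$"
)


def to_line_protocol(line, measurement="RAW_APPD_TRX"):
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"line does not match pattern: {line!r}")
    # Tags sorted by key, the order the line protocol docs recommend.
    tags = ",".join(f"{k}={m[k]}" for k in sorted(("category", "trx_name", "metric_type")))
    ts_ns = int(m["timestamp"]) * 1_000_000_000  # ts-epoch: seconds -> nanoseconds
    value = float(m["metric_value"])  # metric_value is parsed as a float field
    return f"{measurement},{tags} metric_value={value} {ts_ns}"


print(to_line_protocol("MOBILE,Payment,response_time,1547805960,4624"))
# RAW_APPD_TRX,category=MOBILE,metric_type=response_time,trx_name=Payment metric_value=4624.0 1547805960000000000
```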

I’ll raise this with one of my colleagues in case they’re aware of something I’m not.


#7

Thanks for confirming. Just out of curiosity, is there any difference between using the poll and inotify watch methods?


#8

I tried this with inotify too and didn’t experience the duplicates.

These tests were done with low-volume data, so I increased the number of rows being appended to the file, and I was able to replicate this (I got one row with a timestamp offset like the ones in your data).


#9

Please try this log:
WEB,Emoney,response_time,1547805965,1
WEB,Emoney,response_time,1547805965,2
WEB,Emoney,response_time,1547805965,3
WEB,Emoney,response_time_test,1547805965,4
WEB,Emoney,response_time_test,1547805965,3
WEB,Emoney,response_time_test,1547805965,5


#10

I am also facing the same issue, which is described in the link below, but I’m not sure whether to start with Telegraf or InfluxDB.
Please find below the link to the issue I have opened about it:

Warm Regards,
//Ashlesh


#11

@Richard_Anthony, what versions of influx and telegraf were you using when you experienced this?


#12

The grok parser, and by extension the logparser plugin, adjusts timestamps so that when consecutive lines have the same timestamp, they become strictly increasing. So if your input lines all share the same timestamp, as in @alvianno’s example:

WEB,Emoney,response_time,1547805965,1
WEB,Emoney,response_time,1547805965,2
WEB,Emoney,response_time,1547805965,3

The points will be created with timestamps:

1547805965000000000,1
1547805965001000000,2
1547805965002000000,3
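A simplified Python model of that adjustment, reproducing the example above. This is a sketch, not Telegraf's actual implementation: it assumes second-resolution input and a fixed 1 ms step per repeat, and adjust_timestamps() is a hypothetical name.

```python
SECOND_NS = 1_000_000_000  # nanoseconds per second
MS_NS = 1_000_000          # nanoseconds per millisecond


def adjust_timestamps(epoch_seconds):
    """Toy model of the adjustment described above (not Telegraf's code).

    Assumes second-resolution input and a fixed 1 ms step: every consecutive
    repeat of the previous line's timestamp is pushed forward by one more ms.
    """
    adjusted = []
    prev = None
    repeats = 0
    for ts in epoch_seconds:
        # Count how many lines in a row have carried this same timestamp.
        repeats = repeats + 1 if ts == prev else 0
        adjusted.append(ts * SECOND_NS + repeats * MS_NS)
        prev = ts
    return adjusted


lines = [
    "WEB,Emoney,response_time,1547805965,1",
    "WEB,Emoney,response_time,1547805965,2",
    "WEB,Emoney,response_time,1547805965,3",
]
stamps = adjust_timestamps(int(line.split(",")[3]) for line in lines)
for ts, line in zip(stamps, lines):
    print(f"{ts},{line.split(',')[4]}")
```

Without the shift, the three lines would share one series and one timestamp, so InfluxDB would keep a single point; with it, all three survive as separate points. Because this sketch compares only consecutive lines and ignores the series, the three response_time_test rows in the log posted earlier in the thread would, in this model, continue the sequence at +3, +4 and +5 ms.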

#13

@glinton, we are using Telegraf 1.7.2 and InfluxDB 1.6.0, both installed on Windows.

@daniel, if I’m not mistaken, isn’t the default behaviour that when multiple points have the same measurement, tag set, and timestamp, InfluxDB keeps only the last value? Source: https://docs.influxdata.com/influxdb/v1.7/troubleshooting/frequently-asked-questions/#how-does-influxdb-handle-duplicate-points


#14

It is true that InfluxDB will only store one value per series per timestamp, but in this case Telegraf adjusts the timestamps of the input data before sending it to InfluxDB, which prevents the timestamp conflict. The idea behind this was that log files often have many lines with the same timestamp, since they are often written at one-second resolution, and we wanted to preserve the ordering of the lines and store all of them in the database without overwrites.

I think we ought to allow this behavior to be disabled, since it isn’t always helpful. As a workaround, you might want to try the tail plugin with data_format = "csv" instead.