Telegraf Logparser and duplicate data points

influxdb
telegraf
#1

Hi.
We are using the logparser to parse nginx webserver access logs, which have a precision of seconds. We produce tags like “domain” and “pathlevel1” (first part of the URL after domain) to get some metrics that are included in every logentry.
As there are hundreds of requests per second it is likely that the logentry has no unique tagset with the same timestamp.
I read:


As I understand the docs, if telegraf emits multiple lines to InfluxDB with the same tagset and timestamp then all are overwritten, except the last one ?
If so, is the only way to “survive” all the metrics to add some kind of unique tags ?
Is there another elegant way to accomplish this with telegraf or do I need to find some weird strings (like SessionIDs) in the logs to make every entry with the same timestamp have a unique tagset ?

#2

Logparser will try to add a time offset to mitigate this problem:

$ cat ts-conflict.log
2017-02-21T13:10:34Z value=42
2017-02-21T13:10:34Z value=43
2017-02-21T13:10:34Z value=44
$ telegraf
logparser_grok,host=loaner,path=ts-conflict.log value=42 1487682634000000000
logparser_grok,host=loaner,path=ts-conflict.log value=43 1487682634001000000
logparser_grok,host=loaner,path=ts-conflict.log value=44 1487682634002000000

You could also use an aggregator plugin, in 1.5.0 we will add a new basicstats plugin (release candidates are available now):

[[aggregators.basicstats]]
  period = "10s"
  drop_original = true
  namepass = ["logparser_grok"]
logparser_grok,host=loaner,path=ts-conflict.log value_count=3,value_max=44,value_mean=43,value_min=42,value_s2=1,value_stdev=1 1513025916000000000

One thing to be aware of with the aggregator is that it will not work with historical timestamps, so I had to cheat to make it work with the above timestamps, but it should be fine with new log files.

do I need to find some weird strings (like SessionIDs) in the logs to make every entry with the same timestamp have a unique tagset

I wouldn’t recommend this because a side effect will be run away series cardinality, which will cause bad performance in the database.

#3

Hey Daniel,
this is very helpful and my “potential” problem is already solved with the offset (which I didn’t realize).
So the logparser is already doing what I was thinking would be the best solution to this problem.
I will also have a look at the new aggregator plugin, which seems to be useful for us as well.
Thanks for your great work on the Influx stack.

#4

One more question on the basicstats aggregator…
Is it doing the aggregation only for logparser lines with the same tagset, because one logparser/measurement may produce different tagsets ?
Can it also handle multiple values in one datapoint/log entry ?
Are zero values or negative values a problem ?

#5

I discovered another issue, where I am not sure how logparser/telegraf handles it. We have a central nginx instance that receives logdata from several nginx nodes. So the log entries are not written chronologically, but in order of arrival. So I guess logparser can only add offsets for chronological data or data it still has in memory ?
If the data was flushed to InfluxDB and in the next interval there are still entries for datapoints (with same tagset) that were already flushed then points get overwritten in InfluxDB.
Can you please explain, how the offset mechanism is supposed to work. Can I solve this problem with increasing the flush interval or do I need to sort the logfile chronologically ?

#6

ok, meanwhile I made some tests and found out that the logparser increases the ms offset per line per measurement, but resets it, if a new timestamp occurs. Tagset doesn’t matter.
So in a chronological logfile the offset works great, because every measurement’s logentries have a unique ms timestamp.

In our usecase, where data comes in unchronologically the offset gets reset a lot of times with the same second arriving multiple times again later in between other timestamps/seconds. So we get the same lower milliseconds (1-10) a lot of times with same measurement and sometimes same tagset, what causes data to be overwritten in the DB.

So I guess for historical data it would help to sort the logfile chronologically before ingesting it.
For realtime input into logparser I will try to start using $msec in nginx log configuration. Maybe I also find a way to sort the data as well and write them out to another file. Any hints for a good solution to this issue?

#7

Maybe you can have the log aggregation system process each nodes logs separately? You would want to add a tag for the nginx node that handled the request, this way they would all be unique and also in ascending order of time.

#8

One more fact I discovered is that if I have a fileglob that matches multiple files, then telegraf parses the files in parallel and the offset mechanism doesn’t work anymore, even if the files are sorted chronologically. Internally it ingests the different files with different timestamps “in one buffer” and always resets the offset, so most of my data gets lost, because I mostly only get 1 datapoint per second per tagset. Could this be a bug ?
I would find it valuable, if there would be some documentation for logparser (or InfluxDB) that states that I can only parse one file at a time that has chronological timestamps, otherwise data in InfluxDB likely gets overwritten. It might help to use milli(or nano)seconds resolution, if the logging application supports that.

#9

I would agree with that being a bug and also that the missing documentation is a bug. Would you be able to open two new issues on the Telegraf github for these problems?

#10

sorry, I forgot one important thing, I excluded the “path” tag in the logparsers, because I don’t need it in the DB, when I read only the single “realtime” logfile. I then used the same config to ingest multiple historical files in parallel, what lead to the described effect. So I cannot exclude the path tag an read multiple files, that is clear now. :wink: So this is not a bug, just missing experience with the tools.