Telegraf time-bucket conflict resolution?

paulo · April 19, 2017, 10:31am

I’m trying to collect web access logs with Telegraf into InfluxDB.

The problem is that the access logs have a granularity «to the second» and often a webserver serves the same request multiple times in the same second. That is, all the “TAG” values (Method, Path, Response Status, etc) are the same. I’ve observed up to ~150 similar requests logged in the same second.

This ends up overwriting the data points in InfluxDB and I lose lots of data with LogStash -> InfluxDB.

Question
Does Telegraf have any mechanism to resolve these time-bucket conflicts?

For example: whenever two or more data points about to be written share the same set of tags and the same time (and would thus overwrite one another), add an extra tag like “extra=1”, “extra=2”, etc.?

daniel · April 19, 2017, 7:04pm

Can you add an example of the data you are sending to InfluxDB?

paulo · April 20, 2017, 8:35am

Hi Daniel.

As an example, these are some Web Access Logs for one server:

#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 800
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 7
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 9
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 8
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 195
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 2

As you can see, all of these log lines share the same time and the same tags (series): Layer and Server (inserted externally), Method, URI-Stem and Status. So they’ll all become the same data point in InfluxDB, overwriting one another, unless we break them down into different series, as recommended in documentation (InfluxDB’s?) that I read somewhere. The only varying value, the last column, is “time-taken”, which is obviously meant to be an InfluxDB “Value” and is not guaranteed to be unique anyway.
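
To make the collision concrete, here is a minimal Python sketch (not Telegraf code) that builds the identity InfluxDB compares, the timestamp plus the tag set, for the six sample lines. The choice of tag columns and the names `TAG_COLUMNS` and `point_key` are assumptions for illustration; the measurement name and the externally added Layer and Server tags are left out because they would be identical for every line.

```python
from collections import Counter

# Column names copied from the "#Fields:" header of the log excerpt.
FIELDS = (
    "date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username "
    "c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken"
).split()

# The six sample lines from the excerpt.
LOG_LINES = """\
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 800
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 7
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 9
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 8
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 195
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 2
""".splitlines()

# Assumption: these columns are written as tags; time-taken is the field.
TAG_COLUMNS = ("cs-method", "cs-uri-stem", "sc-status")


def point_key(line):
    """Return the (timestamp, tag set) pair that identifies a point."""
    row = dict(zip(FIELDS, line.split()))
    timestamp = f"{row['date']} {row['time']}"
    tags = tuple((column, row[column]) for column in TAG_COLUMNS)
    return timestamp, tags


# Lines with the same key overwrite each other; only the last one survives.
collisions = Counter(point_key(line) for line in LOG_LINES)
surviving_points = len(collisions)
lost_points = len(LOG_LINES) - surviving_points
print(f"{len(LOG_LINES)} log lines -> {surviving_points} stored point(s), {lost_points} lost")
```

All six lines map to a single key, so only one of the six time-taken values would remain after the write.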

Of course, we don’t want to «tag» every single request with the likes of a GUID, as that would explode the Series: each log line would become a new Series in InfluxDB. We only need to break it down when there’s a conflict.

daniel · April 20, 2017, 5:50pm

How about using an aggregator? Currently we have only one: minmax, but soon we are hoping to add a histogram as well. This would allow you to save the min and max value per interval, but it wouldn’t save every reading.

paulo · April 21, 2017, 8:31am

I thought about using the only existing aggregator. However, even if the aggregator could store more extensive metrics (arithmetic Mean and percentiles like the Median, plus the number of different requests), it would still be less than ideal, although possibly acceptable. But as it is now, with only Min and Max values, that’s not good enough, unfortunately.

I need to achieve good precision & accuracy in both Response Times (Mean, Median, 90th~95th percentiles) and Requests per second. This is to conduct detailed Performance Analysis…

Indeed the aggregator plugin might be “almost there”, but not quite there yet. I would need more statistics from the aggregated data point + number of requests that had been aggregated.

Though, hmm!? If each data point stored number of requests aggregated (ex: ,DataPoints=7,) would I still be able to show in Grafana the Requests/s? I’m not entirely sure we’re able to do Sum(DataPoints) of all different Series on a Minute (POST /index.htm, GET /index.htm, etc), to then calculate the average of all collected data points (because each Minute will have collected data points for each Series multiple times).
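
The arithmetic behind that question can be sketched in a few lines of Python. This is only an illustration of the calculation, not a Grafana or InfluxQL query: the sample data, the `DataPoints` counts, and the function name `requests_per_second` are hypothetical. The idea is to sum the per-point request counts across every series within a minute and divide by 60.

```python
from collections import defaultdict

# Hypothetical aggregated points: (minute, series, DataPoints).
# The same series may report more than once within a minute.
aggregated = [
    ("2017-04-11 16:43", "POST /index.htm", 7),
    ("2017-04-11 16:43", "GET /index.htm", 5),
    ("2017-04-11 16:43", "POST /index.htm", 3),
    ("2017-04-11 16:44", "GET /index.htm", 30),
]


def requests_per_second(points, window_seconds=60):
    """Sum DataPoints over all series per minute, then divide by the window."""
    totals = defaultdict(int)
    for minute, _series, count in points:
        totals[minute] += count
    return {minute: total / window_seconds for minute, total in totals.items()}


rates = requests_per_second(aggregated)
for minute, rate in sorted(rates.items()):
    print(minute, rate)
```

Because the count is summed rather than averaged, it does not matter how many times each series reported within the minute.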

daniel · April 21, 2017, 8:40pm

I believe with the histogram you could do this, though I’m not very experienced with InfluxQL so I’ll have to double check. The only downside is that the results would be bucketed.

There is also this basicstats pull request in the works.

paulo · April 24, 2017, 8:59am

Hi Daniel,

“I believe with the histogram you could do this”

I’m not sure what “histogram” is, nor which of the things I said “this” refers to.

“There is also this basicstats pull request in the works.”

That looks very useful. Basically it looks more like an expansion of the existing aggregator (like the one I mentioned as an alternative to a conflict-resolving Tag) than a new aggregator (not that that matters much).

Though!
I think I would still prefer the conflict-resolving Tag. Some lower-traffic websites will serve the exact same request (running into a time-conflict) only occasionally. If we have to aggregate every time we upload data, in some cases we end up logging more data. Ex: uploading an “aggregation” with Count, Min, Max, Mean, StdDev, Median, 90thPc, 95thPc of 1 single request…
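
As a rough illustration of what such an aggregation would carry, here is a sketch using Python’s standard `statistics` module. It is not how any Telegraf aggregator is implemented; the function name `summarize` and the field names are assumptions, and the percentile interpolation (`method="inclusive"`) is just one of several common choices.

```python
import statistics


def summarize(time_taken):
    """Summarize the time-taken values that fell into one time bucket."""
    ordered = sorted(time_taken)
    summary = {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
    }
    # Standard deviation and percentiles need at least two samples.
    if len(ordered) >= 2:
        summary["stdev"] = statistics.stdev(ordered)
        cuts = statistics.quantiles(ordered, n=100, method="inclusive")
        summary["p90"] = cuts[89]
        summary["p95"] = cuts[94]
    return summary


# The six time-taken values from the sample log lines.
busy_second = summarize([800, 7, 9, 8, 195, 2])

# A low-traffic bucket holding a single request.
single = summarize([800])

print(busy_second)
print(single)
```

For the single-request bucket, count, min, max, mean and median all describe the same one value, which is the overhead being objected to.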

jackzampolin · April 24, 2017, 6:44pm

@paulo A histogram is just a selection of mathematical aggregates, typically mean, max, min, median, p50, p90, p99. So just what you are asking for.

Adding arbitrary tags to data would be an anti-pattern that leads to confusing schemas.

Can you increase the precision with which your clients write timestamps?

daniel · April 24, 2017, 8:30pm

Sorry, I thought I had referenced the histogram pull request earlier. This is another aggregator that we hope to add soon, but is not complete.

paulo · April 25, 2017, 3:56pm

@jackzampolin

Are you sure that’s not a proper way of handling time-conflicts? If it is, the documentation should be changed:

https://docs.influxdata.com/influxdb/v1.2/troubleshooting/frequently-asked-questions/#how-does-influxdb-handle-duplicate-points

To store both points:
Introduce an arbitrary new tag to enforce uniqueness.

Anyway, thanks for the answers, both. It appears there’s some confusion atm with aggregator plugins (“basicstats” / “histogram”) but hopefully the “histogram” aggregator will be production ready & tested-under-fire soon.

jackzampolin · April 25, 2017, 4:38pm

@paulo If the tag is applied to every point then that would be a valid way to handle time conflicts. Having a bunch of data in the same measurement with different tag sets is something I, with my DB admin hat on, find confusing. Maybe the docs should be clarified on that point.

paulo · April 25, 2017, 4:47pm

@jackzampolin
Ok, I think I see what you mean.

I’m not so sure anymore that we’re thinking different things. I was thinking of something like a “counter” Tag, which in most situations would have the value “1”, only varying when there are repetitions in that exact same time-bucket.

(or be nonexistent when there’s no conflict but that’s really a “whatever” situation)

Of course, the most time-series-like method is to store periodic aggregations, though considering Influx’s capabilities I’d really like the added precision of being able to store all the logs. I’m going to end up down the path of aggregating the data, though, I’m pretty sure…

jackzampolin · April 25, 2017, 4:50pm

@paulo Yup! The counter tag would work. I would advise sending it over for every point. I would also agree that letting the database do the aggregates for you would be the most Influx-y way to do things.
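
A minimal sketch of the counter-tag idea, written in Python and applied before the points are sent. Telegraf itself is not shown to offer this; the measurement name `access_log`, the tag names, the `seq` tag, and the function `with_counter_tag` are all hypothetical. The sketch also assumes timestamps are written with second precision, that the log times are UTC, and that tag values contain no characters needing line-protocol escaping.

```python
from collections import defaultdict
from datetime import datetime, timezone


def with_counter_tag(rows, measurement="access_log"):
    """Emit line protocol, adding a 'seq' tag to every point.

    rows: iterable of (epoch_seconds, tags, time_taken).
    'seq' is 1 for the first point of a (timestamp, tag set) pair and
    increments only when that exact pair repeats.
    """
    seen = defaultdict(int)
    lines = []
    for ts, tags, time_taken in rows:
        key = (ts, tuple(sorted(tags.items())))
        seen[key] += 1
        tagged = dict(tags, seq=str(seen[key]))
        tag_str = ",".join(f"{k}={v}" for k, v in sorted(tagged.items()))
        lines.append(f"{measurement},{tag_str} time_taken={time_taken}i {ts}")
    return lines


# The six sample requests: same second, same tags, different time-taken.
ts = int(datetime(2017, 4, 11, 16, 43, 11, tzinfo=timezone.utc).timestamp())
tags = {"method": "POST", "uri": "/PaymentGatewayService.svc", "status": "200"}
rows = [(ts, tags, t) for t in (800, 7, 9, 8, 195, 2)]

lines = with_counter_tag(rows)
for line in lines:
    print(line)
```

Because `seq` only grows as high as the busiest second for a given tag set, the number of extra series stays bounded by that peak rather than growing with every request, unlike a GUID tag.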
