I’m trying to collect web access logs with Telegraf into InfluxDB.
The problem is that the access logs have a granularity «to the second» and often a webserver serves the same request multiple times in the same second. That is, all the “TAG” values (Method, Path, Response Status, etc) are the same. I’ve observed up to ~150 similar requests logged in the same second.
This ends up overwriting the data points in InfluxDB and I lose lots of data with LogStash -> InfluxDB.
Does Telegraf have any mechanism to resolve these time-bucket conflicts?
For example, whenever 2+ data points to be written share the same set of tags + time (and thus end up overwriting one another), could it add an extra tag like “extra=1”, “extra=2”, etc?
Can you add an example of the data you are sending to InfluxDB?
As an example, these are some Web Access Logs for one server:
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 800
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 7
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 9
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 8
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 195
2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 2
As you can see, all of these log lines share the same Time + all other Tags (series), which are Layer & Server (inserted externally), Method, URI-Stem and Status. So, they’ll all become the same data point in InfluxDB, overwriting one another, unless we break it down into different series as recommended in (InfluxDB’s?) documentation that I read somewhere. The only varying value, the last column, is the “time-taken”, which is obviously meant to be an InfluxDB “Value” and is not guaranteed to be unique anyway.
Of course, we don’t want to «tag» every single request with the likes of a GUID, as that would explode the Series, each log line would become a new Series in InfluxDB - we need to only break it down when there’s a conflict.
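To make the collision concrete, here is a minimal Python sketch. It does not talk to InfluxDB; a plain dict stands in for the database, keyed by (tag set, timestamp) so that the last write wins. The tag names (method, status, uri_stem) and the assumption that the log times are UTC are mine, not taken from the actual setup.

```python
from datetime import datetime, timezone

# The six sample log lines from the post above.
LOG_LINES = [
    "2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 800",
    "2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 7",
    "2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 9",
    "2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 8",
    "2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 195",
    "2017-04-11 16:43:11 10.129.37.36 POST /PaymentGatewayService.svc - 85 - 10.129.37.4 - 200 0 0 2",
]


def parse(line):
    """Return (tag_set, epoch_seconds, time_taken) for one log line."""
    f = line.split()
    # Assumption: the log timestamps are UTC.
    when = datetime.strptime(f"{f[0]} {f[1]}", "%Y-%m-%d %H:%M:%S")
    when = when.replace(tzinfo=timezone.utc)
    # Hypothetical tag names; field positions follow the #Fields header.
    tags = (("method", f[3]), ("status", f[10]), ("uri_stem", f[4]))
    return tags, int(when.timestamp()), int(f[13])


# Stand-in for the database: one slot per (tag set, timestamp), last write wins.
stored = {}
for line in LOG_LINES:
    tags, ts, time_taken = parse(line)
    stored[(tags, ts)] = time_taken

print(len(LOG_LINES), "lines parsed,", len(stored), "point(s) kept")
```

All six lines map to the same key, so only the last time-taken value (2) survives.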
How about using an aggregator? Currently we have only one: minmax, but soon we are hoping to add a histogram as well. This would allow you to save the min and max value per interval, but it wouldn’t save every reading.
I thought about using the only existing aggregator. However, even if the aggregator could store more extensive metrics (arithmetic Mean and percentiles like the Median + number of different requests), it would still be less than ideal, although possibly acceptable. But as it is now, with only Min and Max values, that’s not good enough, unfortunately.
I need to achieve good precision & accuracy in both Response Times (Mean, Median, 90th~95th percentiles) and Requests per second. This is to conduct detailed Performance Analysis…
Indeed the aggregator plugin might be “almost there”, but not quite there yet. I would need more statistics from the aggregated data point + number of requests that had been aggregated.
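As a rough illustration of the statistics being asked for, here is a Python sketch that summarizes the time-taken values of one series for one second. It is not how any Telegraf aggregator works; the function names are hypothetical, and the nearest-rank percentile is just one of several common definitions.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(time_taken):
    """Collapse all readings of one series in one interval into a single record."""
    return {
        "count": len(time_taken),  # number of requests that were aggregated
        "min": min(time_taken),
        "max": max(time_taken),
        "mean": statistics.mean(time_taken),
        "median": statistics.median(time_taken),
        "p90": percentile(time_taken, 90),
        "p95": percentile(time_taken, 95),
    }


# The time-taken column of the six sample log lines.
summary = summarize([800, 7, 9, 8, 195, 2])
print(summary)
```

With only six samples the 90th and 95th percentiles both land on the maximum, which is one reason per-second aggregates are coarse for low-traffic series.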
Though, hmm!? If each data point stored number of requests aggregated (ex: ,DataPoints=7,) would I still be able to show in Grafana the Requests/s? I’m not entirely sure we’re able to do Sum(DataPoints) of all different Series on a Minute (POST /index.htm, GET /index.htm, etc), to then calculate the average of all collected data points (because each Minute will have collected data points for each Series multiple times).
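The arithmetic behind that question can be sketched in Python, leaving aside how it would be written as a Grafana or InfluxQL query. The sample points and the DataPoints counts below are made up for illustration. The idea is that summing the counts across all series in a window and dividing by the window length gives Requests/s, no matter how many aggregated points each series contributed.

```python
from collections import defaultdict

# Hypothetical aggregated points: (series, epoch_second, data_points).
points = [
    ("POST /index.htm", 60, 7),
    ("GET /index.htm", 60, 3),
    ("POST /index.htm", 61, 5),
    ("GET /index.htm", 119, 15),
    ("GET /index.htm", 120, 30),
]


def requests_per_second(points, window=60):
    """Sum the request counts of every series per window, then divide by the window length."""
    totals = defaultdict(int)
    for _series, ts, count in points:
        window_start = ts - ts % window
        totals[window_start] += count
    return {start: total / window for start, total in sorted(totals.items())}


rates = requests_per_second(points)
print(rates)
```

Note that this divides by the window length, not by the number of stored points, so series that reported several times in the same minute are not over-weighted.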
I believe with the histogram you could do this, though I’m not very experienced with InfluxQL so I’ll have to double check. The only downside is that the results would be bucketed.
There is also this basicstats pull request in the works.
I believe with the histogram you could do this
Not sure what “histogram” is, nor of all the things I said what “this” is.
There is also this basicstats pull request in the works.
That looks very useful. Basically it looks like an expansion to the existing aggregator (like I was mentioning as an alternative to a conflict-resolving Tag) than a new aggregator (not that that matters much).
I think I would still prefer the conflict-resolving Tag. Some lower traffic websites will serve the exact same request (running into a time-conflict) only occasionally. If we have to use the aggregation every time we upload data, in some cases we end up logging more data, not less. Ex: uploading an “aggregation” with Count, Min, Max, Mean, StdDev, Median, 90thPc, 95thPc of 1 single request…
@paulo A histogram is just a selection of mathematical aggregates, typically mean, max, min, median, p50, p90, p99. So just what you are asking for.
Adding arbitrary tags to data would be an anti-pattern that leads to confusing schemas
Can you increase the precision with which your clients write timestamps?
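If the source only logs whole seconds, one way to get extra precision is to synthesize it. The Python sketch below is an assumption about how that could be done, not something Telegraf is known to do: it converts each timestamp to nanoseconds and adds a running offset to points that would otherwise collide. The function and variable names are hypothetical.

```python
from collections import defaultdict

NS_PER_SECOND = 1_000_000_000


def spread_timestamps(points):
    """Give each (series, second) duplicate its own nanosecond inside that second.

    points: iterable of (series_key, epoch_seconds, value), in arrival order.
    Returns a list of (series_key, epoch_nanoseconds, value).
    """
    seen = defaultdict(int)
    out = []
    for series, ts_seconds, value in points:
        offset = seen[(series, ts_seconds)]
        if offset >= NS_PER_SECOND:
            raise ValueError("too many duplicates to fit inside one second")
        seen[(series, ts_seconds)] += 1
        out.append((series, ts_seconds * NS_PER_SECOND + offset, value))
    return out


series = "POST /PaymentGatewayService.svc 200"
spread = spread_timestamps([(series, 1491928991, 800),
                            (series, 1491928991, 7),
                            (series, 1491928991, 9)])
for _series, ts_ns, value in spread:
    print(ts_ns, value)
```

The offsets are artificial and say nothing about the real sub-second arrival order, but they keep every point inside its original second without adding any series.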
Sorry, I thought I had referenced the histogram pull request earlier. This is another aggregator that we hope to add soon, but is not complete.
Are you sure that’s not a proper way of handling time-conflicts? If it is, the documentation should be changed:
To store both points:
Introduce an arbitrary new tag to enforce uniqueness.
Anyway, thanks for the answers, both. It appears there’s some confusion atm with aggregator plugins (“basicstats” / “histogram”) but hopefully the “histogram” aggregator will be production ready & tested-under-fire soon.
@paulo If the tag is applied to every point then that would be a valid way to handle time conflicts. Having a bunch of data in the same measurement with different tag sets is something I, with my DB admin hat on, find confusing. Maybe the docs should be clarified on that point.
Ok, I think I see what you mean.
I’m not so sure anymore we’re thinking different things. I was thinking the likes of a “counter” Tag, where in most situations it would have “1”, only varying when there’s repetitions for that exact same time-bucket.
(or be nonexistent when there’s no conflict but that’s really a “whatever” situation)
Of course, the most time-series-like method is to store periodical aggregations, though considering Influx’s capabilities I’d really like the gained precision of being able to store all the logs. I’m going to end up down the path of aggregating the data, though, I’m pretty sure…
@paulo Yup! The counter tag would work. I would advise sending it over for every point. I would also agree that letting the database do the aggregates for you would be the most Influx-y way to do things.
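A minimal Python sketch of that counter tag, applied to every point, could look like the following. The measurement name web_access, the tag name counter and the helper names are hypothetical, and tag-value escaping for line protocol is left out to keep it short.

```python
from collections import defaultdict


def add_counter_tag(points, tag_name="counter"):
    """Attach a per-(tag set, timestamp) sequence number to every point.

    points: iterable of (tags_dict, epoch_seconds, time_taken), in arrival order.
    A point with no conflict still gets counter=1, so the schema stays uniform.
    """
    seen = defaultdict(int)
    out = []
    for tags, ts, value in points:
        key = (tuple(sorted(tags.items())), ts)
        seen[key] += 1
        out.append(({**tags, tag_name: str(seen[key])}, ts, value))
    return out


def to_line(measurement, tags, ts_seconds, time_taken):
    """Render one point as InfluxDB line protocol (escaping omitted in this sketch)."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_str} time_taken={time_taken}i {ts_seconds * 1_000_000_000}"


base_tags = {"method": "POST", "status": "200"}
ts = 1491928991  # 2017-04-11 16:43:11, assuming the log time is UTC
tagged = add_counter_tag([(base_tags, ts, 800),
                          (base_tags, ts, 7),
                          (base_tags, ts + 1, 9)])
lines = [to_line("web_access", t, when, v) for t, when, v in tagged]
print("\n".join(lines))
```

Unlike a GUID per request, this adds at most as many series per tag set as the largest number of identical requests seen in one second (around 150 in the case described above).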