Hi,
I’m trying to understand how best to use derivative, especially with regard to “thinning” data, as might happen when zooming out from a time range. Specifically, I’m looking at SNMP interface counters, and I’m storing the raw counter values in InfluxDB.
To illustrate, here is some test data:
interface,agent_host=test ifHCInOctets=5000i 1527346860000000000
interface,agent_host=test ifHCInOctets=5100i 1527346980000000000
interface,agent_host=test ifHCInOctets=6000i 1527347220000000000
interface,agent_host=test ifHCInOctets=6200i 1527347340000000000
interface,agent_host=test ifHCInOctets=7000i 1527347760000000000
interface,agent_host=test ifHCInOctets=7300i 1527347940000000000
This imports as:
> select ifHCInOctets from interface where agent_host='test'
name: interface
time                 ifHCInOctets
----                 ------------
2018-05-26T15:01:00Z 5000
2018-05-26T15:03:00Z 5100
2018-05-26T15:07:00Z 6000
2018-05-26T15:09:00Z 6200
2018-05-26T15:16:00Z 7000
2018-05-26T15:19:00Z 7300
The basic derivative function works just fine:
> select derivative(ifHCInOctets,1s) from interface where agent_host='test'
name: interface
time                 derivative
----                 ----------
2018-05-26T15:03:00Z 0.8333333333333334
2018-05-26T15:07:00Z 3.75
2018-05-26T15:09:00Z 1.6666666666666667
2018-05-26T15:16:00Z 1.9047619047619047
2018-05-26T15:19:00Z 1.6666666666666667
For example: the counter goes up from 5000 to 5100 between :01:00 and :03:00; that’s an increase of 100 in 120 seconds, so the rate is 0.833. It isn’t fazed by the irregular intervals.
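To check my understanding, here is a minimal Python sketch of the arithmetic I believe derivative(field, 1s) is doing here: the difference between consecutive values, divided by the elapsed time in seconds (timestamps and values copied from the test data above):

from datetime import datetime, timezone

# (timestamp, counter) pairs copied from the test data above
samples = [
    ("2018-05-26T15:01:00Z", 5000),
    ("2018-05-26T15:03:00Z", 5100),
    ("2018-05-26T15:07:00Z", 6000),
    ("2018-05-26T15:09:00Z", 6200),
    ("2018-05-26T15:16:00Z", 7000),
    ("2018-05-26T15:19:00Z", 7300),
]

def parse(ts):
    # RFC3339 UTC timestamp -> timezone-aware datetime
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def rates(points):
    # Delta of value divided by elapsed seconds, like derivative(field, 1s)
    out = []
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        dt = (parse(t1) - parse(t0)).total_seconds()
        out.append((t1, (v1 - v0) / dt))
    return out

for t, r in rates(samples):
    print(t, r)  # first row: 2018-05-26T15:03:00Z 0.8333333333333334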
Now, I want to thin this out so I’m only looking at one point within each 5 minute period. For example, if I took just the first point in each 5 minute window, I would get:
2018-05-26T15:01:00Z 5000
2018-05-26T15:07:00Z 6000
2018-05-26T15:16:00Z 7000
… and then I could calculate derivatives properly again, just at lower resolution. For example, from :01:00 to :07:00 the counter rises by 1000 in 6 minutes, so 1000/360 = 2.778; the second interval is 1000 in 9 minutes, so 1000/540 = 1.852.
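In Python terms, the thinning I have in mind keeps the first raw point in each 5-minute bucket together with its real timestamp, and only then takes the derivative. A sketch, reusing parse() and rates() from above (the bucketing is just integer division of the epoch time):

def thin_first_per_bucket(points, width_s=300):
    # Keep the first point in each fixed-width time bucket,
    # preserving that point's own timestamp
    seen = set()
    out = []
    for t, v in points:
        bucket = int(parse(t).timestamp()) // width_s
        if bucket not in seen:
            seen.add(bucket)
            out.append((t, v))
    return out

for t, r in rates(thin_first_per_bucket(samples)):
    print(t, r)  # 2.777... at 15:07:00Z, then 1.851... at 15:16:00Z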
Now, reading through the InfluxDB documentation, it suggests using group by time(5m) and first(). This gives me:
> select first(ifHCInOctets) from interface where agent_host='test' group by time(5m)
name: interface
time                 first
----                 -----
2018-05-26T15:00:00Z 5000
2018-05-26T15:05:00Z 6000
2018-05-26T15:10:00Z
2018-05-26T15:15:00Z 7000
2018-05-26T15:20:00Z
But this has lost the actual timestamp of each data point, rounding it down to the start of its 5-minute window, and hence the calculated rates are wrong:
> select derivative(first(ifHCInOctets),1s) from interface where agent_host='test' and time >= '2018-05-26T15:00:00Z' and time <= '2018-05-26T15:20:00Z' group by time(5m)
name: interface
time                 derivative
----                 ----------
2018-05-26T15:05:00Z 3.3333333333333335
2018-05-26T15:15:00Z 1.6666666666666667
Here the first derivative is 1000 divided by the 300 seconds between the bucket timestamps, instead of 1000 over the real 360-second interval. In practice this might not be too much of a problem, as long as (a) the data is sampled at accurate, regular intervals, (b) the sampling interval divides evenly into the aggregation window (fine if you sample every 1 minute and aggregate over 5 minutes, but not if you sample every 2 minutes and aggregate over 5), and (c) the samples don’t land close to a bucket boundary.
However, can anyone suggest a way to get a more accurate answer?
I think I just want to ‘thin’ the sample stream to one sample per bucket - or, equivalently, have the query use the timestamp of the row that first() or last() finds, rather than the bucket window boundary.
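The only workaround I can see at the moment is to pull the raw points out and do the thinning client-side. A rough sketch, assuming the influxdb-python client (host and database names are placeholders) and the same helper functions as in the sketches above:

from influxdb import InfluxDBClient  # pip install influxdb

# Placeholder connection details
client = InfluxDBClient(host="localhost", port=8086, database="snmp")
rs = client.query("SELECT ifHCInOctets FROM interface WHERE agent_host = 'test'")
points = [(p["time"], p["ifHCInOctets"]) for p in rs.get_points()]

# Thin to one point per 5-minute bucket, keeping real timestamps,
# then convert to rates - same functions as above
for t, r in rates(thin_first_per_bucket(points, width_s=300)):
    print(t, r)

But it would obviously be nicer to have the database do this for me.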
I have worked with so many systems that don’t accurately convert counts to rates, and it would be great to work with one where I can trust what it shows me.
Thanks… Brian.