Query for gap start and end time

Hi

Lets say I have data like:

2019-12-11T11:00:00Z	-0.02659667541557301
2019-12-11T11:01:00Z	-0.02659667541557301
2019-12-11T11:05:00Z	-0.026288546600997543
2019-12-11T11:40:00Z	-0.02699662542182227
2019-12-11T12:00:00Z	-0.028346456692913385

I want to have a query which would find all gaps which are at least 5 minutes long, so the result should be e.g:

2019-12-11T11:05:00Z	gap_start
2019-12-11T11:40:00Z	gap_end
2019-12-11T11:40:00Z	gap_start
2019-12-11T12:00:00Z	gap_end

Or even just the start times:

2019-12-11T11:05:00Z	gap_start
2019-12-11T11:40:00Z	gap_start

Any ideas for a nice query for this?

Hint: I would like to avoid group by time(5m), because that would remove some precision.

@madis, are you using Flux or InfluxQL? What do you want returned - only points that end gaps?

Hi scott. I use InfluxQL. I want to receive timestamps that start a gap (where gap=difference at least 5 minutes between two points).

Receiving also end of a gap would be useful, but not necessary, because I will merge it later with another data stream which already has start times of activity (but no end times!).

I’m using influxdb 1.x (not 2.x).

I feel like cheating by asking somebody doing my “homework” for me, but you guys probably have a lot of experience with the query language and performance already :slight_smile:

As far as I know, this isn’t possible using InfluxQL. Even in Flux, it’s a little tricky but totally possible.

If you’re open to using Flux, you can enable and use Flux with InfluxDB 1.7.

Calculating the time between points is simple with the elapsed() function. It appends an elapsed column containing the elapsed time (in the specified unit) between the current point and the previous point. You can then filter rows out with elapsed values great than 5 minutes. This will give you all the gap_end points. To get the gap_start points, you read the same data but in reverse order (by time).

The example below does just that. It creates two sets of data, gapStarts and gapEnds. For each, it uses the elapsed() function to calculate the time between points. It then filters out records with elapsed values that exceed the 5 minute threshold (note that elapsed values in gapStarts are negative because rows are ordered by time in reverse). Each then tags the rows based on which gap end they represent. It then merges the two streams of data together with the union() function and sorts by time again.

baseData = from(bucket: "default")
  |> range(start: -2h)
  |> filter(fn: (r) => r._measurement == "exampleMeasurement" and r._field == "exampleField")

gapStarts = baseData
  |> sort(columns: ["_time"], desc: true)
  |> elapsed(unit: 1m)
  |> filter(fn: (r) => r.elapsed <= -5)
  |> map(fn: (r) => ({ r with gapSide: "gap_start" }))

gapEnds = baseData
  |> elapsed(unit: 1m)
  |> filter(fn: (r) => r.elapsed >= 5)
  |> map(fn: (r) => ({ r with gapSide: "gap_end" }))
  
union(tables: [gapStarts, gapEnds])
  |> keep(columns: ["_time", "gapSide"])
  |> sort(columns: ["_time"])

Thank you for the thorough response.

I was hoping difference(time) would work in InfluxQL, but unfortunately difference() works only with normal fields.

I will take a look at Flux and you examples. Thanks again!