Query for gap start and end time

Hi

Lets say I have data like:

2019-12-11T11:00:00Z	-0.02659667541557301
2019-12-11T11:01:00Z	-0.02659667541557301
2019-12-11T11:05:00Z	-0.026288546600997543
2019-12-11T11:40:00Z	-0.02699662542182227
2019-12-11T12:00:00Z	-0.028346456692913385

I want to have a query which would find all gaps which are at least 5 minutes long, so the result should be e.g:

2019-12-11T11:05:00Z	gap_start
2019-12-11T11:40:00Z	gap_end
2019-12-11T11:40:00Z	gap_start
2019-12-11T12:00:00Z	gap_end

Or even just the start times:

2019-12-11T11:05:00Z	gap_start
2019-12-11T11:40:00Z	gap_start

Any ideas for a nice query for this?

Hint: I would like to avoid group by time(5m), because that would remove some precision.

@madis, are you using Flux or InfluxQL? What do you want returned - only points that end gaps?

Hi scott. I use InfluxQL. I want to receive timestamps that start a gap (where gap=difference at least 5 minutes between two points).

Receiving also end of a gap would be useful, but not necessary, because I will merge it later with another data stream which already has start times of activity (but no end times!).

I’m using influxdb 1.x (not 2.x).

I feel like cheating by asking somebody doing my “homework” for me, but you guys probably have a lot of experience with the query language and performance already :slight_smile:

As far as I know, this isn’t possible using InfluxQL. Even in Flux, it’s a little tricky but totally possible.

If you’re open to using Flux, you can enable and use Flux with InfluxDB 1.7.

Calculating the time between points is simple with the elapsed() function. It appends an elapsed column containing the elapsed time (in the specified unit) between the current point and the previous point. You can then filter rows out with elapsed values great than 5 minutes. This will give you all the gap_end points. To get the gap_start points, you read the same data but in reverse order (by time).

The example below does just that. It creates two sets of data, gapStarts and gapEnds. For each, it uses the elapsed() function to calculate the time between points. It then filters out records with elapsed values that exceed the 5 minute threshold (note that elapsed values in gapStarts are negative because rows are ordered by time in reverse). Each then tags the rows based on which gap end they represent. It then merges the two streams of data together with the union() function and sorts by time again.

baseData = from(bucket: "default")
  |> range(start: -2h)
  |> filter(fn: (r) => r._measurement == "exampleMeasurement" and r._field == "exampleField")

gapStarts = baseData
  |> sort(columns: ["_time"], desc: true)
  |> elapsed(unit: 1m)
  |> filter(fn: (r) => r.elapsed <= -5)
  |> map(fn: (r) => ({ r with gapSide: "gap_start" }))

gapEnds = baseData
  |> elapsed(unit: 1m)
  |> filter(fn: (r) => r.elapsed >= 5)
  |> map(fn: (r) => ({ r with gapSide: "gap_end" }))
  
union(tables: [gapStarts, gapEnds])
  |> keep(columns: ["_time", "gapSide"])
  |> sort(columns: ["_time"])
1 Like

Thank you for the thorough response.

I was hoping difference(time) would work in InfluxQL, but unfortunately difference() works only with normal fields.

I will take a look at Flux and you examples. Thanks again!

Does this sol’n work when the gap occurs at the start of your time range? for example: if I query -7d but the data range returned is -5d will this give the elapsed time for the start of my time range(-7d) to the first record (-5d). If not, how could this be accomplished?

Alternatively, will this pick up on gaps at the end of your time range?

Because the elapsed function takes the difference between time stamps I think your sol’n ignores gaps that come at the start and end of your selected time range, ie if the first/last record comes after/before the start/end of your selected time range. Please correct me if I’m wrong and there is a better way. Here is a solution that checks for missing data that comes before the first record in you table. Could easily be extended to include gaps that come at the end.

import "experimental"

baseData = from(bucket: "xxxxxxxx")
  |> range(start: v.timeRangeStart)
  |> filter(fn: (r) => r._measurement == "xxxxxxxx")

mid = baseData
  |> elapsed(columnName: "elapsed", timeColumn: "_time", unit: 1m)
  |> filter(fn: (r) => r.elapsed > 3)
//data is stored every 3m from telegraf, any duration > 3m means missing timestamps
 

end_date = baseData 
    |>group(columns: ["_measurement"], mode: "by")
    |> distinct(column: "_time")
    |> last()
    |> findColumn(fn: (key) => true, column: "_value")
        
start_date = experimental.subDuration(d:-v.timeRangeStart,from: end_date)
 
 
head = baseData    
    |> first(column: "_time")
    |> map(fn: (r) => ({r with startDate: start_date }))
//map below takes difference between start of table and select start time, then converts to minutes from nanoseconds
    |> map(fn: (r) => ({r with Diff: int(v: duration(v: uint(v: r._time) - uint(v:start_date)))/60000000000}))
    |> filter(fn: (r) => r.elapsed > 0)

result = union(tables: [head,mid])