Query for gap start and end time

madis · December 11, 2019, 12:38pm

Hi

Lets say I have data like:

2019-12-11T11:00:00Z	-0.02659667541557301
2019-12-11T11:01:00Z	-0.02659667541557301
2019-12-11T11:05:00Z	-0.026288546600997543
2019-12-11T11:40:00Z	-0.02699662542182227
2019-12-11T12:00:00Z	-0.028346456692913385

I want to have a query which would find all gaps which are at least 5 minutes long, so the result should be e.g:

2019-12-11T11:05:00Z	gap_start
2019-12-11T11:40:00Z	gap_end
2019-12-11T11:40:00Z	gap_start
2019-12-11T12:00:00Z	gap_end

Or even just the start times:

2019-12-11T11:05:00Z	gap_start
2019-12-11T11:40:00Z	gap_start

Any ideas for a nice query for this?

Hint: I would like to avoid group by time(5m), because that would remove some precision.

scott · December 11, 2019, 3:26pm

@madis, are you using Flux or InfluxQL? What do you want returned - only points that end gaps?

madis · December 11, 2019, 3:45pm

Hi scott. I use InfluxQL. I want to receive timestamps that start a gap (where gap=difference at least 5 minutes between two points).

Receiving also end of a gap would be useful, but not necessary, because I will merge it later with another data stream which already has start times of activity (but no end times!).

I’m using influxdb 1.x (not 2.x).

I feel like cheating by asking somebody doing my “homework” for me, but you guys probably have a lot of experience with the query language and performance already

scott · December 12, 2019, 6:00pm

As far as I know, this isn’t possible using InfluxQL. Even in Flux, it’s a little tricky but totally possible.

If you’re open to using Flux, you can enable and use Flux with InfluxDB 1.7.

Calculating the time between points is simple with the elapsed() function. It appends an elapsed column containing the elapsed time (in the specified unit) between the current point and the previous point. You can then filter rows out with elapsed values great than 5 minutes. This will give you all the gap_end points. To get the gap_start points, you read the same data but in reverse order (by time).

The example below does just that. It creates two sets of data, gapStarts and gapEnds. For each, it uses the elapsed() function to calculate the time between points. It then filters out records with elapsed values that exceed the 5 minute threshold (note that elapsed values in gapStarts are negative because rows are ordered by time in reverse). Each then tags the rows based on which gap end they represent. It then merges the two streams of data together with the union() function and sorts by time again.

baseData = from(bucket: "default")
  |> range(start: -2h)
  |> filter(fn: (r) => r._measurement == "exampleMeasurement" and r._field == "exampleField")

gapStarts = baseData
  |> sort(columns: ["_time"], desc: true)
  |> elapsed(unit: 1m)
  |> filter(fn: (r) => r.elapsed <= -5)
  |> map(fn: (r) => ({ r with gapSide: "gap_start" }))

gapEnds = baseData
  |> elapsed(unit: 1m)
  |> filter(fn: (r) => r.elapsed >= 5)
  |> map(fn: (r) => ({ r with gapSide: "gap_end" }))
  
union(tables: [gapStarts, gapEnds])
  |> keep(columns: ["_time", "gapSide"])
  |> sort(columns: ["_time"])

madis · December 13, 2019, 9:34am

Thank you for the thorough response.

I was hoping difference(time) would work in InfluxQL, but unfortunately difference() works only with normal fields.

I will take a look at Flux and you examples. Thanks again!

blharvey · June 3, 2022, 6:54pm

Does this sol’n work when the gap occurs at the start of your time range? for example: if I query -7d but the data range returned is -5d will this give the elapsed time for the start of my time range(-7d) to the first record (-5d). If not, how could this be accomplished?

Alternatively, will this pick up on gaps at the end of your time range?

blharvey · June 3, 2022, 11:38pm

Because the elapsed function takes the difference between time stamps I think your sol’n ignores gaps that come at the start and end of your selected time range, ie if the first/last record comes after/before the start/end of your selected time range. Please correct me if I’m wrong and there is a better way. Here is a solution that checks for missing data that comes before the first record in you table. Could easily be extended to include gaps that come at the end.

import "experimental"

baseData = from(bucket: "xxxxxxxx")
  |> range(start: v.timeRangeStart)
  |> filter(fn: (r) => r._measurement == "xxxxxxxx")

mid = baseData
  |> elapsed(columnName: "elapsed", timeColumn: "_time", unit: 1m)
  |> filter(fn: (r) => r.elapsed > 3)
//data is stored every 3m from telegraf, any duration > 3m means missing timestamps
 

end_date = baseData 
    |>group(columns: ["_measurement"], mode: "by")
    |> distinct(column: "_time")
    |> last()
    |> findColumn(fn: (key) => true, column: "_value")
        
start_date = experimental.subDuration(d:-v.timeRangeStart,from: end_date)
 
 
head = baseData    
    |> first(column: "_time")
    |> map(fn: (r) => ({r with startDate: start_date }))
//map below takes difference between start of table and select start time, then converts to minutes from nanoseconds
    |> map(fn: (r) => ({r with Diff: int(v: duration(v: uint(v: r._time) - uint(v:start_date)))/60000000000}))
    |> filter(fn: (r) => r.elapsed > 0)

result = union(tables: [head,mid])

Topic		Replies	Views
[Solved]How to find gaps in my data? influxdb , time	3	4995	June 3, 2022
How to group by elapsed time (gap detection) Fluxlang	1	621	October 22, 2020
Find gap between _time InfluxDB 2 influxdb , query , flux	3	781	April 8, 2022
How to merge time interval start and end time into one point InfluxDB 2 influxdb , flux	3	1056	August 24, 2021
[InfluxQL] Selecting value and time since last value Dashboards	2	1661	August 16, 2020

Query for gap start and end time

Related topics