Hi scott. I use InfluxQL. I want to receive timestamps that start a gap (where gap=difference at least 5 minutes between two points).
Receiving also end of a gap would be useful, but not necessary, because I will merge it later with another data stream which already has start times of activity (but no end times!).
I’m using influxdb 1.x (not 2.x).
I feel like cheating by asking somebody doing my “homework” for me, but you guys probably have a lot of experience with the query language and performance already
Calculating the time between points is simple with the elapsed() function. It appends an elapsed column containing the elapsed time (in the specified unit) between the current point and the previous point. You can then filter rows out with elapsed values great than 5 minutes. This will give you all the gap_end points. To get the gap_start points, you read the same data but in reverse order (by time).
The example below does just that. It creates two sets of data, gapStarts and gapEnds. For each, it uses the elapsed() function to calculate the time between points. It then filters out records with elapsed values that exceed the 5 minute threshold (note that elapsed values in gapStarts are negative because rows are ordered by time in reverse). Each then tags the rows based on which gap end they represent. It then merges the two streams of data together with the union() function and sorts by time again.
Does this sol’n work when the gap occurs at the start of your time range? for example: if I query -7d but the data range returned is -5d will this give the elapsed time for the start of my time range(-7d) to the first record (-5d). If not, how could this be accomplished?
Alternatively, will this pick up on gaps at the end of your time range?
Because the elapsed function takes the difference between time stamps I think your sol’n ignores gaps that come at the start and end of your selected time range, ie if the first/last record comes after/before the start/end of your selected time range. Please correct me if I’m wrong and there is a better way. Here is a solution that checks for missing data that comes before the first record in you table. Could easily be extended to include gaps that come at the end.
import "experimental"
baseData = from(bucket: "xxxxxxxx")
|> range(start: v.timeRangeStart)
|> filter(fn: (r) => r._measurement == "xxxxxxxx")
mid = baseData
|> elapsed(columnName: "elapsed", timeColumn: "_time", unit: 1m)
|> filter(fn: (r) => r.elapsed > 3)
//data is stored every 3m from telegraf, any duration > 3m means missing timestamps
end_date = baseData
|>group(columns: ["_measurement"], mode: "by")
|> distinct(column: "_time")
|> last()
|> findColumn(fn: (key) => true, column: "_value")
start_date = experimental.subDuration(d:-v.timeRangeStart,from: end_date)
head = baseData
|> first(column: "_time")
|> map(fn: (r) => ({r with startDate: start_date }))
//map below takes difference between start of table and select start time, then converts to minutes from nanoseconds
|> map(fn: (r) => ({r with Diff: int(v: duration(v: uint(v: r._time) - uint(v:start_date)))/60000000000}))
|> filter(fn: (r) => r.elapsed > 0)
result = union(tables: [head,mid])