Please, consider this set of data separated in 3 groups to reflect the 3 different groups of time
_measurement _time _field _value tag
test 2021-01-01T00:00:00Z fld 1 a
test 2021-01-01T00:00:00Z fld 1 b
test 2021-01-01T00:00:00Z fld 1 c
test 2021-01-01T00:00:00Z fld 1 d
test 2021-01-01T00:00:00Z fld 1 e
test 2021-01-01T00:00:00Z fld 1 f
Next time, tag elements c and e are removed ; tag elements g, h and i are added:
_measurement _time _field _value tag
test 2021-01-02T00:00:00Z fld 1 a
test 2021-01-02T00:00:00Z fld 1 b
test 2021-01-02T00:00:00Z fld 1 d
test 2021-01-02T00:00:00Z fld 1 f
test 2021-01-02T00:00:00Z fld 1 g
test 2021-01-02T00:00:00Z fld 1 h
test 2021-01-02T00:00:00Z fld 1 i
Next time, tag elements d, g and h are removed ; tag element j is added.
_measurement _time _field _value tag
test 2021-01-03T00:00:00Z fld 1 a
test 2021-01-03T00:00:00Z fld 1 b
test 2021-01-03T00:00:00Z fld 1 f
test 2021-01-03T00:00:00Z fld 1 i
test 2021-01-03T00:00:00Z fld 1 j
Now, I would like to find the right query to display the difference in tags between consecutive sets of times like this:
I figured out how to count the total number of “tag” elements for each time, but I am not sure how to calculate the number of removed and added tag elements between each groups of time to produce a table like the one above…
Sounds promising.
The timeShift() might not be exactly the function I need, as the interval between two sets of measurements might not be a constant duration (I need to compare between a specific timestamp from the previous one (whatever the duration between them might be).
It seems to be however a tool I could definitely use with an interval slightly smaller than the smallest one I would expect, and discard all consecutive duplicates tables or something like that.
This is already helping considerably. I will experiment with this and come back here with my results.
Well, I found a way, based on your suggestion of timeShift(), but it was tricky…
Probably not the most efficient code, but it works . I might use a task to calculate periodically this on a separate bucket/measurement.
Here it is:
import "join"
import "interpolate"
data = () => from(bucket: "myBucket")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r._measurement == "test" and r._field == "fld")
|> map(fn: (r) => ({ r with _value: 1.0 })) // float for interpolate.linear()
// Align all time values to round intervals and keep last measurements of each day —>
|> window(every: 1d)
|> last()
|> drop(columns: ["_time"])
|> duplicate(column: "_stop", as: "_time") // need to replace _stop by _time for interpolate.linear()
|> window(every: inf)
// <—
|> interpolate.linear(every: 1d) // needed to fill empty days.
|> group(columns:["tag"])
shifted = () => data()
|> timeShift(duration: 1d, columns: ["_time"])
t = join.full(left: data(), right: shifted(), on: (l, r) => l._time == r._time, as: (l, r) => {
time = if exists l._time then l._time else r._time
tag = if exists l.tag then l.tag else r.tag
value = l._value
added = if exists r._value then 0 else 1
removed = if exists r._value and not exists l._value then 1 else 0
return {_time: time, _value: value, added: added, removed: removed, tag: tag}
})
And then t can be used to display number of elements, number of added elements or number of removed elements: