Find difference over groups of time for a specific tag

Please, consider this set of data separated in 3 groups to reflect the 3 different groups of time

_measurement  _time                  _field  _value  tag
test          2021-01-01T00:00:00Z   fld	 1	     a
test          2021-01-01T00:00:00Z   fld	 1 	     b
test          2021-01-01T00:00:00Z   fld	 1 	     c
test          2021-01-01T00:00:00Z   fld 	 1 	     d	    
test          2021-01-01T00:00:00Z   fld	 1 	     e
test          2021-01-01T00:00:00Z   fld	 1	     f

Next time, tag elements c and e are removed ; tag elements g, h and i are added:

_measurement  _time                  _field  _value  tag
test          2021-01-02T00:00:00Z   fld	 1	     a
test          2021-01-02T00:00:00Z   fld 	 1	     b
test          2021-01-02T00:00:00Z   fld 	 1	     d
test          2021-01-02T00:00:00Z   fld 	 1 	     f	    
test          2021-01-02T00:00:00Z   fld 	 1	     g
test          2021-01-02T00:00:00Z   fld	 1	     h
test          2021-01-02T00:00:00Z   fld	 1	     i

Next time, tag elements d, g and h are removed ; tag element j is added.

_measurement  _time                  _field  _value  tag
test          2021-01-03T00:00:00Z   fld	 1	     a
test          2021-01-03T00:00:00Z   fld 	 1	     b
test          2021-01-03T00:00:00Z   fld 	 1 	     f	    
test          2021-01-03T00:00:00Z   fld	 1	     i
test          2021-01-03T00:00:00Z   fld	 1	     j

Now, I would like to find the right query to display the difference in tags between consecutive sets of times like this:

Time                  Tag_elmt_total  Tag_elmt_removed  Tag_elmt_added
2021-01-01T00:00:00Z  6               -                 -
2021-01-02T00:00:00Z  7               2                 3
2021-01-03T00:00:00Z  5               3                 1

I figured out how to count the total number of “tag” elements for each time, but I am not sure how to calculate the number of removed and added tag elements between each groups of time to produce a table like the one above…

from(bucket: "myBucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "test" and r._field == "fld")
  |> group(columns:["_time"])
  |> sum()
  |> group()

Thank you for any insight…

Hello @bolemo,

  1. You’d have to use timeShift to reposition the rows.
    timeShift() function | Flux 0.x Documentation
  2. Then you join on time so you can compare
    Joins in Flux | Additional resources | InfluxData Documentation
join.time(left: data_from_02, right: shifted_data_from_02_that_now_has_01, as: (l, r) => ({l with _value_02: r._value}))
  1. Then you’d use conditional filtering to compare values from 01 and 02 to see if they’re the same or new
    Query using conditional logic in Flux | InfluxDB Cloud Documentation
  2. Then you’d sum

I hope that helps!

Sounds promising.
The timeShift() might not be exactly the function I need, as the interval between two sets of measurements might not be a constant duration (I need to compare between a specific timestamp from the previous one (whatever the duration between them might be).
It seems to be however a tool I could definitely use with an interval slightly smaller than the smallest one I would expect, and discard all consecutive duplicates tables or something like that.

This is already helping considerably. I will experiment with this and come back here with my results.

Thank you for your help :slightly_smiling_face:

@bolemo Thank you!! Yah let me know how it goes :slight_smile:

Well, I found a way, based on your suggestion of timeShift(), but it was tricky…
Probably not the most efficient code, but it works :slightly_smiling_face:. I might use a task to calculate periodically this on a separate bucket/measurement.

Here it is:

import "join"
import "interpolate"

data = () => from(bucket: "myBucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "test" and r._field == "fld")
  |> map(fn: (r) => ({ r with _value: 1.0 })) // float for interpolate.linear()
// Align all time values to round intervals and keep last measurements of each day —>
  |> window(every: 1d)
  |> last()
  |> drop(columns: ["_time"])
  |> duplicate(column: "_stop", as: "_time") // need to replace _stop by _time for interpolate.linear()
  |> window(every: inf)
// <—
  |> interpolate.linear(every: 1d) // needed to fill empty days.
  |> group(columns:["tag"])

shifted = () => data()
  |> timeShift(duration: 1d, columns: ["_time"])

t = join.full(left: data(), right: shifted(), on: (l, r) => l._time == r._time, as: (l, r) => {
    time = if exists l._time then l._time else r._time
    tag = if exists l.tag then l.tag else r.tag
    value = l._value
    added = if exists r._value then 0 else 1
    removed = if exists r._value and not exists l._value then 1 else 0
    return {_time: time, _value: value, added: added, removed: removed, tag: tag}

And then t can be used to display number of elements, number of added elements or number of removed elements:

  |> group(columns:["_time"])
  |> sum(column: "_value")
  |> group()
  |> yield(name: "nb_elements")

  |> group(columns:["_time"])
  |> sum(column: "added")
  |> group()
  |> yield(name: "nb_added")

  |> group(columns:["_time"])
  |> sum(column: "removed")
  |> group()
  |> yield(name: "nb_removed")