Find difference over groups of time for a specific tag

bolemo · March 16, 2023, 5:25pm

Please, consider this set of data separated in 3 groups to reflect the 3 different groups of time

_measurement  _time                  _field  _value  tag
test          2021-01-01T00:00:00Z   fld	 1	     a
test          2021-01-01T00:00:00Z   fld	 1 	     b
test          2021-01-01T00:00:00Z   fld	 1 	     c
test          2021-01-01T00:00:00Z   fld 	 1 	     d	    
test          2021-01-01T00:00:00Z   fld	 1 	     e
test          2021-01-01T00:00:00Z   fld	 1	     f

Next time, tag elements c and e are removed ; tag elements g, h and i are added:

_measurement  _time                  _field  _value  tag
test          2021-01-02T00:00:00Z   fld	 1	     a
test          2021-01-02T00:00:00Z   fld 	 1	     b
test          2021-01-02T00:00:00Z   fld 	 1	     d
test          2021-01-02T00:00:00Z   fld 	 1 	     f	    
test          2021-01-02T00:00:00Z   fld 	 1	     g
test          2021-01-02T00:00:00Z   fld	 1	     h
test          2021-01-02T00:00:00Z   fld	 1	     i

Next time, tag elements d, g and h are removed ; tag element j is added.

_measurement  _time                  _field  _value  tag
test          2021-01-03T00:00:00Z   fld	 1	     a
test          2021-01-03T00:00:00Z   fld 	 1	     b
test          2021-01-03T00:00:00Z   fld 	 1 	     f	    
test          2021-01-03T00:00:00Z   fld	 1	     i
test          2021-01-03T00:00:00Z   fld	 1	     j

Now, I would like to find the right query to display the difference in tags between consecutive sets of times like this:

Time                  Tag_elmt_total  Tag_elmt_removed  Tag_elmt_added
2021-01-01T00:00:00Z  6               -                 -
2021-01-02T00:00:00Z  7               2                 3
2021-01-03T00:00:00Z  5               3                 1

I figured out how to count the total number of “tag” elements for each time, but I am not sure how to calculate the number of removed and added tag elements between each groups of time to produce a table like the one above…

from(bucket: "myBucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "test" and r._field == "fld")
  |> group(columns:["_time"])
  |> sum()
  |> group()

Thank you for any insight…

Anaisdg · March 20, 2023, 3:49pm

Hello @bolemo,

You’d have to use timeShift to reposition the rows.
timeShift() function | Flux 0.x Documentation
Then you join on time so you can compare
Joins in Flux | Additional resources | InfluxData Documentation

join.time(left: data_from_02, right: shifted_data_from_02_that_now_has_01, as: (l, r) => ({l with _value_02: r._value}))

Then you’d use conditional filtering to compare values from 01 and 02 to see if they’re the same or new
Query using conditional logic in Flux | InfluxDB Cloud Documentation
Then you’d sum

I hope that helps!

bolemo · March 20, 2023, 4:17pm

Sounds promising.
The timeShift() might not be exactly the function I need, as the interval between two sets of measurements might not be a constant duration (I need to compare between a specific timestamp from the previous one (whatever the duration between them might be).
It seems to be however a tool I could definitely use with an interval slightly smaller than the smallest one I would expect, and discard all consecutive duplicates tables or something like that.

This is already helping considerably. I will experiment with this and come back here with my results.

Thank you for your help

Anaisdg · March 21, 2023, 6:08pm

@bolemo Thank you!! Yah let me know how it goes

bolemo · March 24, 2023, 5:53pm

Well, I found a way, based on your suggestion of timeShift(), but it was tricky…
Probably not the most efficient code, but it works . I might use a task to calculate periodically this on a separate bucket/measurement.

Here it is:

import "join"
import "interpolate"

data = () => from(bucket: "myBucket")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r._measurement == "test" and r._field == "fld")
  |> map(fn: (r) => ({ r with _value: 1.0 })) // float for interpolate.linear()
// Align all time values to round intervals and keep last measurements of each day —>
  |> window(every: 1d)
  |> last()
  |> drop(columns: ["_time"])
  |> duplicate(column: "_stop", as: "_time") // need to replace _stop by _time for interpolate.linear()
  |> window(every: inf)
// <—
  |> interpolate.linear(every: 1d) // needed to fill empty days.
  |> group(columns:["tag"])

shifted = () => data()
  |> timeShift(duration: 1d, columns: ["_time"])

t = join.full(left: data(), right: shifted(), on: (l, r) => l._time == r._time, as: (l, r) => {
    time = if exists l._time then l._time else r._time
    tag = if exists l.tag then l.tag else r.tag
    value = l._value
    added = if exists r._value then 0 else 1
    removed = if exists r._value and not exists l._value then 1 else 0
    return {_time: time, _value: value, added: added, removed: removed, tag: tag}
  })

And then t can be used to display number of elements, number of added elements or number of removed elements:

t
  |> group(columns:["_time"])
  |> sum(column: "_value")
  |> group()
  |> yield(name: "nb_elements")

t
  |> group(columns:["_time"])
  |> sum(column: "added")
  |> group()
  |> yield(name: "nb_added")

t
  |> group(columns:["_time"])
  |> sum(column: "removed")
  |> group()
  |> yield(name: "nb_removed")

Topic		Replies	Views
Counting number of groups Fluxlang influxdb , flux	3	7379	July 9, 2020
Flux calculate time between records Fluxlang	4	999	February 1, 2023
Select Last From Different Tag Grafana influxdb , query	1	595	July 5, 2022
Unique tag values in time interval InfluxDB 2	4	903	February 25, 2020
Compare consecutive values of the same field Fluxlang influxdb , time-series , query , flux	0	555	October 25, 2021

Find difference over groups of time for a specific tag

Related topics