How to filter out unchanged values (with difference() == 0) in Flux?

Hi, I’ve made my first query with Flux to extract data that comes from Home Assistant, like so:

from(bucket: "homeassistant")
  |> range(start: 2022-08-10T00:00:00Z)
  |> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
  |> filter(fn: (r) => r["_field"] == "value")
  |> drop(columns: ["_measurement", "_field", "_start",
   "_stop", "domain", "entity_id", "friendly_name"])
  |> filter(fn: (r) => r["_value"] > 0)
  |> group()
  |> sort(columns: ["_time"], desc: false)

This query gives these results:

table _value _time
0	  42	 2022-08-11T02:10:08.111Z
0	  27	 2022-08-12T23:13:04.456Z
0	  27	 2022-08-13T14:15:15.366Z <-- remove this
0	  15	 2022-08-14T00:00:55.586Z
0	  119	 2022-08-14T10:01:26.947Z
0	  122 	 2022-08-15T03:52:06.659Z
0	  122	 2022-08-20T08:21:56.017Z <-- remove this

My _value tag is something that is not really time-related; basically it is a variable that holds how many kilometers a car run with a single charge. So I cannot use a time-window to aggregate those values because it might be an hour, a day or a week…

My goal is keeping only the values that are changed from the last one and so I want to remove those two records marked above. How can I do that in Flux?

I’ve tried with difference() which gaves 0 for the unchanged values, but I cannot understand how to use that function in Flux to filter out the records. I suppose I must add the difference as a new column and then filter that column out?

Welcome @virtualdj

Nice job on the query and clear explanation. What result does this give you?

from(bucket: "homeassistant")
  |> range(start: 2022-08-10T00:00:00Z)
  |> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
  |> filter(fn: (r) => r["_field"] == "value")
  |> drop(columns: ["_measurement", "_field", "_start",
   "_stop", "domain", "entity_id", "friendly_name"])
  |> difference()
  |> filter(fn: (r) => r._value >= 0)
  |> yield(name: "ChangeInKilometers")

Thanks. Your query gives:

table _value _time
0	  0	     2022-08-13T14:15:15.366Z
0	  104	 2022-08-14T10:01:26.947Z
0	  3	     2022-08-15T03:52:06.659Z
0	  0	     2022-08-20T08:21:56.017Z
0	  48	 2022-09-08T03:29:43.150Z
0	  9	     2022-09-11T05:25:16.556Z
0	  0	     2022-09-19T16:10:27.721Z

So it displays the difference (but it excluded the first line 2022-08-11); now how can I correlate this to the original results?

EDIT: It doesn’t seem also to pick up the correct _time.

Let’s try this (I did not check the syntax of the map() function, so an error may pop up). You should see a new column called ChangeInKilometers as well as the original column.

from(bucket: "homeassistant")
  |> range(start: 2022-08-10T00:00:00Z)
  |> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
  |> filter(fn: (r) => r["_field"] == "value")
  |> drop(columns: ["_measurement", "_field", "_start",
   "_stop", "domain", "entity_id", "friendly_name"])
  |> map(fn: (r) => ({r with ChangeInKilometers: (difference(v: r._value))}))
  |> filter(fn: (r) => r["ChangeInKilometers"] >== 0) 
  |> yield(name: "ChangeInKilometers")

It complains of an error on the line:

|> map(fn: (r) => ({r with ChangeInKilometers: (difference(v: r._value))}))

The error is:

error @7:65-7:73: found unexpected argument v
error @7:51-7:74: missing required argument tables

Yep, sorry about that. Still learning all this Flux stuff myself. Try this:

from(bucket: "homeassistant")
  |> range(start: 2022-08-10T00:00:00Z)
  |> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
  |> filter(fn: (r) => r["_field"] == "value")
  |> drop(columns: ["_measurement", "_field", "_start",
   "_stop", "domain", "entity_id", "friendly_name"])
  |> duplicate(column: "_value", as: "OriginalValue")
  |> difference()
  |> rename(columns: {_value: "Difference"})
  |> filter(fn: (r) => r["Difference"] > 0) 
  |> yield(name: "ChangeInKilometers")

I’m away from my home (and Influxdb instance) now, so you should settle for a picture as Influxdb is not very mobile-friendly.
However it doesn’t seem the query gives the correct results… It skips so much data?

The above is what is causing the some results to get filtered out. Delete it and then run again and see how the data looks, then we’ll go from there…

OK, I think this will work…

from(bucket: "homeassistant")
  |> range(start: 2022-08-10T00:00:00Z)
  |> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
  |> filter(fn: (r) => r["_field"] == "value")
  |> drop(columns: ["_measurement", "_field", "_start",
   "_stop", "domain", "entity_id", "friendly_name"])
  |> duplicate(column: "_value", as: "diff")
  |> difference(
    columns: ["diff"],
    keepFirst: true
    )
  |> filter(fn: (r) => not exists r.diff or r.diff != 0)
  |> yield(name: "ChangeInKilometers")

Borrowed mostly from here.

1 Like

Yes! You’ve done it!

Results

So basically I had to duplicate the column and apply difference only to that column and then filter it out. Clear! Thank you very much for your help!

So it is not necessary to sort explicitly?

Glad it’s working.

I do not think the sort() by time function is required, since your data already appears to be sorted by time.

1 Like

Hi @grant1,

I think I have a similar issue and hope that you might be able to help with that too, so I will post it here:

Thank you!