Hi, I’ve made my first query with Flux to extract data that comes from Home Assistant, like so:
from(bucket: "homeassistant")
|> range(start: 2022-08-10T00:00:00Z)
|> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
|> filter(fn: (r) => r["_field"] == "value")
|> drop(columns: ["_measurement", "_field", "_start",
"_stop", "domain", "entity_id", "friendly_name"])
|> filter(fn: (r) => r["_value"] > 0)
|> group()
|> sort(columns: ["_time"], desc: false)
This query gives these results:
table _value _time
0 42 2022-08-11T02:10:08.111Z
0 27 2022-08-12T23:13:04.456Z
0 27 2022-08-13T14:15:15.366Z <-- remove this
0 15 2022-08-14T00:00:55.586Z
0 119 2022-08-14T10:01:26.947Z
0 122 2022-08-15T03:52:06.659Z
0 122 2022-08-20T08:21:56.017Z <-- remove this
My _value
tag is something that is not really time-related; basically it is a variable that holds how many kilometers a car run with a single charge. So I cannot use a time-window to aggregate those values because it might be an hour, a day or a week…
My goal is keeping only the values that are changed from the last one and so I want to remove those two records marked above. How can I do that in Flux?
I’ve tried with difference()
which gaves 0 for the unchanged values, but I cannot understand how to use that function in Flux to filter out the records. I suppose I must add the difference as a new column and then filter that column out?
grant1
January 28, 2023, 1:39pm
2
Welcome @virtualdj
Nice job on the query and clear explanation. What result does this give you?
from(bucket: "homeassistant")
|> range(start: 2022-08-10T00:00:00Z)
|> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
|> filter(fn: (r) => r["_field"] == "value")
|> drop(columns: ["_measurement", "_field", "_start",
"_stop", "domain", "entity_id", "friendly_name"])
|> difference()
|> filter(fn: (r) => r._value >= 0)
|> yield(name: "ChangeInKilometers")
Thanks. Your query gives:
table _value _time
0 0 2022-08-13T14:15:15.366Z
0 104 2022-08-14T10:01:26.947Z
0 3 2022-08-15T03:52:06.659Z
0 0 2022-08-20T08:21:56.017Z
0 48 2022-09-08T03:29:43.150Z
0 9 2022-09-11T05:25:16.556Z
0 0 2022-09-19T16:10:27.721Z
So it displays the difference (but it excluded the first line 2022-08-11); now how can I correlate this to the original results?
EDIT: It doesn’t seem also to pick up the correct _time
.
grant1
January 28, 2023, 6:35pm
4
Let’s try this (I did not check the syntax of the map() function, so an error may pop up). You should see a new column called ChangeInKilometers as well as the original column.
from(bucket: "homeassistant")
|> range(start: 2022-08-10T00:00:00Z)
|> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
|> filter(fn: (r) => r["_field"] == "value")
|> drop(columns: ["_measurement", "_field", "_start",
"_stop", "domain", "entity_id", "friendly_name"])
|> map(fn: (r) => ({r with ChangeInKilometers: (difference(v: r._value))}))
|> filter(fn: (r) => r["ChangeInKilometers"] >== 0)
|> yield(name: "ChangeInKilometers")
It complains of an error on the line:
|> map(fn: (r) => ({r with ChangeInKilometers: (difference(v: r._value))}))
The error is:
error @7:65-7:73: found unexpected argument v
error @7:51-7:74: missing required argument tables
grant1
January 28, 2023, 11:22pm
6
Yep, sorry about that. Still learning all this Flux stuff myself. Try this:
from(bucket: "homeassistant")
|> range(start: 2022-08-10T00:00:00Z)
|> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
|> filter(fn: (r) => r["_field"] == "value")
|> drop(columns: ["_measurement", "_field", "_start",
"_stop", "domain", "entity_id", "friendly_name"])
|> duplicate(column: "_value", as: "OriginalValue")
|> difference()
|> rename(columns: {_value: "Difference"})
|> filter(fn: (r) => r["Difference"] > 0)
|> yield(name: "ChangeInKilometers")
I’m away from my home (and Influxdb instance) now, so you should settle for a picture as Influxdb is not very mobile-friendly.
However it doesn’t seem the query gives the correct results… It skips so much data?
grant1
January 29, 2023, 1:32pm
8
The above is what is causing the some results to get filtered out. Delete it and then run again and see how the data looks, then we’ll go from there…
grant1
January 29, 2023, 2:01pm
9
OK, I think this will work…
from(bucket: "homeassistant")
|> range(start: 2022-08-10T00:00:00Z)
|> filter(fn: (r) => r["entity_id"] == "previous_car_charge_distance")
|> filter(fn: (r) => r["_field"] == "value")
|> drop(columns: ["_measurement", "_field", "_start",
"_stop", "domain", "entity_id", "friendly_name"])
|> duplicate(column: "_value", as: "diff")
|> difference(
columns: ["diff"],
keepFirst: true
)
|> filter(fn: (r) => not exists r.diff or r.diff != 0)
|> yield(name: "ChangeInKilometers")
Borrowed mostly from here .
1 Like
Yes! You’ve done it!
So basically I had to duplicate the column and apply difference
only to that column and then filter it out. Clear! Thank you very much for your help!
So it is not necessary to sort
explicitly?
grant1
January 29, 2023, 8:52pm
12
Glad it’s working.
I do not think the sort()
by time function is required, since your data already appears to be sorted by time.
1 Like
Hi @grant1 ,
I think I have a similar issue and hope that you might be able to help with that too, so I will post it here:
Hi,
let’s say I have the following values stored:
0
120.50
124.90
124.40
0 <-- Should be removed
119.0
187.99
-1.54 <-- Should be removed
122.54
0
I want to filter out all rows with a value that is <= 0 that are also between rows with a value > 0. So in this case I would want the following result:
0
120.50
124.90
124.40
119.0
187.99
122.54
0
So if there is a row with a value of <=0 and the rows that comes directly before AND after that row are > 0, the row that is 0 should be re…
Thank you!