Incorrect data when combining aggregateWindow() and fill(..., usePrevious: true)

Hello!

I am using InfluxDB 2 with Grafana to query and sum up readings from various power meters around our house. The basic query is this:

from(bucket: "Home")
  |> range(start: v.timeRangeStart, stop:v.timeRangeStop)
  |> filter(fn: (r) => r._measurement =~ /sonoff\.0\.Kueche_St_Links\.ENERGY_Power/
                   and r._field == "value"
                   and r.from =~ /mqtt|sonoff|influxdb/
            )
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true, timeSrc: "_start")
  |> fill(column: "_value", usePrevious: true)

The results will be displayed in a stacked graph in Grafana 9.0.2. But my issue is also reproducable in InfluxDB’s web UI. The sensors return a measurement every ten seconds.

I use aggregateWindow() because plotting this data in Grafana would overflow the graph with too many datapoints. I also use fill() because the sensors supply a new value only if it is different from the previous one, i.e. if there is a constant energy usage over a long time just the first measurement is actually saved. Putting fill() before the aggregation does not help with this issue.

But for sensors which have only very brief readings, like in the screenshot below, this concept fails, and I don’t know how to improve it. You can see the above query with (blue) and without (red) aggregateWindow() and fill() plotted in the Web UI, and Influx for some reasons misses out on the fact that the device is being switched off, and reports constant enery consumption of >1.5kW, which is obviously incorrect.

How can I combine (a) aggregation, (b) gap filling, and (c) not missing significant changes in value, when querying this measurement?

image

@Anaisdg, do you have an idea maybe?

Hello @Jens,
Hmm it seems that fill and aggregateWindow are working as expected. I would expect the mean over the raw data (red) to be essentially one straight line (blue).
You could change the window period to be smaller to try and better capture some of those peaks.
You could also:

  • filter function to filter for data that’s greater than 0 and plot those points
  • and filter for data that’s less than 0 and aggregate on that

You could also:

  • write the on periods to a new bucket with a task and query just the on periods/reduced data/only capture those events

Do any of these solutions sound like they’re in line with what you’re looking for?

Jenno @anaidsg, thank you for your answer!

How can the mean() over the red line be a straight line? There are only a few very brief periods >0, and if I understand aggregateWindow() correctly it always takes a chunk of values, calculates the “function” (mean in this case) for these values, then returns a single value. So it should at least approximately - depending on window size - retrace the original line.

The problem is that my instruments omit repeated values (any, not just 0), and if this leaves no values at all inside such an aggregate window, aggregateWindow() returns “null”, and fill() with usePrevious then copies the previoues value, which is formally correct, but physically nonsense.

Am I using the wrong functions for this task?

If your instruments omit repeated values, and “usePrevious” does exactly what
it says, but as you say, this is “formally correct but physically nonsense”,
then what would you expect the (guessed) values to be instead?

If your instruments occasionally do not provide values, what would you want to
have substituted in their place?

Oh, and just in case you are thinking “I would like to have the missing values
interpolated between the last known value and the next known value”, bear in
mind that at the time at which the missing value needs to be substituted by
something else, the “next known value” is not yet known, so that calculation
is impossible. You can only use historical, not future, data.

Antony.

Hi @Jens -

I’m a bit confused about the desired output, but referring to your red/blue pix above, is it correct that:

Sonoff is reporting power-on at the 1st red spike, power-off at the 2nd, power-on at the 3rd and off again at the 4th? If so, then would you want to see a “square wave” graphic like this?

image

1 Like

Yes.
Actually, the red and blue lines should at least look identical unless zoomed in.
However, when on, the device reports a value every 10 seconds, because the consumption has some noise (+/- a few watts).
So currently it is not possible to reduce data like this using aggregateWindow() because this misses some spikes and will then produce large errors, and it is not possible to plot data like this without reduction, because sometimes this data also has far too many consecutive points to plot when the device is switched on for a longer time, and I cannot say beforehand which happens when.
One possible solution would be to fill all NULL values with zeros BEFORE using aggregateWindow() so that the mean aggregator would eventually reach zero after a short usage spike.

1 Like

@Jens
I’m sorry I’m confused it looks like your data is essentially square waves already. You’re looking to apply aggregate window to reduce the number of data points but also preserve the spikes?
Can you maybe filter for when data is higher? and then apply an aggregateWindow to create empty values and fill with whatever value you want?
Something like:

import "experimental/array"

array.from(rows: [{foo: "bar", baz: 21.2, _time: 2022-07-30T00:00:00Z}, {foo: "bar", baz: 23.8, _time: 2022-07-31T00:00:00Z}])
|> range(start: 2022-07-29T00:00:00Z, stop: 2022-08-02T00:00:00Z)
|> aggregateWindow(every: 1d, fn: first, column: "baz")
|> group()
|> fill(value: 0)