Trying to aggregate data up to yesterday evening WITHOUT today's value

Hello everybody,

I have some data stored each minute in a bucket (room temperatures). No issue with that.
For one of these temperatures, I'm trying to display the average of each of the previous weeks over the 2 months preceding the current day. In other words, a moving weekly average that excludes the current day.

The issue I have is with the range selection: I did not manage to exclude the current day's data, which produces a bad chart.

Below is what I have already tried, without success.

Query tried #01:

  • You can see that I'm getting the current day's data in addition to the previous aggregation, which is not acceptable.
  • The corresponding query:
import "date"
import "experimental"
import "math"

DAY_STOP = today()
DAY_START = experimental.addDuration(d: -2mo, to: DAY_STOP)

from(bucket: "home1_daily_data")
  |> range(start: DAY_START, stop: DAY_STOP)
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 1w, fn: mean)

Query tried #02:

  • You can see that I'm getting 2 values for the same day, which is also not acceptable.
  • The corresponding query:
import "date"
import "experimental"
import "math"

CURRENT_DAY = today()
DAY_STOP = experimental.addDuration(d: -1s, to: CURRENT_DAY)
DAY_START = experimental.addDuration(d: -2mo, to: DAY_STOP)

from(bucket: "home1_daily_data")
  |> range(start: DAY_START, stop: DAY_STOP)
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 1w, fn: mean)

Expected result if today is 01/04/2022 :slight_smile:

Hello @neuvidor,
Are you just sharing your solution, or do you still have a question? It looks like you got what you were looking for.
Although, if you didn't want to include data from today, I would think you could just do:

start: my_start, stop: today() 

And I would expect just that to work.

Hello @Anaisdg ,

I still have a question, because your proposal does not work as expected.

The query you advised:

from(bucket: "home1_daily_data")
  |> range(start: -2mo, stop: today())
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 1w, fn: mean)

The query's result:

@neuvidor today() is based on UTC time, so you may need to apply an offset to the timestamp returned from today() to account for your timezone. For example, if you’re in the US EDT timezone, you’d need to offset the timestamp by -4h. To do this, use experimental.addDuration():

import "experimental"

from(bucket: "home1_daily_data")
     |> range(start: -2mo, stop: experimental.addDuration(d: -4h, to: today()))
     |> filter(fn: (r) => r["_measurement"] == "kWh")
     |> filter(fn: (r) => r["_field"] == "value")
     |> filter(fn: (r) => r["domain"] == "energie")
     |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
     |> filter(fn: (r) => r["instance"] == "test")
     |> filter(fn: (r) => r["source"] == "calculated")
     |> aggregateWindow(every: 1w, fn: mean)

@scott ,

Thanks for your proposal, although I already use the UTC format, as you can see below.

Unfortunately, your proposal shows the same behavior as my previous examples. You can see that the value in the purple circle is not OK because it is not 7 days apart from the previous value (see the red mark).

Below is the result of your proposed query:

Interesting. Ok, what are the last few rows of your data before aggregateWindow()? To get them, use tail():

import "experimental"

from(bucket: "home1_daily_data")
    |> range(start: -2mo, stop: experimental.addDuration(d: -4h, to: today()))
    |> filter(fn: (r) => r["_measurement"] == "kWh")
    |> filter(fn: (r) => r["_field"] == "value")
    |> filter(fn: (r) => r["domain"] == "energie")
    |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
    |> filter(fn: (r) => r["instance"] == "test")
    |> filter(fn: (r) => r["source"] == "calculated")
    |> tail(n: 10, offset: 0)

I’m curious to see what the timestamps on those last few rows are.

I can answer that directly: only 1 value per day is stored. There is a mistake in my original description above; I confused it with my “home1_raw_data” bucket, which has a value stored every minute but only 30 days of retention.

Below is what I got:

I think I know what’s going on. I think it’s because you’re querying by calendar month and aggregating by week. When you use every: 1w, Flux uses the Unix epoch as the starting point from which to increment and define weeks. The Unix epoch started on a Thursday, so all 1w windows run from Thursday to Wednesday. aggregateWindow(), by default, uses the _stop time of the window as the new _time value of the output aggregate point. So while the queried data only contains points from last month, one or more of those points falls in a 1w window that overflows into the next month, and you get a _time value from the next month.

Because you’re querying by calendar month, the length of the time range can vary and may not divide well by the every parameter. This may leave a remainder of rows that will “overflow” into a window that gets assigned a timestamp outside of the range. I’m not 100% sure.

I think the first thing you should try is aggregating every 7d instead of every week, just to avoid the weirdness involved with querying calendar weeks based on the Unix epoch. If that still results in this extra row, update the query time range to be a multiple of the every value. For example, range(start: -63d).

This is only a guess, but it makes sense in my head :smile:
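Putting both suggestions together (same bucket and tags as the queries above; treat this as a sketch, not a tested fix), the query would look like:

```flux
// Sketch combining both suggestions: aggregate in 7d windows over a range
// that is an exact multiple of 7d (9 weeks = 63d), so no partial window
// overflows past the stop time.
from(bucket: "home1_daily_data")
  |> range(start: -63d, stop: today())
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 7d, fn: mean)
```

If you do want calendar weeks that start on a specific weekday, aggregateWindow() also accepts an offset duration; for example, offset: -3d shifts the epoch-aligned Thursday-to-Wednesday windows so they start on Monday.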

Hello @scott !

Your explanation is really interesting, thanks for that! It seems that you are right about Thursday being a key day for Flux, which is bad news for me…

I tried your proposal, but unfortunately I got the same behavior.

So the new question is:

Is there a way to aggregate data starting from the “stop date” back to the “start date”, like a moving average over the last 7 days, without taking the current day into account?
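One possible approach, building on the discussion above (a sketch under assumptions, not a verified solution): truncate now() to midnight so today is excluded, query an exact multiple of 7d, and compute aggregateWindow()'s offset parameter from the stop timestamp so each 7d window ends on the stop date instead of on an epoch-aligned Thursday boundary:

```flux
import "date"
import "experimental"

// Sketch only: exclude today by stopping at midnight (UTC), query an exact
// multiple of 7d, and shift the 7d windows so a boundary lands on `stop`.
stop = date.truncate(t: now(), unit: 1d)            // today 00:00 UTC (excluded)
start = experimental.addDuration(d: -63d, to: stop) // 9 full weeks back

// Offset that realigns epoch-based 7d windows onto `stop`.
// int(v: <time>) returns nanoseconds since the Unix epoch.
weekNs = 7 * 24 * 60 * 60 * 1000000000
windowOffset = duration(v: int(v: stop) % weekNs)

from(bucket: "home1_daily_data")
  |> range(start: start, stop: stop)
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 7d, offset: windowOffset, fn: mean)
```

With the windows anchored to the stop time rather than to the epoch, each aggregate should cover a full 7 days ending at midnight yesterday, with no partial trailing window for the current day.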