Trying to aggregate data up to yesterday evening WITHOUT today's value

Hello everybody,

I have some data stored each minute in a bucket (room temperatures). No issue with that.
For one of these temperatures, I'm trying to display the average of each of the previous weeks over the 2 months preceding the current day. In other words, a moving weekly average that excludes the current day.

The issue I have is with the range selection: I did not manage to exclude the current day's data, which produces a bad chart.

Below is what I have already tried, without success.

Query tried #01:

  • You can see that I'm getting the current day's data in addition to the previous aggregation, which is not acceptable.
  • The corresponding query:
import "date"
import "experimental"
import "math"

DAY_STOP = today()
DAY_START = experimental.addDuration(d: -2mo, to: DAY_STOP)

from(bucket: "home1_daily_data")
  |> range(start: DAY_START, stop: DAY_STOP)
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 1w, fn: mean)

Query tried #02:

  • You can see that I'm getting 2 values for the same day, which is also not acceptable.
  • The corresponding query:
import "date"
import "experimental"
import "math"

CURRENT_DAY = today()
DAY_STOP = experimental.addDuration(d: -1s, to: CURRENT_DAY)
DAY_START = experimental.addDuration(d: -2mo, to: DAY_STOP)

from(bucket: "home1_daily_data")
  |> range(start: DAY_START, stop: DAY_STOP)
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 1w, fn: mean)

Expected result if today is 01/04/2022 :slight_smile:

Hello @neuvidor,
Are you just sharing your solution, or do you still have a question? It looks like you got what you were looking for.
Although, if you didn't want to include data from today, I would think you could just do:

start: my_start, stop: today() 

And I would expect just that to work.

Hello @Anaisdg ,

I still have a question, because your proposal does not work as expected.

The query you advised:

from(bucket: "home1_daily_data")
  |> range(start: -2mo, stop: today())
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 1w, fn: mean)

The query's result:

@neuvidor today() is based on UTC time, so you may need to apply an offset to the timestamp returned from today() to account for your timezone. For example, if you’re in the US EDT timezone, you’d need to offset the timestamp by -4h. To do this, use experimental.addDuration():

import "experimental"

from(bucket: "home1_daily_data")
     |> range(start: -2mo, stop: experimental.addDuration(d: -4h, to: today()))
     |> filter(fn: (r) => r["_measurement"] == "kWh")
     |> filter(fn: (r) => r["_field"] == "value")
     |> filter(fn: (r) => r["domain"] == "energie")
     |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
     |> filter(fn: (r) => r["instance"] == "test")
     |> filter(fn: (r) => r["source"] == "calculated")
     |> aggregateWindow(every: 1w, fn: mean)

@scott ,

Thanks for your proposal, although I already use the UTC format, as you can see below.

Unfortunately, your proposal shows the same behavior as my previous examples. You can see that the value in the purple circle is not OK because it is not 7 days apart from the previous value (see the red mark).

Below is the result of your proposed query:

Interesting. Ok, what are the last few rows of your data before aggregateWindow()? To get them, use tail():

import "experimental"

from(bucket: "home1_daily_data")
    |> range(start: -2mo, stop: experimental.addDuration(d: -4h, to: today()))
    |> filter(fn: (r) => r["_measurement"] == "kWh")
    |> filter(fn: (r) => r["_field"] == "value")
    |> filter(fn: (r) => r["domain"] == "energie")
    |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
    |> filter(fn: (r) => r["instance"] == "test")
    |> filter(fn: (r) => r["source"] == "calculated")
    |> tail(n: 10, offset: 0)

I’m curious to see what the timestamps on those last few rows are.

I can answer that directly: only 1 value per day is stored. There is a mistake in my original description above; I confused it with my “home1_raw_data” bucket, which has a value stored every minute but only 30 days of retention.

Below is what I got:

I think I know what’s going on. I think it’s because you’re querying by calendar month and aggregating by week. When you use every: 1w, Flux uses the Unix epoch as the starting point from which to increment and define weeks. The Unix epoch started on a Thursday, so all 1w windows run from Thursday to Wednesday. aggregateWindow(), by default, uses the _stop time of the window as the new _time value of the output aggregate point. So while the queried data only contains points from last month, one or more of those points falls in a 1w window that overflows into the next month, and you get a _time value from the next month.

Because you’re querying by calendar month, the length of the time range can vary and may not divide well by the every parameter. This may leave a remainder of rows that will “overflow” into a window that gets assigned a timestamp outside of the range. I’m not 100% sure.

I think the first thing you should try is aggregating every 7d instead of every week, just to avoid the weirdness involved with querying calendar weeks based on the Unix epoch. If that still results in this extra row, update the query time range to be a multiple of the every value. For example, range(start: -63d).

This is only a guess, but it makes sense in my head :smile:
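Putting both suggestions together (same bucket and tags as the queries above; treat this as a sketch, not a tested fix), the query would look like:

```flux
// Sketch combining both suggestions: aggregate in 7d windows over a range
// that is an exact multiple of 7d (9 weeks = 63d), so no partial window
// overflows past the stop time.
from(bucket: "home1_daily_data")
  |> range(start: -63d, stop: today())
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 7d, fn: mean)
```

If you do want calendar weeks that start on a specific weekday, aggregateWindow() also accepts an offset duration; for example, offset: -3d shifts the epoch-aligned Thursday-to-Wednesday windows so they start on Monday.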

Hello @scott !

Your explanation is really interesting, thanks for that! It seems that you are right about Thursday being a key day for Flux, which is bad news for me…

I tried your proposal, but unfortunately I got the same behavior.

So the new question is:

Is there a way to aggregate data starting from the “stop date” back to the “start date”, like a moving average over the last 7 days, without taking the current day into account?
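One possible approach, building on the discussion above (a sketch under assumptions, not a verified solution): truncate now() to midnight so today is excluded, query an exact multiple of 7d, and compute aggregateWindow()'s offset parameter from the stop timestamp so each 7d window ends on the stop date instead of on an epoch-aligned Thursday boundary:

```flux
import "date"
import "experimental"

// Sketch only: exclude today by stopping at midnight (UTC), query an exact
// multiple of 7d, and shift the 7d windows so a boundary lands on `stop`.
stop = date.truncate(t: now(), unit: 1d)            // today 00:00 UTC (excluded)
start = experimental.addDuration(d: -63d, to: stop) // 9 full weeks back

// Offset that realigns epoch-based 7d windows onto `stop`.
// int(v: <time>) returns nanoseconds since the Unix epoch.
weekNs = 7 * 24 * 60 * 60 * 1000000000
windowOffset = duration(v: int(v: stop) % weekNs)

from(bucket: "home1_daily_data")
  |> range(start: start, stop: stop)
  |> filter(fn: (r) => r["_measurement"] == "kWh")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["domain"] == "energie")
  |> filter(fn: (r) => r["entity_id"] == "pompe_a_chaleur")
  |> filter(fn: (r) => r["instance"] == "test")
  |> filter(fn: (r) => r["source"] == "calculated")
  |> aggregateWindow(every: 7d, offset: windowOffset, fn: mean)
```

With the windows anchored to the stop time rather than to the epoch, each aggregate should cover a full 7 days ending at midnight yesterday, with no partial trailing window for the current day.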