Month aggregateWindow doesn't work with negative offset

Hi,
we have an issue that aggregateWindow for month doesn’t work correctly with negative offset.
For zero offset it works fine, and for the following query
`

from(bucket: “my_bucket”) |> range(start: 2021-01-24T14:10:31+02:00, stop: 2021-12-24T14:10:31+02:00) |> filter(fn: (r) => r._measurement == “my_measurement”) |> filter(fn: (r) => r.machine == “123”) |> filter(fn: (r) => r._field == “span”) |> filter(fn: (r) => r.state == “working”) |> aggregateWindow(every: 1mo, offset: 0s, fn: sum, createEmpty: true ) |> yield(name: “sum”)

`
in the received result the _time column looks like this:

  • 2021-02-01T00:00:00.000Z
  • 2021-03-01T00:00:00.000Z
  • 2021-04-01T00:00:00.000Z
  • 2021-05-01T00:00:00.000Z
  • 2021-06-01T00:00:00.000Z
  • 2021-07-01T00:00:00.000Z
  • 2021-08-01T00:00:00.000Z
  • 2021-09-01T00:00:00.000Z
  • 2021-10-01T00:00:00.000Z
  • 2021-11-01T00:00:00.000Z
  • 2021-01-24T12:10:31.000Z

But when negative offset is used (we keep data in DB in UTC and shift them in query), the query is
`

from(bucket: “my_bucket”) |> range(start: 2021-01-24T14:10:31+02:00, stop: 2021-12-24T14:10:31+02:00) |> filter(fn: (r) => r._measurement == “my_measurement”) |> filter(fn: (r) => r.machine == “123”) |> filter(fn: (r) => r._field == “span”) |> filter(fn: (r) => r.state == “working”) |> aggregateWindow(every: 1mo, offset: -7200s, fn: sum, createEmpty: true ) |> yield(name: “sum”)

`

then the _time column in the received result looks broken:

  • 2021-01-31T22:00:00.000Z
  • 2021-02-28T22:00:00.000Z
  • 2021-03-28T22:00:00.000Z
  • 2021-04-30T22:00:00.000Z
  • 2021-05-30T22:00:00.000Z
  • 2021-06-30T22:00:00.000Z
  • 2021-07-30T22:00:00.000Z
  • 2021-08-31T22:00:00.000Z
  • 2021-09-30T22:00:00.000Z
  • 2021-10-30T22:00:00.000Z
  • 2021-11-30T22:00:00.000Z
  • 2021-12-24T14:10:31.000Z

What we want is to aggregate data by month with possibility to shift the data according to a certain timezone.
Are we doing something wrong?
Thank you

Used InfluxDB version is 2.0.7

Hello @ellibas,
I’m sorry for the delay. Sometimes questions get buried. In the future feel free to tag me, so I can make sure to get back to you!
I would suggest using timeshift() after aggregateWindow()
https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/built-in/transformations/timeshift/

Hi @Anaisdg, thank you for your reply.
I have tried to use timeshift() function in the following way (I used ‘7200s’ because in aggregateWindow I used offset ‘-7200s’):

|> timeShift(duration: 7200s, columns: ["_time"])

So the query now looks like this

from(bucket: “my_bucket”) |> range(start: 2021-01-24T14:10:31+02:00, stop: 2021-12-24T14:10:31+02:00) |> filter(fn: (r) => r._measurement == “my_measurement”) |> filter(fn: (r) => r.machine == “123”) |> filter(fn: (r) => r._field == “span”) |> filter(fn: (r) => r.state == “working”) |> aggregateWindow(every: 1mo, offset: -7200s, fn: sum, createEmpty: true ) |> timeShift(duration: 7200s, columns: ["_time"]) |> yield(name: “sum”)

The _time column in the received result looks different, but still not as expected:

  • 2021-02-01T00:00:00.000Z
  • 2021-03-01T00:00:00.000Z
  • 2021-03-29T00:00:00.000Z
  • 2021-05-01T00:00:00.000Z
  • 2021-05-31T00:00:00.000Z
  • 2021-07-01T00:00:00.000Z
  • 2021-07-31T00:00:00.000Z
  • 2021-09-01T00:00:00.000Z
  • 2021-10-01T00:00:00.000Z
  • 2021-10-31T00:00:00.000Z
  • 2021-12-01T00:00:00.000Z
  • 2021-12-24T14:10:31.000Z

Hello @ellibas,
Can you please export the data to annotated CSV so I can try to reproduce?

I wouldn’t use an offset in aggregate window.
I also don’t see offset as an option for aggregateWindow:
https://docs.influxdata.com/influxdb/cloud/reference/flux/stdlib/built-in/transformations/aggregates/aggregatewindow/
I was suggesting you use timeshift instead of offset.

I am not sure how I can export data, I see only the possibility to export the query results in InfluxDB UI.
Actually what we want is to shift data according to the timezone given by the user.

Example:
Let’s say we store measurements how long the machine is in the ‘blocked’ state. And we store these data in UTC time.
The machine became blocked on 31.01.2020 at 5:00, and was unblocked 02.02.2020 at 5:00. During this time period we have stored a lot of measurements in regular time intervals that machine is in the blocked state.
Then we want to retrieve data how long the machine was in the blocked state in time range: from 31.01.2020 00:00 till 03.03.2020 00:00.
In UTC timezone the result is
31.01.2020 - 19h
01.02.2020 - 24h
02.02.2020 - 5h
But machines are located in CET (+2h) and the user wants to get data in that timezone, so from his point of view the machine became blocked on 31.01.2020 at 7:00, and was unblocked 02.02.2020 at 7:00, and the result in that case will be
31.01.2020 - 17h
01.02.2020 - 24h
02.02.2020 - 7h

So this timeshift due to the timezone - is actually what we want to achieve.

That is interesting, because offset was an option for aggregateWindow when we developed the query few months ago, and you can try to set this parameter and see that it has the impact on the result…

Anyway I have tried to remove offset and add timeshift function, the _time column now looks much better, at least it is always at the correct end day of the month, but it doesn’t work as timezone shift. See my example above. The ‘sum’ data are the same in the result no matter what timeshift I use.

Hello @ellibas,
exporting the data from the UI is perfect.
I’m afraid I don’t understand the problem. The time zones don’t affect durations. How are you calculating the durations?
Have you tried using:
https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/built-in/transformations/stateduration/

hi @Anaisdg,
I am able to export only the query result, and in any case I am not able to upload it here because of some restrictions for “the new users” on this page.
Let me explain the problem again.
Yes, as you wrote time zones don’t affect durations, but only till the moment you want to “window” the data in some certain time buckets.
I will take another example: work shifts. Let’s say we want to measure how long people work every day.
And let’s say we want to measure work shifts of people working in Japan.
Let’s say they start at 8am every day and finish their shift at different time. But we store the data to DB in CET timezone, so according to the DB people in Japan start working at 11pm (of the previous day - because of the 9h time difference). So when we want to retrieve data, we want to shift this data with that 9h time difference due the different timezone, and only in that case we will get correct data for a certain day.

Or I can ask even easier - how to consider the timezone for retrieving data from InfluxDB? ‘offset’ parameter works good for aggregateWindow (even if it is not mentioned in the current documentation), but it fails for ‘month’

Hello @ellibas,
Thank you for explaining! I’m still a little confused though I apologize. timeShift() is used for that exact purpose. It will shift all of your timestamps back 9 hours to the previous day regardless of crossover between month or day.
I’m not sure why the offset isn’t working as expected. I’m not able to reproduce.
Can you please try the following:

|> timeShift(duration: -7200s, columns: ["_time"])
|> window(every:1mo)
|> sum() 

For example:

import "csv"
csvData = "#group,false,false,true,true,false,false,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#default,_result,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,tag1,tag2
,,0,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:00:00Z,0,field1,measurement,tagvalue1,tagvalue2
,,0,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:05:00Z,1,field1,measurement,tagvalue1,tagvalue2
,,1,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:00:00Z,0,field2,measurement,tagvalue1,tagvalue2
,,1,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:05:00Z,1,field2,measurement,tagvalue1,tagvalue2
"

data =  csv.from(csv: csvData)

data 
|> timeShift(duration: -1h)
|> window(every: 5m)
|> sum()

Works as expected for me.

hi @Anaisdg,
I wrote above that timeshift doesn’t work for me correctly: _time column looks good, but this timeshift duration has no impact on the _value column (the sum). I mean regardless of the set timeshift duration the _value remains the same.
Please try on your data set to query DB with the different timeshift duration and check if the _value column changes.

Again that example with work shifts in Japan: imagine that one person didn’t work on Monday and on Tuesday he worked 8 hours from 8am, but in DB (data stored in CET - 9 hours time difference) we will see data that he worked 1 hour on Monday and 7 hours on Tuesday. So we want consider the timezone in sum calculation.

Hello @ellibas,
The timeshift function shouldn’t have an affect on the sum. It also shifts the _start and _stop columns.
It sounds as though you just want to change the time in the range() function:

import "csv"
import "experimental"

csvData = "#group,false,false,true,true,false,false,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#default,_result,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,tag1,tag2
,,0,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:00:00Z,0,field1,measurement,tagvalue1,tagvalue2
,,0,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:05:00Z,1,field1,measurement,tagvalue1,tagvalue2
,,1,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:00:00Z,0,field2,measurement,tagvalue1,tagvalue2
,,1,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:05:00Z,1,field2,measurement,tagvalue1,tagvalue2
"

data =  csv.from(csv: csvData)
|> range(start: experimental.addDuration(d: 1d, to: 2021-07-06T06:00:00Z), stop: now())
|> yield()

Notice how the first timestamp is at 2021-08-06T06:00:00Z but we’re able to add a day to the start time so we can still query for those timestamps.

I really appreciate your patience. I’m just still confused by your example though.
If you’re collecting the following data:

myMeasurement, myTag=foo myField=1.0 2021-08-13T08:00:00Z
myMeasurement, myTag=foo myField=1.0 2021-08-13T16:00:00Z

And you query with

|> from("mybucket")
|> range(start: 2021-08-13T08:00:00Z, stop: 2021-08-13T16:00:00Z)
|> sum

You’ll get a value of 2.0.

Now let’s say someone queries for that time range again, but for them the time was 1 hour on Monday and 7 hours on Tuesday, they’ll still get the same sum.

Can you please share what you expect the output to be? Thank you for trying to help me understand. Can you please provide:

  • input data with timestamps that the data was written.
  • query you want to execute.
  • expected outcome with data and timestamps.

Hello @ellibas,
The only other thing I can think of that might be related to what you’re asking is:

I encourage you to take a look at the date package:
https://docs.influxdata.com/flux/v0.x/stdlib/date/

Otherwise I look forward to you sharing your input data, query, and expected output.

@ellibas I recommend using the timeshift() function specifying to shift ONLY the _time column. By default it shifts the _start, _stop, and _time column. If you only shift the _time column you should see a difference in your sum after applying aggregateWindow() which operates on the _start and _stop times.