Month aggregateWindow doesn't work with negative offset

Hi,
we have an issue that aggregateWindow for month doesn’t work correctly with negative offset.
For zero offset it works fine, and for the following query
`

from(bucket: “my_bucket”) |> range(start: 2021-01-24T14:10:31+02:00, stop: 2021-12-24T14:10:31+02:00) |> filter(fn: (r) => r._measurement == “my_measurement”) |> filter(fn: (r) => r.machine == “123”) |> filter(fn: (r) => r._field == “span”) |> filter(fn: (r) => r.state == “working”) |> aggregateWindow(every: 1mo, offset: 0s, fn: sum, createEmpty: true ) |> yield(name: “sum”)

`
in the received result the _time column looks like this:

  • 2021-02-01T00:00:00.000Z
  • 2021-03-01T00:00:00.000Z
  • 2021-04-01T00:00:00.000Z
  • 2021-05-01T00:00:00.000Z
  • 2021-06-01T00:00:00.000Z
  • 2021-07-01T00:00:00.000Z
  • 2021-08-01T00:00:00.000Z
  • 2021-09-01T00:00:00.000Z
  • 2021-10-01T00:00:00.000Z
  • 2021-11-01T00:00:00.000Z
  • 2021-01-24T12:10:31.000Z

But when negative offset is used (we keep data in DB in UTC and shift them in query), the query is
`

from(bucket: “my_bucket”) |> range(start: 2021-01-24T14:10:31+02:00, stop: 2021-12-24T14:10:31+02:00) |> filter(fn: (r) => r._measurement == “my_measurement”) |> filter(fn: (r) => r.machine == “123”) |> filter(fn: (r) => r._field == “span”) |> filter(fn: (r) => r.state == “working”) |> aggregateWindow(every: 1mo, offset: -7200s, fn: sum, createEmpty: true ) |> yield(name: “sum”)

`

then the _time column in the received result looks broken:

  • 2021-01-31T22:00:00.000Z
  • 2021-02-28T22:00:00.000Z
  • 2021-03-28T22:00:00.000Z
  • 2021-04-30T22:00:00.000Z
  • 2021-05-30T22:00:00.000Z
  • 2021-06-30T22:00:00.000Z
  • 2021-07-30T22:00:00.000Z
  • 2021-08-31T22:00:00.000Z
  • 2021-09-30T22:00:00.000Z
  • 2021-10-30T22:00:00.000Z
  • 2021-11-30T22:00:00.000Z
  • 2021-12-24T14:10:31.000Z

What we want is to aggregate data by month with possibility to shift the data according to a certain timezone.
Are we doing something wrong?
Thank you

Used InfluxDB version is 2.0.7

Hello @ellibas,
I’m sorry for the delay. Sometimes questions get buried. In the future feel free to tag me, so I can make sure to get back to you!
I would suggest using timeshift() after aggregateWindow()
https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/built-in/transformations/timeshift/

Hi @Anaisdg, thank you for your reply.
I have tried to use timeshift() function in the following way (I used ‘7200s’ because in aggregateWindow I used offset ‘-7200s’):

|> timeShift(duration: 7200s, columns: ["_time"])

So the query now looks like this

from(bucket: “my_bucket”) |> range(start: 2021-01-24T14:10:31+02:00, stop: 2021-12-24T14:10:31+02:00) |> filter(fn: (r) => r._measurement == “my_measurement”) |> filter(fn: (r) => r.machine == “123”) |> filter(fn: (r) => r._field == “span”) |> filter(fn: (r) => r.state == “working”) |> aggregateWindow(every: 1mo, offset: -7200s, fn: sum, createEmpty: true ) |> timeShift(duration: 7200s, columns: ["_time"]) |> yield(name: “sum”)

The _time column in the received result looks different, but still not as expected:

  • 2021-02-01T00:00:00.000Z
  • 2021-03-01T00:00:00.000Z
  • 2021-03-29T00:00:00.000Z
  • 2021-05-01T00:00:00.000Z
  • 2021-05-31T00:00:00.000Z
  • 2021-07-01T00:00:00.000Z
  • 2021-07-31T00:00:00.000Z
  • 2021-09-01T00:00:00.000Z
  • 2021-10-01T00:00:00.000Z
  • 2021-10-31T00:00:00.000Z
  • 2021-12-01T00:00:00.000Z
  • 2021-12-24T14:10:31.000Z

Hello @ellibas,
Can you please export the data to annotated CSV so I can try to reproduce?

I wouldn’t use an offset in aggregate window.
I also don’t see offset as an option for aggregateWindow:
https://docs.influxdata.com/influxdb/cloud/reference/flux/stdlib/built-in/transformations/aggregates/aggregatewindow/
I was suggesting you use timeshift instead of offset.

I am not sure how I can export data, I see only the possibility to export the query results in InfluxDB UI.
Actually what we want is to shift data according to the timezone given by the user.

Example:
Let’s say we store measurements how long the machine is in the ‘blocked’ state. And we store these data in UTC time.
The machine became blocked on 31.01.2020 at 5:00, and was unblocked 02.02.2020 at 5:00. During this time period we have stored a lot of measurements in regular time intervals that machine is in the blocked state.
Then we want to retrieve data how long the machine was in the blocked state in time range: from 31.01.2020 00:00 till 03.03.2020 00:00.
In UTC timezone the result is
31.01.2020 - 19h
01.02.2020 - 24h
02.02.2020 - 5h
But machines are located in CET (+2h) and the user wants to get data in that timezone, so from his point of view the machine became blocked on 31.01.2020 at 7:00, and was unblocked 02.02.2020 at 7:00, and the result in that case will be
31.01.2020 - 17h
01.02.2020 - 24h
02.02.2020 - 7h

So this timeshift due to the timezone - is actually what we want to achieve.

That is interesting, because offset was an option for aggregateWindow when we developed the query few months ago, and you can try to set this parameter and see that it has the impact on the result…

Anyway I have tried to remove offset and add timeshift function, the _time column now looks much better, at least it is always at the correct end day of the month, but it doesn’t work as timezone shift. See my example above. The ‘sum’ data are the same in the result no matter what timeshift I use.

Hello @ellibas,
exporting the data from the UI is perfect.
I’m afraid I don’t understand the problem. The time zones don’t affect durations. How are you calculating the durations?
Have you tried using:
https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/built-in/transformations/stateduration/

hi @Anaisdg,
I am able to export only the query result, and in any case I am not able to upload it here because of some restrictions for “the new users” on this page.
Let me explain the problem again.
Yes, as you wrote time zones don’t affect durations, but only till the moment you want to “window” the data in some certain time buckets.
I will take another example: work shifts. Let’s say we want to measure how long people work every day.
And let’s say we want to measure work shifts of people working in Japan.
Let’s say they start at 8am every day and finish their shift at different time. But we store the data to DB in CET timezone, so according to the DB people in Japan start working at 11pm (of the previous day - because of the 9h time difference). So when we want to retrieve data, we want to shift this data with that 9h time difference due the different timezone, and only in that case we will get correct data for a certain day.

Or I can ask even easier - how to consider the timezone for retrieving data from InfluxDB? ‘offset’ parameter works good for aggregateWindow (even if it is not mentioned in the current documentation), but it fails for ‘month’

Hello @ellibas,
Thank you for explaining! I’m still a little confused though I apologize. timeShift() is used for that exact purpose. It will shift all of your timestamps back 9 hours to the previous day regardless of crossover between month or day.
I’m not sure why the offset isn’t working as expected. I’m not able to reproduce.
Can you please try the following:

|> timeShift(duration: -7200s, columns: ["_time"])
|> window(every:1mo)
|> sum() 

For example:

import "csv"
csvData = "#group,false,false,true,true,false,false,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#default,_result,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,tag1,tag2
,,0,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:00:00Z,0,field1,measurement,tagvalue1,tagvalue2
,,0,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:05:00Z,1,field1,measurement,tagvalue1,tagvalue2
,,1,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:00:00Z,0,field2,measurement,tagvalue1,tagvalue2
,,1,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:05:00Z,1,field2,measurement,tagvalue1,tagvalue2
"

data =  csv.from(csv: csvData)

data 
|> timeShift(duration: -1h)
|> window(every: 5m)
|> sum()

Works as expected for me.

hi @Anaisdg,
I wrote above that timeshift doesn’t work for me correctly: _time column looks good, but this timeshift duration has no impact on the _value column (the sum). I mean regardless of the set timeshift duration the _value remains the same.
Please try on your data set to query DB with the different timeshift duration and check if the _value column changes.

Again that example with work shifts in Japan: imagine that one person didn’t work on Monday and on Tuesday he worked 8 hours from 8am, but in DB (data stored in CET - 9 hours time difference) we will see data that he worked 1 hour on Monday and 7 hours on Tuesday. So we want consider the timezone in sum calculation.

Hello @ellibas,
The timeshift function shouldn’t have an affect on the sum. It also shifts the _start and _stop columns.
It sounds as though you just want to change the time in the range() function:

import "csv"
import "experimental"

csvData = "#group,false,false,true,true,false,false,true,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string,string
#default,_result,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,tag1,tag2
,,0,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:00:00Z,0,field1,measurement,tagvalue1,tagvalue2
,,0,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:05:00Z,1,field1,measurement,tagvalue1,tagvalue2
,,1,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:00:00Z,0,field2,measurement,tagvalue1,tagvalue2
,,1,2021-08-06T05:55:00Z,2021-08-06T06:10:00Z,2021-08-06T06:05:00Z,1,field2,measurement,tagvalue1,tagvalue2
"

data =  csv.from(csv: csvData)
|> range(start: experimental.addDuration(d: 1d, to: 2021-07-06T06:00:00Z), stop: now())
|> yield()

Notice how the first timestamp is at 2021-08-06T06:00:00Z but we’re able to add a day to the start time so we can still query for those timestamps.

I really appreciate your patience. I’m just still confused by your example though.
If you’re collecting the following data:

myMeasurement, myTag=foo myField=1.0 2021-08-13T08:00:00Z
myMeasurement, myTag=foo myField=1.0 2021-08-13T16:00:00Z

And you query with

|> from("mybucket")
|> range(start: 2021-08-13T08:00:00Z, stop: 2021-08-13T16:00:00Z)
|> sum

You’ll get a value of 2.0.

Now let’s say someone queries for that time range again, but for them the time was 1 hour on Monday and 7 hours on Tuesday, they’ll still get the same sum.

Can you please share what you expect the output to be? Thank you for trying to help me understand. Can you please provide:

  • input data with timestamps that the data was written.
  • query you want to execute.
  • expected outcome with data and timestamps.

Hello @ellibas,
The only other thing I can think of that might be related to what you’re asking is:

I encourage you to take a look at the date package:
https://docs.influxdata.com/flux/v0.x/stdlib/date/

Otherwise I look forward to you sharing your input data, query, and expected output.

@ellibas I recommend using the timeshift() function specifying to shift ONLY the _time column. By default it shifts the _start, _stop, and _time column. If you only shift the _time column you should see a difference in your sum after applying aggregateWindow() which operates on the _start and _stop times.

Yes, I have tried to use timeshift() function with specifying only _time column and without specifying the column at all, the result has no effect on sum. And I wonder that you propose to experiment further with the timeshift() function when you have already mentioned above that “The timeshift function shouldn’t have an affect on the sum”.
So we can finish with proposal to use timeshift() function, because what I want is the sum changes.

Anyway I see a little progress in your understanding the problem: what we want is to change the time in the range function. And offset parameter does exactly what we want (it shifts the time in the range), but it fails when month aggregate window is chosen (please check the beginning of the conversation again), for days, weeks, years it works fine.

The basic example for you to simulate the situation:
let’s say we have ‘state’ as tag and ‘span’ as value, ‘time’ is written in UTC.
The person starts to work in Japan at 8am on 07.08.2021 and finished at 4pm the same day.
Now let’s create some simplified pseudomesaurements for 8h workshift:
state=‘working’, span=‘60’, time=2021-08-06T23:00:00Z
state=‘working’, span=‘60’, time=2021-08-07T00:00:00Z
state=‘working’, span=‘60’, time=2021-08-07T01:00:00Z
state=‘working’, span=‘60’, time=2021-08-07T02:00:00Z
state=‘working’, span=‘60’, time=2021-08-07T03:00:00Z
state=‘working’, span=‘60’, time=2021-08-07T04:00:00Z
state=‘working’, span=‘60’, time=2021-08-07T05:00:00Z
state=‘working’, span=‘60’, time=2021-08-07T06:00:00Z

Now we want to know how long this person works daily (by usage sum() function on ‘span’ column).
If we calculate it without considering the timezone, then we’l get the result, that the person worked 60min (1 hour) on 06.08 and 420min(7 hours) on 07.08. But we want to consider the Japan timezone,
so we can see that person worked 8h on 07.08.
So my question is how we can do it?
We agreed that timeshift() function doesn’t suit for these purposes.
We used offset parameter of aggregateWindow() function, and it works exactly as we want for per day, per week and per year but fails for per month.

The timeShift() function does change the sum if you only specified the _time column in combination with the aggregateWindow() function.

import "array"

data = array.from(rows: [{_time: 2021-08-06T23:00:00Z, state: "working", span:60},
{_time: 2021-08-07T00:00:00Z, state: "working", span:60},
{_time: 2021-08-07T01:00:00Z, state: "working", span:60},
{_time: 2021-08-07T02:00:00Z, state: "working", span:60},
{_time: 2021-08-07T03:00:00Z, state: "working", span:60},
{_time: 2021-08-07T04:00:00Z, state: "working", span:60},
{_time: 2021-08-07T05:00:00Z, state: "working", span:60},
{_time: 2021-08-07T06:00:00Z, state: "working", span:60}])

data 
|> range(start: 2021-08-06T22:00:00Z, stop: 2021-08-07T07:00:00Z)
|> aggregateWindow(every: 1d, fn: sum, column: "span")
|> yield(name: "without shift")

data 
|> range(start: 2021-08-06T22:00:00Z, stop: 2021-08-07T07:00:00Z)
|> timeShift(duration: -9h, columns: ["_time"])
|> aggregateWindow(every: 1d, fn: sum, column: "span")
|> yield(name: "with shift")

Where the first yield gives:


And the second gives:

As you can see without timeShift we get 60 and 420 minutes for the sum. Then we use timeshift() and the sum for the day changes to 480.

If this isn’t what you’re looking for can you help me understand how it’s different from what you’ve asked for? The dummy input data was helpful but ideally to get the best assistance you should provide dummy input AND output data as well as the query you’d like to use.

hi @Anaisdg , thank you for the answer. I found that I used timeShift function after aggregateWindow, so it had no effect, after putting it before aggregateWindow I see the impact of this function to the result.
So it seems timeShift() is what we want.
I will test more different cases and let you know if there is some blocking issue.
Thank you for your help :slight_smile:

1 Like

Hi @Anaisdg,
So now we are able to consider TZ in our queries using timeShift() function but now we are struggling with the daylight saving time. Is there any solution for that in flux?
Thank you

Hello @ellibas,
You could use conditional query logic to apply additional or different timeShift() when daylight savings occurs.
https://docs.influxdata.com/influxdb/cloud/query-data/flux/conditional-logic/#conditionally-transform-column-values-with-map