Hi,
The documentation of window() and aggregateWindow() is not very clear on this topic, but judging by the examples, and confirmed by trial, the selection of rows in each window uses the condition _start <= _time < _stop.
At the same time, the default for timeSrc is "_stop", which means that a sample whose time stamp equals the _time of an aggregated row is left out of that row's window.
This combination results in problematic and quite unexpected behavior. The selection rule implies that each sample represents a value for the period of time after its _time value, but the aggregate produces data in which each sample relates to the period of time before its _time value.
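A minimal sketch of a reproduction, for illustration only: the four samples, their values and the 2024-01-01 time stamps are made up, and array.from() plus range() is just one assumed way to get test data with _start and _stop columns.

```flux
import "array"

// Hypothetical test data: one sample every 30 s.
data =
    array.from(
        rows: [
            {_time: 2024-01-01T00:00:00Z, _value: 1.0},
            {_time: 2024-01-01T00:00:30Z, _value: 2.0},
            {_time: 2024-01-01T00:01:00Z, _value: 4.0},
            {_time: 2024-01-01T00:01:30Z, _value: 8.0},
        ],
    )
        // window() needs _start and _stop columns, which range() adds.
        |> range(start: 2024-01-01T00:00:00Z, stop: 2024-01-01T00:02:00Z)

// Default timeSrc ("_stop"): each output row is stamped with the window end.
data
    |> aggregateWindow(every: 1m, fn: sum)
```

If the rule _start <= _time < _stop holds, the row stamped 00:01:00 should contain the sum 3 (the samples at 00:00:00 and 00:00:30) and not the sample that is itself stamped 00:01:00; that one should end up in the row stamped 00:02:00.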
This results in some funky behavior in certain use cases:
- If an aggregate happens to use an every parameter that already matches the sampling interval of the data, the result is a shift of one time step in the data, while a developer may expect no change at all.
- When a time series is windowed, and then windowed again, the period of time actually covered drifts away from the time stamps in the _time column. For example, if one has data in 30 s intervals, takes the sum in 5 minute windows and then the mean of those sums in 1 hour windows, the data that actually ends up in each resulting 1 hour window comes from samples with minute parts from -5:00 to 54:30.
This doesn’t happen if one chooses the timeSrc: "_start" option, in which case the sample selection and the output time stamp match each other.
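A sketch that compares the two settings when every equals the sampling interval. The data, the 30 s interval and the yield names are assumptions chosen for illustration.

```flux
import "array"

// Hypothetical test data: one sample every 30 s.
data =
    array.from(
        rows: [
            {_time: 2024-01-01T00:00:00Z, _value: 1.0},
            {_time: 2024-01-01T00:00:30Z, _value: 2.0},
            {_time: 2024-01-01T00:01:00Z, _value: 4.0},
            {_time: 2024-01-01T00:01:30Z, _value: 8.0},
        ],
    )
        |> range(start: 2024-01-01T00:00:00Z, stop: 2024-01-01T00:02:00Z)

// Default: output time comes from _stop, so every value moves 30 s later.
data
    |> aggregateWindow(every: 30s, fn: mean)
    |> yield(name: "default_stop")

// timeSrc: "_start": output time comes from the window start.
data
    |> aggregateWindow(every: 30s, fn: mean, timeSrc: "_start")
    |> yield(name: "from_start")
```

Each 30 s window holds exactly one sample here, so the values should be unchanged in both results. Only the time stamps are expected to differ: shifted by one step in default_stop, identical to the input in from_start.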
Is there some hidden way to change the selection rule to _start < _time <= _stop? This would be necessary, for example, when processing data that is already an aggregate of some window into the past, as produced by many kinds of measurement hardware. The only way that I could think of is a horrible kludge that effectively uses the rule _start <= _time - eps < _stop:
|> aggregateWindow(every: 5m, offset: 1ms, fn: mean, createEmpty: false)
|> timeShift(columns: ["_time"], duration: -1ms)