Group by time - why different grouping for mean() and integral()

kstuken · April 11, 2021, 9:15pm

Hi, I am using InfluxDB 1.8.4. I have a time series of power measurements (about every minute) and want to convert those into energy measurements (ie (kilo)Watts per day). I have that figured out, but I am struggling a bit with the GROUP BY time behavior of InfluxDB.

I would like to get results grouped by days going backward from now(); ie if I run the query at 15:00 hours, I would like the last group item returned to cover from previous day 15:01 to today 15:00.

The following query using mean() behaves just like that:
SELECT (mean("value") * 24) AS "value" FROM <myTable> WHERE time > now() - 30d GROUP BY time(1d) fill(null)

If, instead, I use integral()–which I need for proper results–the grouping is different. Rather than grouping by 24h blocks going backward, as desired, the grouping is by calendar days, eg the last group item returned will cover 00:00 to 15:00 (and the first item in the group will be similarly cut off):
SELECT (integral("value", 1h)) AS "value" FROM <myTable> WHERE time > now() - 30d GROUP BY time(1d) fill(null)

Does anyone have a suggestion how I could achieve the desired grouping behavior, ie the same behavior as for the query with mean()?

Pooh · April 11, 2021, 9:48pm

> Hi, I am using InfluxDB 1.8.4. I have a time series of power measurements
> (about every minute) and want to convert those into energy measurements
> (ie (kilo)Watts per day). I have that figured out, but I am struggling a
> bit with the GROUP BY time behavior of InfluxDB.

Energy is power times time - so WattSeconds, or kiloWattHours, or WattDays,
not Watts per Day.
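
To make the units concrete, here is a minimal Python sketch (not InfluxDB code; the sample values are made up and the trapezoidal rule is simply the integration method chosen for the illustration). It integrates power samples in kW over time to get kWh and compares that with mean power times 24 h. The two agree only when the samples cover a full 24 hours.

```python
def energy_kwh(samples):
    """Trapezoidal integral of (hour, kW) samples, giving energy in kWh."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (p0 + p1) / 2 * (t1 - t0)
    return total

def mean_power_times_24(samples):
    """Mean power in kW multiplied by 24 h: an extrapolation to a full day."""
    return sum(p for _, p in samples) / len(samples) * 24

# Hypothetical data: a constant 2 kW load sampled once a minute.
full_day = [(m / 60, 2.0) for m in range(24 * 60 + 1)]
three_hours = [(m / 60, 2.0) for m in range(3 * 60 + 1)]

print(round(energy_kwh(full_day), 6), round(mean_power_times_24(full_day), 6))        # 48.0 48.0
print(round(energy_kwh(three_hours), 6), round(mean_power_times_24(three_hours), 6))  # 6.0 48.0
```

For a bucket that holds only three hours of data, the integral reports the 6 kWh actually used so far, while mean times 24 still reports 48.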

> If, instead, I use integral()–which I need for proper results–the
> grouping is different. Rather than grouping by 24h blocks going backward,
> as desired, the grouping is by calendar days, eg the last group item
> returned will cover 00:00 to 15:00 (and the first item in the group will
> be similarly cut off):
> SELECT (integral("value", 1h)) AS "value" FROM <myTable> WHERE time > now() - 30d GROUP BY time(1d) fill(null)

> Does anyone have a suggestion how I could achieve the desired grouping
> behavior, ie the same behavior as for the query with mean()?

Instead of GROUP BY time(1d), try simply GROUP BY time(24h).

Antony.

kstuken · April 11, 2021, 10:11pm

Thank you for your response.

You are right, of course. So what I want is kWh over an entire day. And again, that is what the query is giving me, no problem there.

Yes, I had tried that before. Makes no difference, unfortunately. Still not the desired grouping behavior using integral().
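
For context: `1d` and `24h` are the same duration, and InfluxDB places GROUP BY time() buckets on preset boundaries that do not depend on the WHERE clause, which would explain why both spellings cut at the calendar day. InfluxQL's GROUP BY time() also takes an optional second argument, an offset interval, that shifts those boundaries. The Python sketch below is one untested way to use it: it computes the offset that makes a bucket end at the query time and builds the query string with explicit epoch bounds. The measurement name `power` and the helper names are hypothetical, and the exact query shape is an assumption to check against your own instance.

```python
from datetime import datetime, timezone

DAY_S = 24 * 3600

def offset_seconds(now_s, interval_s=DAY_S):
    """Seconds by which epoch-aligned buckets must shift so that one ends at now_s."""
    return now_s % interval_s

def build_query(measurement, now_s, days=30):
    """Build an InfluxQL query with 24h buckets ending at now_s (assumed syntax, untested)."""
    start_s = now_s - days * DAY_S
    return (
        f'SELECT integral("value", 1h) AS "value" FROM "{measurement}" '
        f"WHERE time >= {start_s}s AND time < {now_s}s "
        f"GROUP BY time(24h, {offset_seconds(now_s)}s) fill(null)"
    )

# Example: the query is run at 15:00 UTC on 2021-04-11.
now_s = int(datetime(2021, 4, 11, 15, 0, tzinfo=timezone.utc).timestamp())
print(offset_seconds(now_s) // 3600)  # 15 (hours past midnight UTC)
print(build_query("power", now_s))
```

Fixing the time bounds on the client side, rather than using now() in the query, keeps the WHERE range and the offset computed from the same instant.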

Pooh · April 11, 2021, 10:34pm

Oh dear. Sorry, I thought that would group the data as you wanted.

I hope someone else has a better idea than mine.

Antony.

Anaisdg · April 12, 2021, 6:17pm

Hello @kstuken,
That sounds like a bug? Can you file an issue? I’ll also share it with the InfluxDB team.
Thank you

kstuken · April 13, 2021, 11:11pm

Thank you, I am considering it. But looking a bit deeper at this, there seem to be a number of issues at play here. No 3) below looks like a straightforward bug to me. No 2) also looks buggy, or at least very non-intuitive…

Three things I figured out since my initial post:

1) I realized that the grouping behavior is actually consistent across aggregation functions, ie integral(), mean() and count() all cut off at the end of the calendar day. (Eg right now, as local time crossed from just before to just past midnight, the last value in the result set from both integral() and count() grouping fell down to effectively zero and will now increase as the day progresses.) In my initial issue description above I had thought that mean() behaved differently because I multiplied the mean query by *24, which gave me reasonable-looking data as soon as there was a little bit of data for the current day.

2) Still, at minimum this looks like very counter-intuitive behavior. What do I have to do to get time grouping of continuous 24h blocks that are not aligned with calendar day breaks? It gets weirder when I try this:
SELECT (count("value")) AS "value" FROM "<myTable>" WHERE time > now() - 690h GROUP BY time(23h) fill(null) tz('Europe/Berlin')
In my view that should give me a result set of 30 grouped values (690h / 23h). But no: there are 31 values in the result set, and somehow the first and last values seem not to be for full 23h intervals.

3) It becomes fully bizarre when timezone is involved.
SELECT (count("value")) AS "value" FROM "<myTable>" WHERE time > now() - 720h GROUP BY time(24h) fill(null) tz('Europe/Berlin')
When I run this query between midnight and 1am local time, rather than getting a result set of 30 values (ie 720h / 24h), I get a result set of 32 values. That is just plain wrong. Even if the first and last result values are for significantly less than 24h, the remaining 30 values should be for the full 24h, and hence either the query is executed for way more than the 720h limit I set, or there is some data duplication here. (Past 1am I get 31 results for that query.)
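
One possible reading of the counts in 2) and 3), sketched in Python under a simplified model rather than taken from InfluxDB's source: buckets sit on fixed boundaries (multiples of the interval counted from the epoch, shifted by the local UTC offset when tz() is used), so a range that starts and ends inside a bucket touches one more bucket than range / interval. The Europe/Berlin offsets are hard-coded for spring 2021 (CET until 01:00 UTC on 28 March, CEST after), the run times are assumed examples, and the 23h case is modelled without a time zone.

```python
from datetime import datetime, timezone

HOUR = 3600
# Europe/Berlin changed from CET (UTC+1) to CEST (UTC+2) at 01:00 UTC on 2021-03-28.
DST_START_S = int(datetime(2021, 3, 28, 1, 0, tzinfo=timezone.utc).timestamp())

def utc_offset(t_s):
    """No time zone: bucket boundaries are counted from the epoch in UTC."""
    return 0

def berlin_offset(t_s):
    """UTC offset of Europe/Berlin in seconds (valid for spring 2021 only)."""
    return 2 * HOUR if t_s >= DST_START_S else 1 * HOUR

def buckets_touched(now_s, range_h, interval_h, offset=utc_offset):
    """Count fixed-boundary buckets overlapped by the range [now - range_h, now]."""
    start_s = now_s - range_h * HOUR
    first = (start_s + offset(start_s)) // (interval_h * HOUR)
    last = (now_s + offset(now_s)) // (interval_h * HOUR)
    return last - first + 1

def at_utc(*args):
    """Epoch seconds for a UTC wall-clock time."""
    return int(datetime(*args, tzinfo=timezone.utc).timestamp())

print(buckets_touched(at_utc(2021, 4, 13, 15, 0), 690, 23))                  # 31
print(buckets_touched(at_utc(2021, 4, 13, 22, 30), 720, 24, berlin_offset))  # 32 (00:30 CEST)
print(buckets_touched(at_utc(2021, 4, 13, 23, 30), 720, 24, berlin_offset))  # 31 (01:30 CEST)
```

In this model the 32nd value comes from the hour skipped on 28 March: 720 hours before 00:30 on 14 April is 23:30 on 14 March in local time, so the range touches 32 local calendar days. Two of them are partial (30 minutes each) and one of the 30 complete days is only 23 hours long, which adds up to exactly 720 hours without any duplicated data.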

kstuken · April 14, 2021, 10:00am

I have filed a bug here

github.com/influxdata/influxdb

Group by time - returning too many results

opened 09:59AM - 14 Apr 21 UTC

kstuken

Grouping time series data by hour-intervals will give weirdly grouped results …

### First example:

__Steps to reproduce:__
`SELECT count("value") AS "value" FROM <myTable> WHERE time > now() - 720h GROUP BY time(24h) fill(null)`

__Expected behavior:__
Return result set of 30 values, each for past 24h

__Actual behavior:__
Returns result set of 31 values, with the cut-off for grouping happening at midnight, ie change of calendar day. This means the last value in the result set will be for the hours of the day so far, which is useless if I want to do a comparison of daily aggregate values (eg count() or integral()). Due to this behavior I would have to wait until the end of the current calendar day until the time series data for the current day meaningfully shows up in my query result.

### Second example:

It gets pretty bizarre when time zones come into play:

__Steps to reproduce:__
1. `SELECT count("value") AS "value" FROM <myTable> WHERE time > now() - 720h GROUP BY time(24h) fill(null) tz('Europe/Berlin')`
2. Run between midnight and 1am local time (Europe/Berlin)

__Expected behavior:__
Return result set of 30 values, each for past 24h

__Actual behavior:__
Returns result set of _32 values_. That seems just plain wrong. I suspect the fact that the period queried includes the shift from normal time to daylight saving time might play a role here. But again, having 32 results means I am getting results for a period exceeding the 720h I had requested (30 * full 24h grouping, 2 * less than 24h grouping)

__Environment info:__
* System info: Linux 5.10.17-v7l+ armv7l
* InfluxDB version: InfluxDB v1.8.4 (git: 1.8 bc8ec4384eed25436d31045f974bf39f3310fa3c)
