Query efficiency for GROUP BY over large time ranges

JeremySTX · October 30, 2017, 7:58am

Our database stores min/max/avg values at half-hourly time intervals.
The data is typically an accumulator of some sort, e.g. kilowatt-hours of energy used this year.
We have a requirement to show the kWh used per day.
I’m tempted to write a pair of queries like

SELECT min(min_kilowatt_hours) FROM ... WHERE time >= '2017-10-01' AND time < '2017-11-01' GROUP BY time(1d)
SELECT max(max_kilowatt_hours) FROM ... WHERE time >= '2017-10-01' AND time < '2017-11-01' GROUP BY time(1d)

and then subtract the first result from the second to get the kWh used in each day.
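
To make the subtraction step concrete, here is a minimal sketch of what the application side might look like. It assumes the two query results have already been turned into dictionaries keyed by day; the function name `daily_usage` and the sample numbers are made up for illustration and are not part of any InfluxDB client library. Days with no data come back as null from a GROUP BY time() query by default, so the sketch skips them.

```python
from typing import Dict, Optional


def daily_usage(
    daily_min: Dict[str, Optional[float]],
    daily_max: Dict[str, Optional[float]],
) -> Dict[str, float]:
    """Return max minus min for each day present in both results.

    Hypothetical helper: days where either value is missing (None)
    are left out rather than reported as zero.
    """
    usage = {}
    for day, low in daily_min.items():
        high = daily_max.get(day)
        if low is None or high is None:
            continue  # empty GROUP BY interval, nothing to subtract
        usage[day] = high - low
    return usage


# Illustrative values only, not real measurements.
mins = {"2017-10-01": 1200.0, "2017-10-02": 1224.5, "2017-10-03": None}
maxes = {"2017-10-01": 1224.0, "2017-10-02": 1251.0, "2017-10-03": None}

print(daily_usage(mins, maxes))
```

Note that this only counts the energy used between the first and last reading inside each day, so any usage between the last reading of one day and the first reading of the next is not attributed to either day.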

What I would like to know is: how efficient is the InfluxDB engine when it has to scan many data points in a measurement to find the lowest or highest value?
Would I be better off having my application iterate over the date range and issue a separate query for each day, rather than relying on GROUP BY?

Alternatively, is the FIRST() function with a GROUP BY an efficient compromise between the two? My assumption is that FIRST() with GROUP BY makes the InfluxDB engine start reading at the beginning of each GROUP BY interval and, as soon as it finds a point in that interval, skip ahead to the start of the next one. Is that correct?
