Our database stores min/max/avg values at half-hourly time intervals.
The data is typically an accumulator of some sort, e.g. kilowatt-hours of energy used this year.
We have a requirement to be able to show the kWh used per day.
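For illustration (made-up numbers, and omitting the avg field), a few consecutive half-hourly points look roughly like this:

time                  min_kilowatt_hours  max_kilowatt_hours
2017-10-01T00:00:00Z  1000.0              1000.7
2017-10-01T00:30:00Z  1000.7              1001.5
2017-10-01T01:00:00Z  1001.5              1002.2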
I’m tempted to write queries like
SELECT min(min_kilowatt_hours) FROM ... WHERE time >= '2017-10-01' AND time < '2017-11-01' GROUP BY time(1d)
SELECT max(max_kilowatt_hours) FROM ... WHERE time >= '2017-10-01' AND time < '2017-11-01' GROUP BY time(1d)
then subtract the two to get the kWh used in each day.
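Since the accumulator only ever increases, each GROUP BY query should return one value per day, and the difference between the two is that day's usage. For example (hypothetical numbers), if the min query returns 1000.0 for the 2017-10-01 bucket and the max query returns 1047.9 for the same bucket, the usage for that day is 1047.9 - 1000.0 = 47.9 kWh.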
What I would like to know is: how efficient is the InfluxDB engine when it has to trawl through many data points in a measurement to find the lowest or highest value?
Would I be better off having my application iterate over the date range, generating a separate query for each day (rather than relying on GROUP BY)?
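By that I mean issuing one pair of queries per day, i.e. something like:

SELECT min(min_kilowatt_hours) FROM ... WHERE time >= '2017-10-01' AND time < '2017-10-02'
SELECT max(max_kilowatt_hours) FROM ... WHERE time >= '2017-10-01' AND time < '2017-10-02'

and so on for each day in the range.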
Alternatively, is the FIRST() function with a GROUP BY an efficient compromise between the two? I am assuming that FIRST() with GROUP BY causes the InfluxDB engine to start reading at the beginning of each GROUP BY interval and, as soon as it has a measurement in that interval, skip ahead to the start of the next interval. Is that assumption correct?
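To make that concrete, the sort of query I have in mind is something like this (with LAST() presumably playing the same role for the maximum):

SELECT FIRST(min_kilowatt_hours) FROM ... WHERE time >= '2017-10-01' AND time < '2017-11-01' GROUP BY time(1d)
SELECT LAST(max_kilowatt_hours) FROM ... WHERE time >= '2017-10-01' AND time < '2017-11-01' GROUP BY time(1d)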