SELECT with subquery seems to ignore GROUP BY



I want to create a report which lists devices that have not sent data to InfluxDB for 48 hours.

In our database, the __saved field is used to store the timestamp at which the measurement was written to InfluxDB; the measurement’s InfluxDB time (timestamp) field is the time at which the measurement was recorded by the device. (Our devices will internally buffer measurements if they can’t forward them to the server immediately.) The __devhash tag is used to uniquely identify the device; furthermore, we have separate databases for each device ‘type’, mainly because the set of fields logged by a device tends to vary somewhat from device type to device type.

So I begin with a query to get the most recent timestamp from all devices. Rather than show the results for all device types I’ll show just one of them. (The sort -u is to remove all but one of the CSV “column name” lines.)

jeremy@devtest$ influx -precision rfc3339 -database solarmon  -format csv -execute "SELECT __saved FROM battery GROUP BY __devhash ORDER BY time DESC LIMIT 1" | sort -u

So far so good, but I don’t want to see the devices which have logged data in the past 2 days. In the battery data above, only one device (A3A4850704CA39F3C733BDAD64741B07) has logged data recently.

I thought a subquery would do the job, but the results are confusing … surely there should be just one result per __devhash ?

jeremy@devtest$ influx -precision rfc3339 -database solarmon  -format csv -execute "SELECT * FROM (SELECT __saved FROM battery GROUP BY __devhash ORDER BY time DESC LIMIT 1) WHERE time<1494340200s"

The battery that is not of interest (i.e. the one with recent data, that I want excluded from the listing) is shown three times; the others are listed only once each and the correct measurement timestamp is shown for each. What’s going on?

Jeremy Begg


Minor correction to the description (above): we have separate measurements for each device ‘type’ (not separate databases for each, as I originally wrote).