Hey @tbalzer !
We’re actually doing something very similar to what you describe to power the status lights on the hosts page in Chronograf.
The specific query that we’re running is:
select non_negative_derivative(mean(uptime)) as deltaUptime from "system" where time > now() - 10m group by host, time(1m) fill(0)
The idea is to get the rate at which uptime is changing. If the rate is greater than zero, the server is still reporting changes to its uptime and is therefore up. If it’s zero, no changes have been reported in the 10m period we asked for, so we change the light to amber to indicate that the server may be down. Finally, if the value isn’t present at all, the last reported change to uptime was outside the 10m window, so we change the light to red to indicate the server is down. This all happens here: https://github.com/influxdata/chronograf/blob/master/ui/src/hosts/components/HostsTable.js#L152-L159
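To make the three cases concrete, here is a minimal Python sketch of the mapping described above. It is not the actual code from HostsTable.js; the function name `host_status` and the use of `None` for "no value returned for this host" are assumptions made for illustration.

```python
from typing import Optional


def host_status(delta_uptime: Optional[float]) -> str:
    """Map a host's latest deltaUptime value to a status light colour.

    Hypothetical helper: None stands for "the query returned no value
    for this host".
    """
    if delta_uptime is None:
        # No value at all: nothing reported inside the 10m window.
        return "red"
    if delta_uptime > 0:
        # Uptime is still increasing, so the host is reporting.
        return "green"
    # A value exists but the rate of change is zero: possibly down.
    return "amber"
```

For example, `host_status(60.0)` gives `"green"`, `host_status(0)` gives `"amber"`, and `host_status(None)` gives `"red"`.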
We could get this as a percentage over the 10m window using subqueries (available in InfluxDB v1.2.0+). I think this should do the trick:
select sum("isUp") / count("isUp") from (select non_negative_derivative(mean("uptime")) / non_negative_derivative(mean("uptime")) as isUp from system where time > now() - 10m group by time(1m) fill(0));
The idea here is to take the rate at which uptime is changing and divide it by itself to get a 1-or-0 signal (InfluxDB can’t do NaNs, so it replaces the 0/0 results with 0s). Then we divide the sum over that window by the count to get the fraction of the window the server was up; multiply by 100 to express it as a percentage.
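The arithmetic behind that query can be sketched outside InfluxDB. The Python below is only an illustration: the function name `uptime_fraction` and the sample values are made up, and the "0/0 becomes 0" rule simply mirrors the behaviour described above rather than anything I have verified against the InfluxDB source.

```python
from typing import List, Optional


def uptime_fraction(deltas: List[float]) -> Optional[float]:
    """Return the fraction of intervals in which uptime was changing.

    `deltas` holds one deltaUptime value per 1m bucket (hypothetical input).
    """
    if not deltas:
        return None  # no buckets, so no statistic to report
    # Divide each rate by itself: non-zero rates become 1.0,
    # and zero rates are treated as 0 (standing in for the 0/0 case).
    is_up = [d / d if d != 0 else 0.0 for d in deltas]
    # sum / count, as in the outer query.
    return sum(is_up) / len(is_up)


# Ten 1m buckets, two of which saw no change in uptime.
sample = [60, 60, 60, 0, 0, 60, 60, 60, 60, 60]
print(uptime_fraction(sample))  # 0.8
```

Multiplying the result by 100 gives the percentage, so the sample above corresponds to 80% uptime over the window.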
I think this is actually a really useful statistic for the hosts page in Chronograf, so I’ve created a new issue for it: Support Uptime Percentage in Hosts List (influxdata/chronograf issue #1025).