I’m very happy with the usability of the TICK stack, but I am missing one vital feature, which I’d call a “server availability percentage”. By that I mean an automatically calculated value for a given time range: the share of that range during which the server was available, as opposed to unavailable. Strictly speaking, I need this for the server’s network availability, but I have not found a way to get that directly. My current attempt is to take the “uptime” value from Telegraf’s system plugin and compare it to the duration value in Chronograf’s alerts database.
On a side note: which unit does the duration value use? I already have a few alerts logged, and their duration values are sometimes huge. The Telegraf system plugin seems to report uptime in seconds; could it be that the duration of Chronograf’s alerts is measured in milliseconds?
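To make the value I’m after concrete, here is a minimal sketch in Python of the calculation. The function name and parameters are hypothetical and not part of any TICK component, and it assumes the total downtime has already been converted to seconds (the unit of the alert duration is exactly what I’m unsure about).

```python
def availability_percent(window_seconds: float, downtime_seconds: float) -> float:
    """Share of the time window, in percent, during which the server was available.

    Hypothetical helper: both arguments are assumed to be in seconds.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    # Clamp the downtime so that overlapping or over-long alert durations
    # cannot push the result outside the 0..100 range.
    downtime = min(max(downtime_seconds, 0.0), window_seconds)
    return 100.0 * (window_seconds - downtime) / window_seconds


# A 24-hour window (86400 s) with 2592 s of outages.
print(availability_percent(86400, 2592))  # 97.0
```

The unavailable share is simply 100 minus this value, so the example corresponds to “97% available and 3% unavailable”.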
As I haven’t found that feature in Chronograf yet, I tried a mixed query in Grafana, with A being the server uptime value and B being the duration value of the alerts. Is there a way to combine these two values into a single (percentage) value? So far I have not found one; whatever I try, I always get an N/A result. My queries look like this:
Query1 (database: chronograf) SELECT count("message") FROM "alerts" WHERE "alertName" = 'Deadman Netzausfall' AND "level" = 'CRITICAL' AND "host" =~ /^$server$/ AND $timeFilter
Query2 (database: telegraf) SELECT "uptime" FROM "system" WHERE "host" =~ /^$server$/ AND $timeFilter
“Deadman Netzausfall” (“Netzausfall” is German for “network outage”) is my Kapacitor alert, which notifies me if a server has not sent any net_response data in the last few minutes.
Has anyone tried or managed to achieve something similar?
Additionally, it would be great if I could create a table that lists the time ranges during which the server was available or unavailable, but I guess that is asking too much. A percentage value would already be a huge help, e.g. “the server was 97% available and 3% unavailable”.
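To illustrate the kind of table I mean, here is a rough sketch in Python that turns a list of state changes into available/unavailable time ranges. It assumes that both the CRITICAL entries and the matching recovery (OK) entries can be read from the alerts data with their timestamps, which I have not verified. The function name and the sample data are made up for illustration.

```python
from datetime import datetime, timedelta


def availability_ranges(events, start, end, initial_up=True):
    """Turn state changes into (from, to, state) rows covering start..end.

    events: iterable of (timestamp, is_up) pairs, e.g. is_up=False for a
    CRITICAL alert entry and is_up=True for the matching OK entry.
    initial_up: assumed state at `start` if no earlier event says otherwise.
    """
    rows = []
    cursor, state = start, initial_up
    for ts, is_up in sorted(events):
        if ts <= start:
            # Events at or before the window only set the initial state.
            state = is_up
            continue
        if ts >= end:
            break
        if is_up != state:
            # Close the current range and open a new one in the new state.
            rows.append((cursor, ts, "available" if state else "unavailable"))
            cursor, state = ts, is_up
    # Close the last open range at the end of the window.
    rows.append((cursor, end, "available" if state else "unavailable"))
    return rows


# Made-up sample: one outage from 10:00:00 to 10:43:12 within a 24-hour window.
start = datetime(2024, 1, 1)
end = start + timedelta(hours=24)
events = [
    (start + timedelta(hours=10), False),
    (start + timedelta(hours=10, minutes=43, seconds=12), True),
]
for begin, finish, state in availability_ranges(events, start, end):
    print(begin, finish, state)
```

Summing the lengths of the “unavailable” rows and dividing by the length of the window would also give the percentage value directly.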