I need to create a monitor to check the percentage of time the application is unavailable, not counting maintenance time.
In my tests, telegraf collects 2 metrics, Apache and status file
Apache - Apache is running, and telegraf checks its status.
/etc/telegraf/telegraf.d/telegraf-http.conf
[[inputs.http_response]]
urls = [ "https://localhost/server-status" ]
insecure_skip_verify = true
interval = "10s"
File - I created the file /opt/manutencao_apache to enable the “maintenance time”
/etc/telegraf/telegraf.d/telegraf-manutencao.conf
[[inputs.exec]]
commands = [ "/etc/telegraf/scripts/monitor-manutencao 'http_response' 'server=https://localhost/server-status' '/opt/manutencao_apache'" ]
data_format = "influx"
interval = "10s"
/etc/telegraf/scripts/monitor-manutencao
#!/usr/bin/env bash
file_check="${3}"
name="${1}"
tags="${2},${field_tag}"
if [[ -f "${file_check}" ]]; then
field="manutencao=-10"
field_tag="manutencao_tag=manutencao_ativada,result=manutencao_ativada"
name="${1}"
tags="${2},${field_tag}"
echo "${name},${tags} ${field}"
fi
Now I’m trying to create a query.
That’s the best I could do.
SELECT (count(total_manut_response) / count(total)) * 100 AS "success_percentage"
FROM
(
SELECT count(manut_response) AS "total_manut_response"
FROM
(
SELECT c_teste + m_teste AS "manut_response"
FROM
(
SELECT last("result_code") AS "c_teste", last("manutencao") AS "m_teste" FROM "http_response" WHERE host = 'machine1-local' AND server = 'https://localhost/server-status' AND ("result" = 'success' OR "manutencao_tag" = 'manutencao_ativada') AND $timeFilter GROUP BY time(1m) fill(null)
)
)
GROUP BY time(1m) fill(null)
),
(
SELECT last("result_code") AS total
FROM "http_response"
WHERE ("host" = 'machine1-local' AND "server" = 'https://localhost/server-status') AND $timeFilter GROUP BY time(1m) fill(null)
)
This almost works, the problem, its when the “maintenance time” returns null, all its null.
eg:
NOW
http check (ok) + maintenance = metrics (ok)
http check (ok) + no maintenance = no metrics (null)
http check (nok) + maintenance = no metrics (null)
http check (nok) + no maintenance = no metrics (null)
ideal scenario, it can only return null, when apache is down, and the maintenance file does not exist (/opt/manutencao apache)
http check (ok) + maintenance = metrics (ok)
http check (ok) + no maintenance = metrics (ok)
http check (nok) + maintenance = metrics (ok)
http check (nok) + no maintenance = no metrics (null)