Hi all,
I have recently started using Kapacitor to replace our previous alerting tool. So far I am more than happy with it.
I have one question regarding alerts:
We are tracking a list of services. I want to raise an alert when a service goes offline. I also want to raise a second alert when the number of services currently in that offline state exceeds a certain threshold (i.e. more than THRESHOLD services are offline at the same time). So far, I haven’t been able to find anything about how to use alert statistics from within a TICKscript.
Current task, including the TICKscript so far:
kapacitor show cpu_alert
ID: cpu_alert
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 12 Nov 18 14:30 UTC
Modified: 13 Nov 18 10:30 UTC
LastEnabled: 13 Nov 18 10:30 UTC
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
dbrp "telegraf"."autogen"

var ERROR_SERVICES = 'OFFLINE SERVICES High'
var THRESHOLD = 10

var data = stream
    |from()
        .measurement('docker')
    |groupBy('physnum')

var services = data
    |window()
        .period(1m)
        .every(1m)
    |max('n_containers_running')
        .as('running')
    |alert()
        .id(ERROR_SERVICES)
        .crit(lambda: int("running") < 2)
        .log('/tmp/alerts.log')

// missing something like:
// services
//     |count('critical')
//     |alert()
//         .crit(lambda: int("count") > THRESHOLD)
DOT:
digraph cpu_alert {
graph [throughput="0.00 points/s"];
stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="0"];
from1 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
from1 -> groupby2 [processed="0"];
groupby2 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
groupby2 -> window3 [processed="0"];
window3 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
window3 -> max4 [processed="0"];
max4 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
max4 -> alert5 [processed="0"];
alert5 [alerts_inhibited="0" alerts_triggered="0" avg_exec_time_ns="0s" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="0" ];
}
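To make the goal concrete, here is an untested sketch of the kind of pipeline I am after. Instead of counting alert events, it flags each offline service with a 1 and sums the flags across all services. It rests on a few assumptions: that an empty `|groupBy()` drops the `physnum` grouping so all services land in a single group, and that the second window lines up with the first. The names `offline`, `offline_count` and the alert id `OFFLINE SERVICES Count` are made up for illustration.

```
dbrp "telegraf"."autogen"

var THRESHOLD = 10

// Per service: highest number of running containers in each 1m window.
var running = stream
    |from()
        .measurement('docker')
    |groupBy('physnum')
    |window()
        .period(1m)
        .every(1m)
    |max('n_containers_running')
        .as('running')

// Across services: mark each offline service with 1, drop the physnum
// grouping (assumption: an empty groupBy() does this), then sum per window.
running
    |eval(lambda: if(int("running") < 2, 1, 0))
        .as('offline')
    |groupBy()
    |window()
        .period(1m)
        .every(1m)
    |sum('offline')
        .as('offline_count')
    |alert()
        .id('OFFLINE SERVICES Count')
        .crit(lambda: "offline_count" > THRESHOLD)
        .log('/tmp/alerts.log')
```

If this is roughly the right idea, the per-service alert could stay as it is and this would run as a second branch from the same stream.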
Any help would be much appreciated.