Use alert counts from TICK scripts


#1

Hi all,

I have recently started using Kapacitor to replace our previous alerting tool. So far I am more than happy with it.

I have one question regarding alerts:
We are tracking a list of services. I want to raise an alert when a service is offline. I also want to raise a second alert when the number of instances of that first alert exceeds a threshold (i.e. more than THRESHOLD services are offline). So far, I haven't been able to find anything on how to use alert statistics from a TICKscript.

TICKscript so far:

kapacitor show cpu_alert
ID: cpu_alert
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 12 Nov 18 14:30 UTC
Modified: 13 Nov 18 10:30 UTC
LastEnabled: 13 Nov 18 10:30 UTC
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
dbrp "telegraf"."autogen"

var ERROR_SERVICES = 'OFFLINE SERVICES High'
var THRESHOLD = 10

var data = stream
    |from()
        .measurement('docker')
    |groupBy('physnum')

var services = data
    |window()
        .period(1m)
        .every(1m)
    |max('n_containers_running')
        .as('running')
    |alert()
        .id(ERROR_SERVICES)
        .crit(lambda: int("running") < 2)
        .log('/tmp/alerts.log')

// missing something like:
// services
//     |count('critical')
//     |alert()
//         .crit(lambda: "count" > THRESHOLD)

DOT:
digraph cpu_alert {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="0"];

from1 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
from1 -> groupby2 [processed="0"];

groupby2 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
groupby2 -> window3 [processed="0"];

window3 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
window3 -> max4 [processed="0"];

max4 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
max4 -> alert5 [processed="0"];

alert5 [alerts_inhibited="0" alerts_triggered="0" avg_exec_time_ns="0s" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="0" ];
}

Any help would be much appreciated.


#3

You can write each alert back to InfluxDB, and then alert separately on top of that measurement.
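A minimal sketch of that approach, building on the task above. The database name `alerts` and the measurement name `service_alerts` are assumptions, not existing names; `.levelField('level')` records each point's alert level as a field so a follow-up task can filter on it:

```
// Task 1: per-service alert, with results written back to InfluxDB
var services = data
    |window()
        .period(1m)
        .every(1m)
    |max('n_containers_running')
        .as('running')
    |alert()
        .id(ERROR_SERVICES)
        .crit(lambda: int("running") < 2)
        .levelField('level')
        .log('/tmp/alerts.log')

services
    |influxDBOut()
        .database('alerts')
        .retentionPolicy('autogen')
        .measurement('service_alerts')
```

A second stream task can then count how many points in that measurement are critical within each window and raise the aggregate alert:

```
// Task 2: alert when more than THRESHOLD services are critical
dbrp "alerts"."autogen"

var THRESHOLD = 10

stream
    |from()
        .measurement('service_alerts')
        // no groupBy here, so all services fall into one group
    |where(lambda: "level" == 'CRITICAL')
    |window()
        .period(1m)
        .every(1m)
    |count('running')
        .as('count')
    |alert()
        .id('OFFLINE SERVICES High')
        .crit(lambda: "count" > THRESHOLD)
        .log('/tmp/alerts.log')
```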