How to debug Kapacitor Alert?

#1

I created an alert through the web interface (Chronograf). The graph shows the value crossing the threshold, but the alert does not fire. How can I debug this?

Kapacitor 1.4.0 (git: HEAD fcce3ee9e6abcee5595fd61066bfc904edb1e113)

/ # kapacitor list tasks
ID                                                 Type   Status  Executing Databases and Retention Policies
chronograf-v1-708c588f-bf3b-4768-8577-d89ca31d3c1b stream enabled true      ["telegraf"."autogen"]
chronograf-v1-ad53ea27-4a65-4405-b37b-75e587f1ede0 stream enabled true      ["telegraf"."autogen"]
/ # kapacitor show chronograf-v1-708c588f-bf3b-4768-8577-d89ca31d3c1b
ID: chronograf-v1-708c588f-bf3b-4768-8577-d89ca31d3c1b
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 22 Feb 18 14:04 UTC
Modified: 26 Feb 18 10:40 UTC
LastEnabled: 26 Feb 18 10:40 UTC
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
var db = 'telegraf'
var rp = 'autogen'
var measurement = 'mem'
var groupBy = []
var whereFilter = lambda: ("host" == 'AutomatedTests')
var name = 'test'
var idVar = name + ':{{.Group}}'
var message = ' {{.ID}} {{.Name}} {{.TaskName}} {{.Group}} {{.Tags}} {{.Level}} {{ index .Fields "value" }} {{.Time}}'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'
var crit = 15000000000

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "used")
        .as('value')

var trigger = data
    |alert()
        .crit(lambda: "value" > crit)
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .telegram()
        .chatId('216013926')
        .parseMode('mem > 15')

trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')

DOT:
digraph chronograf-v1-708c588f-bf3b-4768-8577-d89ca31d3c1b {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="0"];

from1 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
from1 -> eval2 [processed="0"];

eval2 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
eval2 -> alert3 [processed="0"];

alert3 [alerts_triggered="0" avg_exec_time_ns="0s" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="0" ];
alert3 -> http_out6 [processed="0"];
alert3 -> eval4 [processed="0"];

http_out6 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];

eval4 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
eval4 -> influxdb_out5 [processed="0"];

influxdb_out5 [avg_exec_time_ns="0s" errors="0" points_written="0" working_cardinality="0" write_errors="0" ];
}
/ #


#2

Did you check the telegram config?


#3

Config is checked. Is it possible to force an alert?
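
One way to force an alert, sketched here under assumptions rather than as a confirmed fix: Kapacitor accepts line-protocol writes on its own HTTP API at /kapacitor/v1/write, so a single point above the threshold can be sent straight to the task without going through InfluxDB. The base URL, the helper names, and the test value are illustrative; the helpers do no line-protocol escaping, so they only suit simple tag values like the one in this task.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_point(measurement, tags, field, value):
    """Return one line-protocol point with a single integer field.

    No timestamp is appended, so the server assigns the current time.
    No escaping is done: tag keys and values must not contain spaces or commas.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement}{tag_part} {field}={int(value)}i"


def build_write_url(base_url, db, rp):
    """Return the Kapacitor write URL for a database and retention policy."""
    return f"{base_url}/kapacitor/v1/write?" + urlencode({"db": db, "rp": rp})


def send_point(base_url, db, rp, line):
    """POST one point to Kapacitor and return the HTTP status code.

    base_url is an assumption, e.g. "http://localhost:9092" for a local
    Kapacitor listening on its default port.
    """
    req = Request(build_write_url(base_url, db, rp),
                  data=line.encode("utf-8"), method="POST")
    with urlopen(req, timeout=5) as resp:
        return resp.status
```

For the task above, build_point("mem", {"host": "AutomatedTests"}, "used", 16_000_000_000) yields a point above the crit value of 15000000000. Because the task uses .stateChangesOnly(), only the first point that changes the level should produce a notification.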


#4

It seems that the task is evaluating no points: every edge in the DOT output shows zero processed points. Shouldn't data be passing through? Can you force the value to go over the crit threshold?
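
To make the "no points" observation easier to check, here is a minimal sketch that reads the DOT section of `kapacitor show <task>` and reports where the flow stops. The function names are hypothetical, and the regular expression assumes the edge format shown in the first post; it also tolerates curly quotes in case the output was pasted through a forum.

```python
import re

# Matches edges such as: stream0 -> from1 [processed="0"];
# Assumption: the edge format is the one shown in the pasted output.
EDGE_RE = re.compile(r'(\w+)\s*->\s*(\w+)\s*\[processed=["“](\d+)["”]\]')


def edge_counts(dot_text):
    """Map each (source, destination) edge to its processed-point count."""
    return {(src, dst): int(n) for src, dst, n in EDGE_RE.findall(dot_text)}


def first_starved_edge(dot_text):
    """Return the first edge, in output order, that processed zero points."""
    for src, dst, n in EDGE_RE.findall(dot_text):
        if int(n) == 0:
            return (src, dst)
    return None
```

If the first starved edge is stream0 -> from1, as in the output above, no data is reaching the task at all, which points at the link between InfluxDB and Kapacitor. If stream0 -> from1 has points but from1 -> eval2 does not, the from() filter (database, retention policy, measurement, or where clause) is likely dropping everything.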


#5

Try changing the groupBy variable to group by host:

var groupBy = ['host']

Heads up if you are just starting out with the TICK stack: to use any of your tags, you need to group by them as well. For example, to include both the host and information about the instance:

var groupBy = ['host', 'instance']

If you're on Linux, you can use

sudo tail -f /path/to/kapacitor/logs | grep "Name of your alert"

You should be able to see an error in there. Does the host 'AutomatedTests' exist as a tag value in the database?
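
To check whether the host tag value exists, one option is to ask InfluxDB 1.x for the tag values of the measurement through its /query endpoint. The sketch below only builds the request URL and parses a response body; the helper names and the base URL are assumptions, and fetching the URL is left to whatever HTTP client you prefer.

```python
import json
from urllib.parse import urlencode


def tag_values_url(base_url, db, measurement, tag_key):
    """Build a /query URL that lists the values of one tag key.

    base_url is an assumption, e.g. "http://localhost:8086".
    """
    q = f'SHOW TAG VALUES FROM "{measurement}" WITH KEY = "{tag_key}"'
    return f"{base_url}/query?" + urlencode({"db": db, "q": q})


def tag_values(response_body):
    """Extract the tag values from an InfluxDB 1.x /query JSON response."""
    doc = json.loads(response_body)
    values = []
    for result in doc.get("results", []):
        # "series" is absent when the query matches nothing.
        for series in result.get("series", []):
            values.extend(row[1] for row in series.get("values", []))
    return values
```

With the response body in hand, 'AutomatedTests' in tag_values(body) answers the question directly. Tag values are case-sensitive, so a host stored as 'automatedtests' would not match the where filter.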


#6

Hi! I have a similar problem: I created an alert through the web interface and it doesn't work.

var db = 'monitoring'
var rp = 'autogen'
var measurement = 'system'
var groupBy = ['host']
var whereFilter = lambda: ("prod" == 'true')
var period = 10s
var every = 30s
var name = 'System high load'
var idVar = name + '-{{.Group}}'
var message = 'System high load on  {{ index .Tags "host" }}'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'
var crit = 4

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(period)
        .every(every)
        .align()
    |mean('load5')
        .as('value')

var trigger = data
    |alert()
        .crit(lambda: "value" >= crit)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .email()
        .telegram()
        .chatId('-1001217384444')
        .parseMode('Markdown')

trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')

I can't see any corresponding records in Kapacitor's logs, and the alert isn't sent to either Telegram or email.


#7

Hi @falanger,

Is your problem solved?
Best regards


#9

Hi! Yes, it is! The solution was very unexpected. For some reason, InfluxDB wasn't able to make requests to Kapacitor and, consequently, alerts weren't working. So if you have the same problem, check connectivity between InfluxDB and Kapacitor (in both directions).
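
For anyone wanting to test that connectivity, here is a minimal sketch of a plain TCP reachability check. Stream tasks depend on InfluxDB pushing data to Kapacitor through a subscription, so the direction from the InfluxDB host to Kapacitor matters as much as the reverse. The function name is hypothetical, and the ports mentioned below are the defaults (8086 for InfluxDB, 9092 for Kapacitor), which your setup may override.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run can_connect(kapacitor_host, 9092) on the InfluxDB machine and can_connect(influxdb_host, 8086) on the Kapacitor machine, where both host names are placeholders for your own. The address InfluxDB actually uses to reach Kapacitor is the subscription destination, which SHOW SUBSCRIPTIONS in InfluxDB lists, so that is the host name worth testing from the InfluxDB side.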
