Alert Count not Updating

chronograf
kapacitor
#1

I have two test Kapacitor alerts running, both are sending traps. They were both configured from the command line. Chronograf is not showing the total number of alerts being sent in the UI. Is there a setting in the Kapacitor alert configuration to tell it to increment the alert counter?

#2

You might want to file an issue on this in GitHub. I’ve seen this as well when I enable a Kapacitor TICK script from outside of Chronograf. Only the Chronograf-built TICK scripts seem to get counted.

dg

#3

Done…

#4

Can you give me the output of kapacitor show <task name> for one of the tasks you created via the kapacitor CLI and one that you created using Chronograf?

#5

This one was created within Chronograf:

ID: chronograf-v1-b8c2ce5b-9999-4561-90a6-c85b5a0515c7
Error: 
Template: 
Type: stream
Status: disabled
Executing: false
Created: 29 Aug 17 21:31 UTC
Modified: 30 Aug 17 20:08 UTC
LastEnabled: 29 Aug 17 21:48 UTC
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
var db = 'telegraf'

var rp = 'autogen'

var measurement = 'system'

var groupBy = []

var whereFilter = lambda: TRUE

var name = 'Load Alert'

var idVar = name + ':{{.Group}}'

var message = 'Load Alert is {{.Level}} on {{index .Tags "host"}}'

var idTag = 'alertID'

var levelTag = 'level'

var messageField = 'message'

var durationField = 'duration'

var outputDB = 'chronograf'

var outputRP = 'autogen'

var outputMeasurement = 'alerts'

var triggerType = 'threshold'

var crit = 1

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "load1")
        .as('value')

var trigger = data
    |alert()
        .crit(lambda: "value" > crit)
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .email('dishmael@trace3.com')

trigger
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')

DOT:
digraph chronograf-v1-b8c2ce5b-9999-4561-90a6-c85b5a0515c7 {
stream0 -> from1;
from1 -> eval2;
eval2 -> alert3;
alert3 -> influxdb_out4;
alert3 -> http_out5;
}

And this one was created at the command line:

ID: ping_avg_resp_alert
Error: 
Template: 
Type: stream
Status: enabled
Executing: true
Created: 05 Sep 17 22:13 UTC
Modified: 27 Sep 17 19:07 UTC
LastEnabled: 27 Sep 17 19:07 UTC
Databases Retention Policies: ["telegraf"."autogen"]
TICKscript:
stream
    // Select just the cpu measurement from our example database.
    |from()
        .measurement('ping')
    // Window
    // |window()
    //    .period(5s)
    //    .every(5s)
    // Send alerts
    |alert()
        // Alert criteria
        .crit(lambda: "average_response_ms" > 1.0)
        // Whenever we get an alert write it to a file.
        .log('/tmp/ping_alerts.log')
        // Send SNMP Trap
        .snmpTrap('1.3.6.1.4.1.1')
        .data('1.3.6.1.4.1.1.1', 's', '{{ .Level }}')
        .data('1.3.6.1.4.1.1.2', 's', '{{ index .Tags "url" }}')
        .data('1.3.6.1.4.1.1.3', 's', '{{ index .Fields "average_response_ms" }}')
        .data('1.3.6.1.4.1.1.4', 's', '1.0')

// .data('1.3.6.1.4.1.1.3', 's', '{{ index .Fields "maximum_response_ms" }}')
// .data('1.3.6.1.4.1.1.4', 's', '{{ index .Fields "minimum_response_ms" }}')
// .data('1.3.6.1.4.1.1.5', 's', '{{ index .Fields "packets_received" }}')
// .data('1.3.6.1.4.1.1.6', 's', '{{ index .Fields "packets_transmitted" }}')
// .data('1.3.6.1.4.1.1.7', 's', '{{ index .Fields "percent_packet_loss" }}')
// .data('1.3.6.1.4.1.1.8', 's', '{{ index .Fields "standard_deviation_ms" }}')


DOT:
digraph ping_avg_resp_alert {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="119191"];

from1 [avg_exec_time_ns="593ns" errors="0" working_cardinality="0" ];
from1 -> alert2 [processed="119191"];

alert2 [alerts_triggered="119191" avg_exec_time_ns="5.054043ms" crits_triggered="119188" errors="6" infos_triggered="0" oks_triggered="3" warns_triggered="0" working_cardinality="1" ];
}
#6

Try doing something like this:

stream
    // Select just the cpu measurement from our example database.
    |from()
        .measurement('ping')
    // Window
    // |window()
    //    .period(5s)
    //    .every(5s)
    // Send alerts
    |alert()
        // Alert criteria
        .crit(lambda: "average_response_ms" > 1.0)
        // Whenever we get an alert write it to a file.
        .log('/tmp/ping_alerts.log')
        // Send SNMP Trap
        .snmpTrap('1.3.6.1.4.1.1')
        .data('1.3.6.1.4.1.1.1', 's', '{{ .Level }}')
        .data('1.3.6.1.4.1.1.2', 's', '{{ index .Tags "url" }}')
        .data('1.3.6.1.4.1.1.3', 's', '{{ index .Fields "average_response_ms" }}')
        .data('1.3.6.1.4.1.1.4', 's', '1.0')
    |influxDBOut()
        .create()
        .database('chronograf')
        .retentionPolicy('autogen')
        .measurement('alerts')
        .tag('alertName', 'my_cli_defined_alert')
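
If you’re updating the existing task rather than creating a new one, you’ll need to redefine it from the CLI for the change to take effect. A sketch, assuming the updated script is saved as ping_avg_resp_alert.tick (the filename is an assumption):

```
# Redefine the task with the updated TICKscript (filename is hypothetical)
kapacitor define ping_avg_resp_alert -tick ping_avg_resp_alert.tick

# Re-enable the task if it was stopped; redefining an already-enabled
# task reloads it with the new script
kapacitor enable ping_avg_resp_alert
```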

By default, Kapacitor doesn’t write its alerts anywhere. Chronograf gets access to those alerts only when they are written back into InfluxDB using the influxDBOut node, which is what the Chronograf-generated script does here:


trigger
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)
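
Once alerts are flowing, you can confirm they’re being written by querying the target measurement directly. A sketch, assuming the chronograf database, autogen retention policy, and alerts measurement used above (the field name is an assumption based on your stream):

```
-- Count alert events written back to InfluxDB (InfluxQL)
SELECT count("average_response_ms")
FROM "chronograf"."autogen"."alerts"
WHERE "alertName" = 'my_cli_defined_alert'
```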

Let me know if that fixes the issue.

#7

That worked perfectly, thanks!
