Kapacitor - constant alerting every 20 seconds

kapacitor
#1

Hello Everyone!

I just built a TICK setup and it seems pretty sweet! I was especially interested in the ability to set thresholds on specific metrics.

Via Chronograf, I set up a Kapacitor rule to alert on /var where used_percent > 90% (since I knew I had a machine that was almost full).

As soon as I enabled the script, it kept alerting every 20 seconds (the interval time I have set on telegraf).

Is this normal? Is there a way to only trigger when a threshold is met and then only trigger when the threshold is cleared?

#2

This is the script that Chronograf generated in the web UI:

var db = 'telegraf'
var rp = 'autogen'
var measurement = 'disk'
var groupBy = []
var whereFilter = lambda: ("path" == '/var')
var name = 'Filesystem /var (90%) III'
var idVar = name
var message = '{{ index .Tags "host"}} {{.Name}} {{.ID}} {{.Level}} {{ index .Fields "value" }}'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'
var crit = 90

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "used_percent")
        .as('value')

var trigger = data
    |alert()
        .crit(lambda: "value" > crit)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .stateChangesOnly()

trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')

#3

I also notice that it is inconsistently alerting on the other nodes being “OK”.

#4

I’d add a window node; Kapacitor will then buffer the data for that amount of time before alerting you. You could also use either the |stateCount() or |stateDuration() nodes to restrict the alerts.
The first would only send an alert if the threshold was exceeded X times in a row. The second would alert once the data has been in that state for X time.
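As a rough illustration of the |stateDuration() approach, here is a minimal sketch, not a tested task. The database, measurement, path filter and threshold are taken from the script in #2. The 5 minute limit, the log path, the message text and the groupBy('host') are assumptions added for the example (grouping by host keeps each machine's state separate).

```
// Sketch only: alert once used_percent has stayed above the threshold for 5 minutes.
var crit = 90

stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('disk')
        // Assumption: track each host as its own group.
        .groupBy('host')
        .where(lambda: "path" == '/var')
    // Adds a "state_duration" field: how long the expression has been true,
    // measured in the unit below. It is -1 while the expression is false.
    |stateDuration(lambda: "used_percent" > crit)
        .unit(1m)
    |alert()
        .id('Filesystem /var (90%):{{.Group}}')
        .crit(lambda: "state_duration" >= 5)
        .message('{{ index .Tags "host" }} /var is at {{ index .Fields "used_percent" }} percent')
        .stateChangesOnly()
        // Hypothetical log path, used here instead of a real alert handler.
        .log('/tmp/var_alerts.log')
```

The |stateCount() version looks the same: swap in |stateCount(lambda: "used_percent" > crit), drop .unit(), and test the "state_count" field instead, for example "state_count" >= 3.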

The window node is simple enough. Define two variables:

var period = 10m
var every = 5m

then, just before the first eval node (the one that reads "used_percent"), add:

    |window()
        .period(period)
        .every(every)
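For context, this is roughly where the window ends up in the generated script. It is a trimmed sketch rather than a tested task: the names and threshold come from the script in #2, the log path is a placeholder, and .all() is an assumption added here so that the alert only fires when every point in the 10 minute batch is over the threshold.

```
var period = 10m
var every = 5m
var crit = 90

stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('disk')
        .where(lambda: "path" == '/var')
    // Buffer 10 minutes of points and emit them as one batch every 5 minutes.
    |window()
        .period(period)
        .every(every)
    |eval(lambda: "used_percent")
        .as('value')
    |alert()
        .crit(lambda: "value" > crit)
        // Assumption: require all points in the batch to match, not just one.
        .all()
        .stateChangesOnly()
        // Hypothetical log path, used here instead of a real alert handler.
        .log('/tmp/var_window_alerts.log')
```

With these values the alert is evaluated once every 5 minutes instead of on every 20 second Telegraf point.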

Relevant docs: State Count, State Duration, Window Node

#5

Awesome, thanks philb! I’ll play with this today and see how it goes.

I think your suggestion should be the default when making Kapacitor thresholds in Chronograf.
The initial TICKscript that Chronograf currently creates is unusable and might turn off people who are not willing to jump on the forums.

Thanks again for your suggestion!

#6

You’re welcome @aspitzer, I just hope it helps!

I understand what you’re saying about Chronograf. The scripts are limited, but they are a good base when writing your own scripts. The Influx team are always improving it though.
Personally, I’ll generate the script with Chronograf, go to the built-in editor, copy and paste it into an IDE, and then define it with kapacitor.

I’d suggest deleting the generated script after you’ve copied it too, because the generated ones are given a GUID-style name. If you upload and define the script yourself, you can name it something more memorable.