Cannot get time-based alerts with a 0 value in TICKscript

Kapacitor v. 1.5

There's been a long day and a lot of code behind getting to this point. I am trying to get alerts for a large series of metrics and have created a test TICKscript to work out how this can be accomplished.
Goal: if a metric has a value of 0 for 5 minutes, send an alert for that metric. Edit: also, if no points are sent for those 5 minutes, send the same deadman alert. The TICKscript below does this without the time condition; it alerts immediately as soon as a value of 0 occurs.

Note: I have tried the deadman(threshold, interval) method (a rough sketch of that attempt is below). It fails every time because, for some reason, it doesn't report on a 0 value; it only reports when the metric stops updating in InfluxDB.
I have also tried building my own deadman with several combinations of stats() and derivative(), with and without each other.
I have tried the Chronograf-generated deadman and threshold alerts. Threshold doesn't provide the time condition, and deadman doesn't actually work for an attrValue of 0, which is submitted as a double field; I have even tried casting it as such going into the deadman node to be safe.
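
For reference, a rough sketch of the built-in deadman attempt (not the exact script; it reuses the same variables as the full script further below). My understanding is that deadman(threshold, interval) alerts only when the number of points per interval drops to the threshold, so a point that carries a 0 value still counts as traffic and no alert fires:

// Rough sketch of the built-in deadman form (hypothetical, not the exact script).
// deadman(0.0, 5m) alerts only when no points at all arrive in a 5m interval,
// so a point whose field value is 0 still counts and no alert fires.
stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |deadman(0.0, 5m)
        .message(message)
        .slack()
        .channel('#channel')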

My question: what needs to be added to the script below so that it detects the value being 0 for 5 minutes and only then triggers the alert?

var db = 'streamDB'
var rp = 'retention'
var measurement = 'measurement'
var groupBy = ['host', 'topic']
var whereFilter = lambda: ("topic" == 'test_test')
var name = 'stest'
var idVar = name + '-{{.Group}}'
var message = '{{.ID}} {{.Level}} @ {{.Time}} for {{.Tags}} {{.Fields}}'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'
var crit = 0

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "attrValue")
        .as('value')

var trigger = data
    |alert()
        .crit(lambda: "value" <= crit)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .stateChangesOnly()
        .slack()
        .channel('#channel')

trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')


Showing the task:

DOT:
digraph chronograf-v1-0adab7bf-a05f-483b-8936-9115c5e4ca61 {
    stream0 -> from1;
    from1 -> eval2;
    eval2 -> http_out3;
    http_out3 -> derivative4;
    derivative4 -> http_out5;
    http_out5 -> alert6;
    alert6 -> eval7;
    alert6 -> http_out9;
    eval7 -> influxdb_out8;
}

Hi @nated099

Have you tried using the stateDuration() node?

Below is a slightly modified version of the script on that web page, but essentially it says: count the duration that the value has been less than 1. If it has been less than 1 for 1 minute, send a WARNING alert; if it has been less than 1 for 5 minutes, send a CRITICAL alert.

If you just want to alert after 5 minutes you can remove the line for the warning in the alert node.

stream
  |from()
    .measurement('measurement')
  |where(lambda: "tag" == 'tag-value')
  |groupBy('host')
  |stateDuration(lambda: "your_field" < 1)
    .unit(1m)
  |alert()
    // Warn after 1 minute
    .warn(lambda: "state_duration" >= 1)
    // Critical after 5 minutes
    .crit(lambda: "state_duration" >= 5)

I should probably modify my goal to be more specific. Thank you very much for your response @philb, as this does work for a 0 value! However, it seems that stateDuration doesn't account for points not being published at all; quoting the InfluxData docs: "If no data is sent, the StateDurationNode cannot evaluate the state and cannot calculate a duration." Unfortunately I need an alert to be sent after 5 minutes of either a 0 value or no points being sent. Is there a way around this with your method?

Perhaps further explanation is needed to resolve this. Below is another example I have tried in order to get a deadman alert. Unfortunately it does not receive the sum and simply triggers every frequency (1m) without including fields in the message. If I remove the eval() node it works, but only based on count("attr_value"), not the sum. As a result, the deadman only triggers when no points are being published to InfluxDB, which achieves half of the goal.

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .where(lambda: "tag" == 'tag-value')
        .groupBy('host', 'topic')
    |eval(lambda: sum("attr_value"))
        .as('value')
        
var trigger = data
     |deadman(0.0, 1m) 
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .stateChangesOnly()
        .slack()
        .channel('#alertchannel')
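
As a side note, my understanding is that sum() in TICKscript is an aggregation node applied over a window of points, not something eval() can compute per point, which may be why the eval() version above never receives a sum. A rough sketch of what I think the intended aggregation would look like, using the same placeholder names as above:

// Hypothetical sketch: aggregate with window() + sum() instead of eval().
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .where(lambda: "tag" == 'tag-value')
        .groupBy('host', 'topic')
    // collect points so an aggregate can be computed over them
    |window()
        .period(5m)
        .every(1m)
    // sum() is an aggregation node, not a lambda function
    |sum('attr_value')
        .as('value')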

Hi @nated099

I think this should be possible. However, I'm travelling at the minute so I have no access to my laptop. I'll be back in the morning, which should help.

@philb no worries. I combined your solution with a deadman to provide two separate categories of alert within one TICKscript. This works very well thus far, thank you.
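
In case it helps anyone else, the combined script looks roughly like this (a simplified sketch using the placeholder names from the examples above, not the exact production script): one branch uses stateDuration() for the sustained 0 value, the other uses deadman() for missing points.

var data = stream
    |from()
        .measurement('measurement')
        .where(lambda: "tag" == 'tag-value')
        .groupBy('host', 'topic')

// Category 1: the value has been 0 (below 1) for 5 minutes
data
    |stateDuration(lambda: "attr_value" < 1)
        .unit(1m)
    |alert()
        .crit(lambda: "state_duration" >= 5)
        .stateChangesOnly()
        .slack()
        .channel('#alertchannel')

// Category 2: no points at all have arrived for 5 minutes
data
    |deadman(0.0, 5m)
        .stateChangesOnly()
        .slack()
        .channel('#alertchannel')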

That is what I was thinking. It's possible to have more than one alert node in a script, so you could include both in the one script. I wasn't sure if you would need to write the deadman part yourself instead of using the built-in function. I have a similar script which monitors disk space but also has a deadman alert in there.

Sounds like you’re on the right track though