Deadman switch values

Hiya,

I have a script that is working (to a degree), but I’m noticing something that seems strange, although it is probably how it should be working.

My script counts occurrences of the values 1 and 0 (1 being a fail, 0 being a pass). It alerts me fine once it has counted 3 failures, and it also used to alert me when it went back to the OK level.

I added a deadman switch to the script to alert when there is absolutely no throughput and now it seems that the deadman alert is firing when it should be the OK message.

I’m thinking the reason is that the other value is 0, and the deadman sees this value as ‘0.0 points’ of data.

So…my questions are as follows.

  1. Am I correct in thinking that the value 0 is classed as no throughput?
  2. How would you go about doing something similar?

The script really should alert when 3 fails have occurred, alert again when it goes back to the OK state after the critical alert, and send the deadman alert if neither of those happens within 15 minutes (hope that makes sense).

Has anybody done something like this? How did you achieve it?

Script:

var db = 'testing'

var rp = 'autogen'

var measurement = 'optTest'

var groupBy = ['Apptimer', 'host']

var whereFilter = lambda: ("Apptimer" == 'Result')

var name = 'DRS_PORTAL'

var idVar = name + ':{{.Group}}'

var message = ''

var idTag = 'alertID'

var levelTag = 'level'

var messageField = 'message'

var durationField = 'duration'

var outputDB = 'testing'

var outputRP = 'autogen'

var outputMeasurement = 'optResults'

var triggerType = 'threshold'

var crit = 0

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
    |eval(lambda: "value")
        .as('value')

    |stateCount(lambda: "value" > crit)

    |stats(10s)
        .align()
    |derivative('emitted')
        .unit(10s)
        .nonNegative()

var trigger = data
    |alert()
        .warn(lambda: "emitted" <= 0.0)
        .crit(lambda: "state_count" >= 3)
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .email('phil.b@********.co.uk')

trigger
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')

I had the same problem whether I used the |deadman node or wrote the code manually. I’d hoped that writing it manually would let me work around it, but now I’m just a little confused.

If anyone can point me in the right direction, it would be much appreciated!

PhilB

No, the value of the points has no bearing on the “emitted” field from the stats node. In fact, the only field on the data after a stats node is “emitted”, because the stats node tracks statistics about the flow of data and does not pass on the actual data. In your script, a single alert node tries to use both the “emitted” field and the “state_count” field. This will not work, because “state_count” is no longer present once the data has gone through the stats node.

The fix is to use two alert nodes: one that triggers off “state_count” and another that triggers off the “emitted” value from the stats node.
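As a rough sketch of that split (not a tested task), the script could branch from the same `data` variable, so that only the deadman branch passes through the stats node. It reuses `db`, `rp`, `measurement`, `groupBy`, `crit`, `message` and `idVar` from the script in the question. The 15m window comes from the "within 15 minutes" requirement, the `name + ':deadman'` ID is a made-up example, and the email handler and output nodes are left out for brevity.

```
// Shared source: both alerts branch from this variable.
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
    |eval(lambda: "value")
        .as('value')

// Branch 1: failure alert. "state_count" is still present here
// because this branch never goes through a stats node.
var failures = data
    |stateCount(lambda: "value" > crit)
    |alert()
        .crit(lambda: "state_count" >= 3)
        .stateChangesOnly()
        .message(message)
        .id(idVar)

// Branch 2: deadman alert. After stats, "emitted" is the only field,
// so this alert only ever looks at throughput, never at "value".
var dead = data
    |stats(15m)
        .align()
    |derivative('emitted')
        .unit(15m)
        .nonNegative()
    |alert()
        // Same level as in the original script; switch to .crit() if preferred.
        .warn(lambda: "emitted" <= 0.0)
        .stateChangesOnly()
        .message(message)
        // Hypothetical ID, chosen here only to tell the two alerts apart.
        .id(name + ':deadman')
```

Each alert node keeps its own state, so the OK message for the failure alert no longer depends on the throughput check. Branch 2 is essentially what the built-in node does, so `data|deadman(0.0, 15m)` should be a shorter way to get a similar result, although it raises a critical alert rather than a warning.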

Hi Nathaniel,

Thank you for your reply. I had set this up as two scripts initially but now it needs to be one. I’ll do as you suggest and split them into two alert nodes.

Thanks for your help,

PhilB