Hiya,
I have a script that is working (to a degree), but I’m noticing something that seems strange, even though it is probably how it is supposed to work.
My script counts occurrences of the values 1 and 0 (1 is a fail, 0 is a pass). It alerts me correctly once it has counted 3 failures, and it also used to alert me when the status went back to OK.
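To see what the counting step produces on its own, a small debugging task along these lines may help. It is only a sketch: it reuses the database, retention policy, measurement and tags from the script in this post, and `fail_count` and the `FAIL_COUNT` log prefix are hypothetical names. As I read the Kapacitor docs, `stateCount` counts consecutive points for which its lambda is true and resets to -1 as soon as the lambda is false.

```tickscript
// Hypothetical debugging task: writes the running failure count to the
// Kapacitor log so the behaviour of stateCount can be checked by eye.
stream
    |from()
        .database('testing')
        .retentionPolicy('autogen')
        .measurement('optTest')
        .groupBy('Apptimer', 'host')
    // Count consecutive fails ("value" is 1). For the input 1, 1, 1, 0
    // the count should go 1, 2, 3, -1 (reset to -1 on the first pass).
    |stateCount(lambda: "value" > 0)
        .as('fail_count')
    // Log every point, including the new fail_count field.
    |log()
        .prefix('FAIL_COUNT')
```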
I added a deadman switch to the script to alert when there is no throughput at all, and now the deadman alert seems to fire when the OK message should be sent instead.
I think this is because the other value is 0, and the deadman sees that value as ‘0.0 points’ of data.
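For reference, as I understand the Kapacitor docs, the `|deadman(threshold, interval)` helper is equivalent to a stats, derivative and alert chain that counts how many points pass through the stream per interval; it never reads the field values. If that holds, a point whose value is 0 should still count as one point of throughput. A hand-written sketch of that chain, assuming a 15-minute interval and the names from the script in this post (no alert handler is attached here):

```tickscript
var data = stream
    |from()
        .database('testing')
        .retentionPolicy('autogen')
        .measurement('optTest')
        .groupBy('Apptimer', 'host')

data
    // Every 15m, emit how many points the from() node has sent so far.
    // Only the number of points matters; "value" (0 or 1) is never read.
    |stats(15m)
        .align()
    // Turn that running total into points per 15m.
    |derivative('emitted')
        .unit(15m)
        .nonNegative()
        // By default the result goes into a field named 'derivative';
        // store it as 'emitted' so the alert below reads the rate.
        .as('emitted')
    |alert()
        .crit(lambda: "emitted" <= 0.0)
        .stateChangesOnly()
        .message('No data received for {{ .Group }} in the last 15m')
```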
So… my questions are:
- Am I correct in thinking that a value of 0 counts as no throughput?
- How would you go about doing something similar?
The script should alert when 3 fails have occurred, alert again when it goes back to the OK state after the critical alert, and, if none of those happen within 15 minutes, send the deadman alert (hope that makes sense).
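One way to get all three behaviours might be to split the task into two independent branches off a shared stream: one for the fail count (with its recovery OK) and one for the deadman. This is an untested sketch: the alert IDs and the `you@example.com` address are placeholders, and it assumes the email handler is already configured in Kapacitor.

```tickscript
var data = stream
    |from()
        .database('testing')
        .retentionPolicy('autogen')
        .measurement('optTest')
        .groupBy('Apptimer', 'host')

// Branch 1: CRITICAL after 3 consecutive fails. With stateChangesOnly(),
// an OK should be sent once when a pass resets state_count to -1.
data
    |stateCount(lambda: "value" > 0)
    |alert()
        .id('DRS_PORTAL:{{.Group}}')
        .crit(lambda: "state_count" >= 3)
        .stateChangesOnly()
        // Placeholder address (hypothetical)
        .email('you@example.com')

// Branch 2: deadman when no points at all arrive within 15 minutes
// (a threshold of 0.0 means "zero points in the interval").
data
    |deadman(0.0, 15m)
        .id('DRS_PORTAL_DEADMAN:{{.Group}}')
        // Placeholder address (hypothetical)
        .email('you@example.com')
```

Keeping the deadman on its own branch means its alert only ever evaluates point counts, while the fail-count alert only sees the original points plus `state_count`, so neither condition is evaluated against the other branch's data. Separate IDs also make it easier to tell the two alerts apart.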
Has anybody done something like this? How did you achieve it?
Script:
```
var db = 'testing'
var rp = 'autogen'
var measurement = 'optTest'
var groupBy = ['Apptimer', 'host']
var whereFilter = lambda: ("Apptimer" == 'Result')
var name = 'DRS_PORTAL'
var idVar = name + ':{{.Group}}'
var message = ''
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'testing'
var outputRP = 'autogen'
var outputMeasurement = 'optResults'
var triggerType = 'threshold'
var crit = 0

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
    |eval(lambda: "value")
        .as('value')
    // Count consecutive points where "value" > crit (i.e. fails)
    |stateCount(lambda: "value" > crit)
    // Hand-written deadman: every 10s, emit how many points the
    // stateCount node has sent so far
    |stats(10s)
        .align()
    // Turn that running total into points per 10s
    |derivative('emitted')
        .unit(10s)
        .nonNegative()

var trigger = data
    |alert()
        // Intended: WARN on no throughput, CRIT after 3 consecutive fails
        .warn(lambda: "emitted" <= 0.0)
        .crit(lambda: "state_count" >= 3)
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .email('phil.b@********.co.uk')

// Store every alert in InfluxDB
trigger
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

// Expose the latest alert over the task's HTTP endpoint
trigger
    |httpOut('output')
```
I had the same problem whether I used the |deadman node or wrote the logic manually. I’d hoped that writing it manually would let me work around it, but now I’m just a little confused.
If anyone can point me in the right direction, it would be much appreciated!
PhilB