Hey all, I’m writing a TICKscript that acts on a series of points that can have exactly two outcomes: either the result is pass, or it is “not pass” (usually some variant of exit NUM).
The script I have looks sort of like this:
// RP: autogen
// Monitor the result of updates
// WARNING if the result is anything other than pass
batch
    |query('''SELECT * FROM "mydb"."autogen"."measurement"''')
        .period(25h)
        .every(24h)
        .groupBy('host')
    |alert()
        .id('kapacitor/{{ .TaskName }}/{{ .Group }}')
        .infoReset(lambda: TRUE)
        .warn(lambda: "result" != 'pass')
        .message(
            '{{ index .Tags "host" }}' +
            '{{ if eq .Level "OK" }} are updating again.' +
            '{{ else }}' +
            ' are failing to update.' +
            '{{ end }}'
        )
        .idField('id')
        .levelField('level')
        .messageField('description')
        .stateChangesOnly()
    @alertFilterAdapter()
    @alertFilter()
The script mostly works, but it has one critical issue: it never sets the level back to OK.
If I feed InfluxDB these 4 points:
time                host     name                   result
----                ----     ----                   ------
1544079584447374994 fakeS176 /usr/bin/yum update -y pass
1544079584447374994 fakeS177 /usr/bin/yum update -y exit 1
1544129084447375177 fakeS176 /usr/bin/yum update -y exit 1
1544129084447375177 fakeS177 /usr/bin/yum update -y pass
I would expect 1 warning and 1 OK, given that all of the timestamps listed above are within the 25-hour period.
However, what actually happens is that I get 2 warns and no OKs.
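To make the mismatch concrete, here is a small Python sketch (not Kapacitor code) of the two evaluation rules I am comparing. It assumes, without my having confirmed it, that a batch alert takes the highest level among all points in each host's 25h window, whereas my expectation was based on only the latest point per host. The function names are hypothetical and exist only for this illustration.

```python
# The four points from the table above: (timestamp, host, result).
points = [
    (1544079584447374994, "fakeS176", "pass"),
    (1544079584447374994, "fakeS177", "exit 1"),
    (1544129084447375177, "fakeS176", "exit 1"),
    (1544129084447375177, "fakeS177", "pass"),
]


def group_by_host(points):
    """Group results by host, ordered by timestamp (mimics .groupBy('host'))."""
    groups = {}
    for _ts, host, result in sorted(points):
        groups.setdefault(host, []).append(result)
    return groups


def level_any_point(results):
    """ASSUMED batch behaviour: WARNING if any point in the window is not 'pass'."""
    return "WARNING" if any(r != "pass" for r in results) else "OK"


def level_last_point(results):
    """What I expected: only the most recent point decides the level."""
    return "WARNING" if results[-1] != "pass" else "OK"


groups = group_by_host(points)
any_levels = {host: level_any_point(r) for host, r in groups.items()}
last_levels = {host: level_last_point(r) for host, r in groups.items()}

print("any point :", any_levels)
print("last point:", last_levels)
```

Under the "any point" rule both hosts end up at WARNING, which matches what I see; under the "last point" rule fakeS176 is WARNING and fakeS177 is OK, which is what I expected.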
Could someone give some advice on how to move forward?