Alert works only for one state transition. OK -> Crit or Crit -> OK

Naumis1 · May 20, 2020, 6:07pm

Hello, I have a very strange problem that I’ve been trying to figure out for two days now. I am using the latest components of the TICK stack. The data collector is Telegraf with the inputs.http_response plugin.
For alerting I am using the result_code field. The problem: after I edit the TICKscript and save it, it works for only one state transition, OK -> CRIT or CRIT -> OK. When I then simulate the OK or CRIT state again, nothing happens: no alert, no errors, nothing. If I then edit the script and click save, the missing alert comes in. I simulate the failure by blocking the destination host with iptables. That way I get 0 for OK and 1, 2, 3, 4… for CRIT.
I even tried adding 1 with eval, because I thought the OK state couldn’t be zero, but that turned out not to be the cause.
Here is my TICKscript:

var whereFilter = lambda: isPresent("result_code")

var crit = 3

var period = 120s

var every = 60s

var id = '{{ index .Tags "host"}}/OServer'

var message = '{{.Level}}: OServer got response {{ index .Tags "result" }} / {{.TaskName}}'

var details = '{{.Level}}!

{{ if eq .Level "CRITICAL" }}Alert: No Response from site!

{{ else }} Alert: Cleared!

{{ end }}
URL: {{ index .Tags "server"}}

Result: {{ index .Tags "result"}}
'

var data = stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('http_response')
        .where(whereFilter)
        .groupBy('server', 'result', 'host')
    |eval(lambda: 1 + "result_code")
        .as('used')
    |window()
        .period(period)
        .every(every)
    |mean('used')
        .as('value')

var alert = data
    |alert()
        .stateChangesOnly()
        .id(id)
        .message(message)
        .details(details)
        .crit(lambda: "value" > crit)

alert
    .email() …
Please help me. Thanks!

Naumis1 · May 21, 2020, 5:48am

Does anyone have any ideas?

Naumis1 · May 21, 2020, 9:52am

Besides modifying the rule, I checked the incoming data. It looks OK: no lost points, nothing. I tried deleting all other rules, and I even tried pointing Kapacitor to another location for /var/lib/kapacitor/kapacitor.db, so that it would create a new database. Nothing helped.
Any thoughts? Anyone?

Mert · May 21, 2020, 11:15am

Sorry but I didn’t understand the problem. Could you explain in other words?

OK -> CRIT and CRIT -> OK work, so what’s not working? Do you want an alert for CRIT -> CRIT? If so, remove the .stateChangesOnly() call from the alert node.

Naumis1 · May 21, 2020, 11:31am

In other words, my alert works for only one state change. For example, I get OK -> CRIT, but when it returns to OK, I receive no alert at all. If I edit and save the alert rule, then I receive the OK state.

Naumis1 · May 21, 2020, 11:42am

I set up my rule and save it. After that I simulate a no-response condition with the firewall.

1. Enabling the block: I get the CRIT state, everything looks OK.
2. Disabling the block: I get the OK state, which again looks OK.
3. Enabling the block again: I get no state change, no alert, no errors. The state remains the same.
4. If I edit and save the rule, then I get the CRIT state. (That should have happened at step 3.)

Mert · May 21, 2020, 1:57pm

Oh, I see now. This is weird. I would try a different TICKscript on a different measurement to find out whether it’s script related.

Naumis1 · May 21, 2020, 2:09pm

This script has mutated a lot over the past two days, but I will try a different measurement. Thanks!

Naumis1 · May 22, 2020, 6:35am

Hi, I think the problem is solved.
The problem is the tag “result”.

I used it because I wanted to display this tag as additional information in the message I receive.

This tag changes with the state of the service. If the check state is OK, the tag “result” is success; if it is CRIT, the tag holds a failure value.
That means one actual check point has two or more different tag values, which change when the check state changes. In other words:

if the check result is OK, I have the tag “result” == success
if the check result is CRIT, I have the tag “result” == timeout
With the group by clause, this means that I have 2 check points:

One with state CRIT and a failure tag value (for example timeout)
One with state OK and the tag value success

But as we know, we are checking ONLY ONE CHECK POINT!

Which means that Kapacitor thinks these are 2 different check endpoints with different tags and states.

Removing this tag from the group by clause and from the message fields solved the problem.
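
For reference, removing the tag from the group by clause and the message fields might look roughly like the sketch below. It assumes the rest of the original script stays unchanged; the reworded message template is only an illustration (not the poster's final text), and the alert handler is left out.

```
// Sketch only: the original pipeline without the "result" tag.
var whereFilter = lambda: isPresent("result_code")

var crit = 3

var period = 120s

var every = 60s

var id = '{{ index .Tags "host" }}/OServer'

// "result" is no longer a group-by tag, so the templates must not reference it.
// This message wording is a placeholder assumption.
var message = '{{ .Level }}: OServer check for {{ index .Tags "server" }} / {{ .TaskName }}'

// Group only by tags that stay constant for one check point,
// so OK and CRIT points land in the same series.
var data = stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('http_response')
        .where(whereFilter)
        .groupBy('server', 'host')
    |eval(lambda: 1 + "result_code")
        .as('used')
    |window()
        .period(period)
        .every(every)
    |mean('used')
        .as('value')

// Attach a handler (the original used .email()) to this alert node as needed.
data
    |alert()
        .stateChangesOnly()
        .id(id)
        .message(message)
        .crit(lambda: "value" > crit)
```

With only 'server' and 'host' in the group by clause, successful and failed checks of the same URL feed a single series, so the alert node sees both the OK and the CRIT values for that check point.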
So I have a working alert rule now, with one drawback: I can’t use that tag in the message field.

Or can I?

system · May 25, 2020, 6:43am

This topic was automatically closed 60 minutes after the last reply. New replies are no longer allowed.
