Alert works for only one state transition: OK -> CRIT or CRIT -> OK

Hello, I have a very strange problem. I've been trying to figure this out for 2 days now. I am using the latest components of the TICK stack. The data collector is Telegraf with the inputs.http_response plugin.
For alerting I use the result_code field. Problem: when I edit the TICKscript and save it, it works for one state transition only, OK -> CRIT or CRIT -> OK. When I simulate the OK or CRIT state again, nothing happens: no alert, no errors, nothing. When I then edit the script and click save, the missing alert comes in. The simulation is done by blocking the destination host via iptables. That way I get 0 for OK and 1, 2, 3, 4… for CRIT.
I even tried adding +1 with eval, because I thought the OK state couldn't be zero, but that's not the cause.
Here is my TICKscript:

var whereFilter = lambda: isPresent("result_code")

var crit = 3

var period = 120s

var every = 60s

var id = '{{ index .Tags "host"}}/OServer'

var message = '{{.Level}}: OServer got response {{ index .Tags "result" }} / {{.TaskName}}'

var details = '{{.Level}}!
{{ if eq .Level "CRITICAL" }}Alert: No Response from site!
{{ else }} Alert: Cleared!
{{ end }}
URL: {{ index .Tags "server"}}
Result: {{ index .Tags "result"}}'

var data = stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('http_response')
        .where(whereFilter)
        .groupBy('server', 'result', 'host')
    |eval(lambda: 1 + "result_code")
        .as('used')
    |window()
        .period(period)
        .every(every)
    |mean('used')
        .as('value')

var alert = data
    |alert()
        .stateChangesOnly()
        .id(id)
        .message(message)
        .details(details)
        .crit(lambda: "value" > crit)

alert
    .email() …
Please help me :slight_smile: Thanks!

anyone have any ideas??

  • What I did (apart from modifying the rule etc.): I checked the incoming data. It looks OK, no lost points, nothing. I tried deleting all other rules. I even tried pointing /var/lib/kapacitor/kapacitor.db to another location, so that Kapacitor creates a new db. Nothing helped.
    Any thoughts? Anyone?

Sorry but I didn’t understand the problem. Could you explain in other words?

OK > CRIT and CRIT > OK work, so what's not working? Do you want an alert for CRIT > CRIT? If so, remove the .stateChangesOnly() function from the alert node.

In other words, my alert works only for one state change. For example, OK -> CRIT works, but when the state returns to OK, I receive no alert at all. If I edit and save that alert rule, then I receive the OK state.

I set up my rule and save it. After that I simulate a no-response condition with the firewall.

  1. Enabling block - I get CRIT state, everything looks OK.
  2. Disabling block - I get OK state, again, looks OK.
  3. Enabling block again - I get no state change, no alert, no errors. The state remains the same.
  4. If I edit and save the rule, then I get CRIT state. (That should have happened at step 3.)

Oh, I see now. This is weird. I would try with different tickscript on a different measure to learn if it’s not script related.

This script has mutated a lot over 2 days :slight_smile: But I will try a different measurement. Thanks!


Hi, I think the problem is solved.
The problem was the tag "result".

I used it because I wanted to display this tag as additional information in the message I receive.

This tag changes with the state of the service. If the check state is OK, the tag "result" is success. If it is CRIT, the tag holds a failure value such as timeout.
That means I have two or more different tag values, which change when the check state changes, for one actual checkpoint. In other words:

if the check result is OK, I have tag "result" == success
if the check result is CRIT, I have tag "result" == timeout
With the group by clause, this means that I have 2 check points:

One with state CRIT and a failure tag value (e.g. timeout)
One with state OK and tag value success

But as we know, we are checking ONLY ONE CHECK POINT!

Which means that Kapacitor treats this as 2 different check endpoints, each with its own tag value and its own alert state.
:boom:

Removing this tag from the group by clause and from the message fields solved the problem. :slight_smile:
So I have a working alert rule now, with one drawback: I can't use that tag in the message field.

Or do I?
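
For reference, here is a minimal sketch of the two parts of the script that change, assuming everything else stays as posted above and the Result line is dropped from the details template. The groupBy no longer includes 'result', so each server/host pair is a single group with a single alert state. As a possible stand-in for the tag, the message prints the computed field through .Fields; this is only a suggestion that was not tested in this thread, and it shows the numeric mean of 1 + result_code rather than the text of the "result" tag.

```
// Assumption: the message wording below is illustrative, not from the original rule.
// "value" is the field produced by mean('used').as('value') further down.
var message = '{{.Level}}: OServer check value {{ index .Fields "value" }} / {{.TaskName}}'

// 'result' is removed from groupBy, so a check that flips between
// success and a failure value stays in one group.
var data = stream
    |from()
        .database('telegraf')
        .retentionPolicy('autogen')
        .measurement('http_response')
        .where(whereFilter)
        .groupBy('server', 'host')
    |eval(lambda: 1 + "result_code")
        .as('used')
    |window()
        .period(period)
        .every(every)
    |mean('used')
        .as('value')
```

With the +1 offset from the eval node, a value of 1 corresponds to result_code 0 (OK), and anything higher means at least one failed check in the window.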
