Hello, I have a very strange problem that I have been trying to figure out for two days now. I am using the latest components of the TICK stack. The data collector is Telegraf with the inputs.http_response plugin.
For alerting I use the result_code field. The problem: when I edit the TICKscript and save it, it works for one state transition only, either OK -> CRIT or CRIT -> OK. When I then simulate an OK or CRIT state again, nothing happens: no alert, no errors. If I edit the script and click save, the missing alert comes in. I simulate the failure by blocking the destination host with iptables. That way I get 0 for OK and 1, 2, 3, 4… for CRIT.
I even tried adding 1 to the value with eval, because I thought the OK state might not be allowed to be zero, but that turned out not to be the cause.
Here is my TICKscript:
var whereFilter = lambda: isPresent("result_code")
var crit = 3
var period = 120s
var every = 60s
var id = '{{ index .Tags "host"}}/OServer'
var message = '{{.Level}}: OServer got response {{ index .Tags "result" }} / {{.TaskName}}'
var details = '{{.Level}}!
{{ if eq .Level "CRITICAL" }}Alert: No Response from site!
Apart from modifying the rule, here is what I tried. I checked the incoming data: it looks fine, with no lost points. I deleted all other rules. I even pointed Kapacitor at a location other than /var/lib/kapacitor/kapacitor.db so that it would create a new database. Nothing helped.
Any thoughts? Anyone?
In other words, my alert works for only one state change. For example, I get OK -> CRIT, but when the check returns to OK I receive no alert at all. If I edit and save the alert rule, I then receive the OK state.
Hi, I think the problem is solved.
The problem was the "result" tag.
I used it because I wanted to show this tag as additional information in the message I receive.
The value of this tag changes with the state of the service. If the check is OK, "result" is success. If it is CRIT, "result" is a failure value (timeout in my case).
That means a single checked endpoint produces two or more different tag values, which change whenever the check state changes. In other words:
if the check result is OK, I have the tag "result" == success
if the check result is CRIT, I have the tag "result" == timeout
With this tag in the group by clause, I end up with two check points:
One with state CRIT and tag value timeout
One with state OK and tag value success
But in reality we are checking ONLY ONE endpoint!
So Kapacitor treats this as two different check endpoints, each with its own tags and its own state, and neither of them ever changes state.
Removing this tag from the group by clause and from the message fields solved the problem.
So I now have a working alert rule, with one drawback: I can't use that tag in the message field.
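To illustrate the grouping effect described above, here is a minimal Python sketch. It is not Kapacitor code, only a toy model of "alert on state change, tracked per group". The function name state_changes, the host name web1, and the assumption that every new group starts at OK are all made up for the illustration.

```python
def state_changes(points, group_tags):
    """Return (group, level) events, emitted only when a group's level changes.

    Toy model only: each distinct combination of group tag values keeps
    its own last-known level, similar in spirit to per-group alert state.
    """
    last_level = {}
    events = []
    for tags, result_code in points:
        group = tuple(tags[t] for t in group_tags)
        level = "OK" if result_code == 0 else "CRITICAL"
        # Assumption for this sketch: a group seen for the first time starts at OK.
        previous = last_level.get(group, "OK")
        if level != previous:
            events.append((group, level))
        last_level[group] = level
    return events


# One real endpoint whose "result" tag flips together with result_code.
points = [
    ({"host": "web1", "result": "success"}, 0),
    ({"host": "web1", "result": "timeout"}, 1),
    ({"host": "web1", "result": "timeout"}, 2),
    ({"host": "web1", "result": "success"}, 0),
    ({"host": "web1", "result": "timeout"}, 3),
    ({"host": "web1", "result": "success"}, 0),
]

# Grouping by the flipping tag: the "success" group is always OK and the
# "timeout" group is always CRITICAL, so only the first CRITICAL is reported.
by_host_and_result = state_changes(points, ["host", "result"])

# Grouping by host only: one group, so every transition is reported.
by_host = state_changes(points, ["host"])

print(by_host_and_result)
print(by_host)
```

In this model, grouping by host and result yields a single CRITICAL event and no recovery, while grouping by host alone yields every CRITICAL and OK transition, which matches the behaviour I saw before and after removing the tag from the group by clause.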