Kapacitor - Calculate error rate within same series

kapacitor

#1

Hello,

I have an InfluxDB measurement that distinguishes success from failure using a tag (status=0 for success, non-zero status for failure). I am trying to write a TICKscript that generates an alert when the error percentage exceeds a certain value.

To find the error percentage (100 * error_count / total), I was able to calculate the total record count and the error count individually for a 2-minute window, as below. How do I create an alert that combines these two values? Or is there a better way to do this?

var data = stream
  |from()
    .measurement('MTP')
  |window()
    .period(2m)
    .every(1m)

var total =  data
  |count('status')

var error = data
  |where(lambda: "status" != '0')
  |count('status')
    
var alert = ??????
  |alert()
    .id('{{ .TaskName }}')
    .crit(lambda: 100 * "error"/ "total" > 0 )
    .message('Error value: {{index .Fields "error"}}')
    .log('/tmp/total.log')

Appreciate your help!


#2

@zamrbo That looks like the right way to do it. Is the above alert working as expected? If not, I would suggest looking into the join() node to join the total and error streams before doing the alert.


#3

@jackzampolin No, the alert is not working. I tried to use join as you suggested and found a similar example that (outer) joins two measurements to generate the alert.

Unfortunately, I have not been able to get my alert to work, and I am not sure where I am going wrong. My updated script using join:

var data = stream
  |from()
    .measurement('MTP')
  |window()
    .period(2m)
    .every(1m)

var total =  data
  |count('status')
    
var error = data
  |where(lambda: "status" != '0')
  |count('status')

error
  |join(total)
    .fill(0)
    .as('errors', 'totals')
  |eval(lambda: "errors.error" / "totals.total")
    .as('value')
  |alert()
    .id('{{ .TaskName }}')
    .crit(lambda: "value" > 0)
    .message('Value: {{index .Fields "value"}}')
    .log('/tmp/total.log')

#4

I think the field names are the problem: after the join, the counts are exposed as errors.count and totals.count, not errors.error and totals.total. Does the below work for you?

var data = stream
  |from()
    .measurement('MTP')
  |window()
    .period(2m)
    .every(1m)

var total =  data
  |count('status')

var error = data
  |where(lambda: "status" != '0')
  |count('status')

error
  |join(total)
    .fill(0)
    .as('errors', 'totals')
  |eval(lambda: "errors.count" / "totals.count")
    .as('value')
  |alert()
    .id('{{ .TaskName }}')
    .crit(lambda: "value" > 0)
    .message('Value: {{index .Fields "value"}}')
    .log('/tmp/total.log')

#5

You are right, the values are accessed through errors.count and totals.count. Thank you @jackzampolin!

It looks like the value field was being rounded down to 0, so I had to multiply by 100 for the alert expression (value > 0) to become TRUE. This triggers the alert.

error
 |join(total)
  .fill(0)
  .as('errors', 'totals')
 |eval(lambda: 100 * "errors.count" / "totals.count")
  .as('value')
 |alert()
  .id('{{ .TaskName }}')
  .crit(lambda: "value" > 0)
  .message('Value: {{index .Fields "value"}}')
  .log('/tmp/total.log')

In the total.log file, I still see value as an int. Is there a way to keep it as a float?


#6

I have a hunch the following will work: |eval(lambda: 100.0 * "errors.count" / "totals.count")


#7

That (100.0 instead of 100) actually stopped the alert from triggering. Weird!


#8

Well, in that case I have no idea! Sorry!


#9

Anyway, thanks @jackzampolin for solving the original issue. I am now able to generate an alert when the error rate exceeds a certain threshold. I will try to set up the example in the docs to better understand the rounding issue.


#10

Solved the number rounding issue by converting the fields to float. Final eval:

 |eval(lambda: 100.0 * float("errors.count") / float("totals.count"))
     .as('value')
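
For anyone landing here later, below is a sketch of the whole task with the fixes from this thread folded in. The measurement name, window, and log path come from the earlier posts; the 5 percent threshold and the message wording are placeholders I made up. As far as I can tell, Kapacitor lambda expressions do not convert between int and float implicitly, which would explain both the rounding to 0 and why 100.0 on its own stopped the alert, but I have not confirmed that against the source.

```
// Sketch only. The 5.0 threshold is an assumed placeholder; adjust as needed.
var data = stream
  |from()
    .measurement('MTP')
  |window()
    .period(2m)
    .every(1m)

// Total number of records per window.
var total = data
  |count('status')

// Number of failed records per window (status other than '0').
var error = data
  |where(lambda: "status" != '0')
  |count('status')

error
  |join(total)
    .fill(0)
    // After the join, the counts are named errors.count and totals.count.
    .as('errors', 'totals')
  // Convert both counts to float so the division is not truncated.
  |eval(lambda: 100.0 * float("errors.count") / float("totals.count"))
    .as('value')
  |alert()
    .id('{{ .TaskName }}')
    .crit(lambda: "value" > 5.0)
    .message('Error rate: {{ index .Fields "value" }}')
    .log('/tmp/total.log')
```

Comparing against a float literal (5.0 rather than 5) keeps both sides of the crit expression the same type, which seemed the safer choice given the typing issue above.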

#11

I’m trying to do something similar: alert when a field named error exceeds a certain value or a certain percentage. I’m using the aforementioned script as an example, but I’m getting a "cannot get properties of non pointer value" error on the line that has the |alert() method.

What am I doing wrong? Here’s my script:

var data = stream
        |from()
                .measurement('api_detail')

var total = data
        |count('tr-error')

var error = data
        |where(lambda: "tr-error" == 'true')
        |count('tr-error')

error
        |join(total)
                .fill(0)
                .as('errors', 'totals')
        |eval(lambda: "errors.count" / "totals.count")
                as('value')

        |alert()
                .message('message is {{index .Fields "error"}}')
                .crit(lambda: TRUE)
                .log('/tmp/alert.log')
        |httpOut('top10')
        |log()

#12

Found the error of my ways. The as() method before the alert() does not have a period in front of it. Adding the period makes the error go away.
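
For reference, a sketch of the tail of that script with the period restored. The change to the message template is my own assumption, not part of the original fix: as I understand it, eval() without keep() passes on only the newly created field, so "error" would not be available to the alert.

```
error
        |join(total)
                .fill(0)
                .as('errors', 'totals')
        |eval(lambda: "errors.count" / "totals.count")
                // Leading period restored: as() is a property of the eval() node.
                .as('value')
        |alert()
                // Assumed change: reference "value", the field created by eval().
                .message('message is {{ index .Fields "value" }}')
                .crit(lambda: TRUE)
                .log('/tmp/alert.log')
        |httpOut('top10')
        |log()
```

Note that this still divides two integer counts, so the float() conversion from post #10 applies here as well if a fractional value is wanted.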