Alerting if mean of a value is above a threshold value for a certain period of time

kapacitor

#1

Scenario:

database: api_requests
measurement: requests_stats

Data is written continuously to the measurement requests_stats every time an API is called. I calculate the time taken by the API and write that stat to InfluxDB.

Data structure:
host (tag)
account_id (tag)
user_id (tag)
api_path (tag)

execution_time (field)
network_type (field)
platform (field)

Now I want to alert when the execution time of an API changes by more than x% over a period of time y.

Example: API ‘A’ takes an average of 5 seconds over a 1m interval. In the next 1m, it takes an average of 10 seconds.
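To make the arithmetic in this example concrete, here is a minimal Python sketch of the percentage change being described. The helper name `percent_change` is made up for illustration and is not part of Kapacitor.

```python
def percent_change(previous: float, current: float) -> float:
    """Absolute change from previous to current, as a percentage of previous."""
    if previous == 0:
        # Relative change is undefined when the earlier mean is zero.
        raise ValueError("previous mean is zero; relative change is undefined")
    return abs(current - previous) / previous * 100.0


# Mean execution time goes from 5 s in one minute to 10 s in the next.
change = percent_change(5.0, 10.0)
print(change)  # 100.0
```

With x set to, say, 70, this 100% change would be enough to trigger the alert.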

Is there a way one can achieve this with Kapacitor rules?

I am new to Kapacitor, so I am not sure which node does this best, in spite of reading the documentation.
Can this be done via StateDurationNode?


How can I compare current value in my stream with previous value in tickscript?
#2

Hi,

I will try this tomorrow or Sunday so that I can confirm what I am about to say :slight_smile:

Yes, you will need the StateDurationNode.
The challenge is working out how to calculate the alert condition.
You can create two nodes covering different periods by using .offset() on one of them,
then join the two nodes (after applying the shift function so the timestamps align) and use a lambda expression to calculate the % difference.
After that, you can pass the result to the StateDurationNode.
But as I said, I will have to try it myself.
Have a nice weekend!
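The "for a certain period of time" part of the idea above can be pictured with a small Python sketch. It is loosely modelled on how a state-duration step behaves and is not Kapacitor's implementation: the function name `state_durations`, the sample points, and the 120-second cut-off are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple


def state_durations(
    points: Iterable[Tuple[float, float]], threshold: float
) -> Iterator[Tuple[float, float]]:
    """Yield (timestamp, duration) for each (timestamp, value) point.

    duration is the number of seconds the value has been continuously at or
    above threshold, or -1 while the condition is false.
    """
    started_at: Optional[float] = None
    for timestamp, value in points:
        if value >= threshold:
            if started_at is None:
                # First point of a new run above the threshold.
                started_at = timestamp
            yield timestamp, timestamp - started_at
        else:
            # Condition broken: reset the run.
            started_at = None
            yield timestamp, -1.0


# (seconds, % difference) samples, one per minute (made-up data).
samples = [(0.0, 20.0), (60.0, 80.0), (120.0, 90.0), (180.0, 75.0), (240.0, 10.0)]
durations = list(state_durations(samples, threshold=70.0))

# Alert only once the condition has held for at least 120 seconds.
alerts = [ts for ts, duration in durations if duration >= 120.0]
print(alerts)  # [180.0]
```

A single spike above the threshold does not fire here; only a run that stays above it for the whole duration does.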


#3

Actually this is very straightforward and can be achieved via the Chronograf alerts UI for creating stream alerts. I didn’t even have to write the TICKscript from scratch, which definitely impressed me. I didn’t know previously that relative alerts are supported out of the box.

So basically the following TICKscript was generated by Chronograf, which I later customized.

var db = 'api_requests'

var rp = 'autogen'

var measurement = 'requests_stats'

var groupBy = ['api_path']

var whereFilter = lambda: TRUE

var period = 10s

var every = 30s

var name = 'api_alerts'

var idVar = name + '-{{.Group}}'

var message = 'Some of the APIs are exhibiting 70% change in execution time over a 5 minute window. Please look into them at the earliest.'

var idTag = 'alertID'

var levelTag = 'level'

var messageField = 'message'

var durationField = 'duration'

var outputDB = 'chronograf'

var outputRP = 'autogen'

var outputMeasurement = 'alerts'

var triggerType = 'relative'

var details = 'Some of the APIs are exhibiting 70% change in execution time over a 5 minute window. Please look into them at the earliest.'

var shift = 5m

var crit = 70

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(period)
        .every(every)
        .align()
    // Field name taken from the data structure in post #1 (execution_time)
    |mean('execution_time')
        .as('value')

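// Copy of the windowed means moved forward in time by `shift`, so each past
// point lines up with the current point it is compared against.
var past = data
    |shift(shift)

var current = data

// Join the two streams on time and compute the absolute percentage change.
var trigger = past
    |join(current)
        .as('past', 'current')
    |eval(lambda: abs(float("current.value" - "past.value")) / float("past.value") * 100.0)
        .keep()
        .as('value')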
    |alert()
        .crit(lambda: "value" >= crit)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .details(details)
        .stateChangesOnly()
        .slack()
        .channel('#logs')

trigger
    |httpOut('output')
    |httpPost('https://example.com/api-alerts/')

So the above TICKscript alerts when the mean execution time for an API differs by more than x% (70% in this case) from its value 5 minutes earlier. It then caches the results using the httpOut('output') node, so I can view them by visiting /kapacitor/v1/tasks/task_id/output.

I also added httpPost in case I want to hit an API to autoscale or do something else with these alerts.
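For anyone trying to picture what the shift and join steps do, here is a rough Python sketch of the same comparison on made-up window means. It is only an illustration, not how Kapacitor works internally; `relative_alerts` and the sample data are assumptions, and it reports every window over the threshold rather than only state changes.

```python
from typing import Dict, List, Tuple


def relative_alerts(
    means: Dict[int, float], shift: int, crit: float
) -> List[Tuple[int, float]]:
    """Return (timestamp, percent_change) for each window at or above crit.

    means maps a window timestamp in seconds to the mean execution time.
    """
    # Like |shift(shift): move every past point forward in time.
    past = {ts + shift: value for ts, value in means.items()}
    alerts = []
    for ts in sorted(means):
        # Like |join(): only timestamps present on both sides are compared.
        if ts not in past or past[ts] == 0:
            continue
        change = abs(means[ts] - past[ts]) / past[ts] * 100.0
        if change >= crit:
            alerts.append((ts, change))
    return alerts


# Mean execution time per 5-minute step for one api_path (made-up data).
means = {0: 5.0, 300: 5.5, 600: 10.0, 900: 10.5}
alerts = relative_alerts(means, shift=300, crit=70)
print(alerts)
```

With this data only the window at 600 seconds crosses the threshold, with a change of roughly 81.8% against the 5.5 second mean from five minutes earlier.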


#4

Thanks for sharing!
I haven’t looked at Chronograf because we use Grafana (not my decision :slight_smile:), but I will have a look at Chronograf as soon as possible!


#5

Chronograf is good. You should definitely try it out. A lot of advancements have been made to make things simpler. Grafana was way ahead of Chronograf years back, but now Chronograf is on par.

The things I liked in Chronograf were:

  1. Configurations can be created pretty easily now.
  2. Ability to organise and create widgets inside dashboards.
  3. Explore window to try out new queries and export them to a dashboard from there.
  4. Kapacitor alerting rules via the UI instead of having to write the TICKscript manually.
  5. Security via OAuth 2.0.
  6. A dedicated log utility and the ability to search logs by writing them to the syslog measurement.