Alerting if mean of a value is above a threshold value for a certain period of time

This is actually very straightforward and can be done through the Chronograf alerts UI for creating stream alerts. I didn’t even have to write the TICK script from scratch, which definitely impressed me. I didn’t know previously that relative alerts are supported out of the box.

The following TICK script was generated by Chronograf, and I later customized it.

var db = 'api_requests'

var rp = 'autogen'

var measurement = 'requests_stats'

var groupBy = ['api_path']

var whereFilter = lambda: TRUE

var period = 10s

var every = 30s

var name = 'api_alerts'

var idVar = name + '-{{.Group}}'

var message = 'Some of the APIs are exhibiting 70% change in execution time over a 5 minute window. Please look into them at the earliest.'

var idTag = 'alertID'

var levelTag = 'level'

var messageField = 'message'

var durationField = 'duration'

var outputDB = 'chronograf'

var outputRP = 'autogen'

var outputMeasurement = 'alerts'

var triggerType = 'relative'

var details = 'Some of the APIs are exhibiting 70% change in execution time over a 5 minute window. Please look into them at the earliest.'

var shift = 5m

var crit = 70

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(period)
        .every(every)
        .align()
    |mean('time_to_execute')
        .as('value')

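// Shift the past means forward by 5m so they line up in time with the current means
var past = data
    |shift(shift)

var current = data

// Percent change of the current mean relative to the mean from 5 minutes earlier
var trigger = past
    |join(current)
        .as('past', 'current')
    |eval(lambda: abs(float("current.value" - "past.value")) / float("past.value") * 100.0)
        .keep()
        .as('value')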
    |alert()
        .crit(lambda: "value" >= crit)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .details(details)
        .stateChangesOnly()
        .slack()
        .channel('#logs')

trigger
    |httpOut('output')
    |httpPost('https://example.com/api-alerts/')

The TICK script above alerts when the mean execution time for an API changes by at least x% (70% in this case), in either direction, compared with its value 5 minutes earlier. It then caches the result with the httpOut('output') node, so I can view it at /kapacitor/v1/tasks/task_id/output.
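To make the arithmetic in the eval node concrete, here is a minimal Python sketch of the same percent-change check. The function name and the sample execution times are made up for illustration; only the formula and the threshold of 70 come from the script.

```python
def percent_change(current: float, past: float) -> float:
    """Absolute change of `current` relative to `past`, as a percentage.

    Mirrors the eval lambda: abs(current - past) / past * 100.0
    """
    return abs(current - past) / past * 100.0


CRIT = 70  # same value as `crit` in the TICK script

# Hypothetical mean execution times for one api_path, 5 minutes apart
past_mean = 200.0
current_mean = 350.0

change = percent_change(current_mean, past_mean)
print(change)          # 75.0
print(change >= CRIT)  # True, so this group would go critical
```

Because of the abs(), a drop from 200 to 50 also gives 75% and would trigger the alert just like a rise does.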

I also added httpPost in case I want to call an API to autoscale or do something else with these alerts.
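If you would rather read the cached httpOut result from a script than from a browser, something along these lines should work. This is only a sketch: it assumes Kapacitor is listening on its default address (localhost:9092), and the task ID 'api_alerts' is a placeholder, since the real ID depends on how the task was created.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: Kapacitor's default HTTP address; change it for your setup.
KAPACITOR_URL = "http://localhost:9092"


def output_url(task_id: str, endpoint: str = "output", base: str = KAPACITOR_URL) -> str:
    """Build the URL served by an httpOut(endpoint) node of the given task."""
    return f"{base}/kapacitor/v1/tasks/{quote(task_id, safe='')}/{endpoint}"


def fetch_output(task_id: str, endpoint: str = "output"):
    """Fetch and decode the JSON cached by the httpOut node (needs a running Kapacitor)."""
    with urllib.request.urlopen(output_url(task_id, endpoint), timeout=5) as resp:
        return json.load(resp)


# 'api_alerts' is a hypothetical task ID used only for illustration.
print(output_url("api_alerts"))
# result = fetch_output("api_alerts")  # uncomment when Kapacitor is reachable
```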
