This is actually very straightforward and can be achieved via the Chronograf alerts UI
for creating stream alerts. I didn’t even have to write the TICK script from scratch, which definitely impressed me. I didn’t know previously that relative alerts are supported out of the box.
The following TICK script was generated by Chronograf, and I later customized it.

var db = 'api_requests'
var rp = 'autogen'
var measurement = 'requests_stats'
var groupBy = ['api_path']
var whereFilter = lambda: TRUE
var period = 10s
var every = 30s
var name = 'api_alerts'
var idVar = name + '-{{.Group}}'
var message = 'Some of the APIs are exhibiting 70% change in execution time over a 5 minute window. Please look into them at the earliest.'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'relative'
var details = 'Some of the APIs are exhibiting 70% change in execution time over a 5 minute window. Please look into them at the earliest.'
var shift = 5m
var crit = 70

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(period)
        .every(every)
        .align()
    |mean('time_to_execute')
        .as('value')

var past = data
    |shift(shift)

var current = data

var trigger = past
    |join(current)
        .as('past', 'current')
    |eval(lambda: abs(float("current.value" - "past.value")) / float("past.value") * 100.0)
        .keep()
        .as('value')
    |alert()
        .crit(lambda: "value" >= crit)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .details(details)
        .stateChangesOnly()
        .slack()
        .channel('#logs')

trigger
    |httpOut('output')
    |httpPost('https://example.com/api-alerts/')

The above TICK script alerts when the mean execution time for an API changes by x% or more (70% in this case) over a 5-minute window. It then caches the results using the httpOut('output') node, so I can view them by visiting /kapacitor/v1/tasks/task_id/output.
I also added httpPost in case I wanted to hit an API to autoscale or do something else with these alerts.
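
To make the trigger condition concrete, here is a rough Python sketch of the arithmetic that the eval and crit expressions in the script perform. It is not Kapacitor code. The function names and sample timings are made up for illustration, and raising an error on a zero past value is my own choice here, not a statement about what Kapacitor does in that case.

```python
def percent_change(past, current):
    """Mirror of the eval expression in the script:
    abs(current - past) / past * 100.0
    """
    if past == 0:
        # Assumption for this sketch only: refuse to divide by zero.
        raise ValueError("past value must be non-zero")
    return abs(float(current - past)) / float(past) * 100.0


def is_critical(past, current, crit=70.0):
    """Mirror of the crit expression: "value" >= crit."""
    return percent_change(past, current) >= crit


if __name__ == "__main__":
    # Hypothetical mean execution times in ms, 5 minutes apart.
    print(percent_change(200, 350), is_critical(200, 350))
    print(percent_change(200, 100), is_critical(200, 100))
```

Because of the abs(), a large drop in execution time triggers the alert just like a large increase does, which may or may not be what you want.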