Manually Resetting an Alert Generated by Kapacitor

Hi,

I have an alert defined in a TICK script. The script alerts when the number of failures is above a threshold.

The script works fine for alerting on failures. However, I found that the alert never returns to OK once the failures stop.

The alert script contains a query similar to the one below.

SELECT count(responsetime) FROM transactions_records WHERE responsecode = 123 AND $timeFilter

When no failures are happening, this query returns no data, so the TICK script pipeline does not execute.

This leaves the alert active for a long time.

To fix this, I saved the alert details into InfluxDB: details such as the alert level and alert ID, which can help in resetting the alert.

I wrote a new TICK script, shown below.

//template_id alerts_app_alert_reset

var period = 1d
var offset = 30m
var groupBy = ['id_tag', 'env', 'qcinstance', 'host', 'alert_channel', 'alert_name', 'alert_type']
var db = 'telegraf'
var retention = 'two_months'
var topic = 'qcalerts_withoutok'

var alertmessage = 'Alert Manually reset'

var data = batch
    |query('SELECT last(level) AS last_level, merchant, dashboard, responsecode, txntype FROM ' + db + '.' + retention + '.activealerts WHERE alert_type=\'app\'')
        .period(period)
        .every(1m)
        .groupBy(groupBy)
        .offset(offset)
    |where(lambda: "last_level" != 'OK')
    |alert()
        .crit(lambda: "last_level" == 'OK')
        .warn(lambda: "last_level" == 'OK')
        .stateChangesOnly()
        .message(alertmessage)
        .topic(topic)
        .id('{{index .Tags "id_tag"}}')
        .idTag('id_tag')
        .idField('id')
        .levelTag('level_tag')
        .levelField('level')
        .durationField('duration')
    |delete()
        .field('count')
        .field('last_level')
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('two_months')
        .measurement('activealerts')

What I’m not sure about is this: if I use the original alert’s ID as the alert ID in this script, will it reset the original alert?

Even though this TICK script sets the alert to the OK level, when a new CRITICAL/WARNING alert is generated, its duration is not 0.

It seems the alert reset by the above TICK script is treated as a different alert by Kapacitor.

Can anyone help me identify the unique identifier that I should use across scripts to reset an alert? Is it the alert ID?

Thanks,
Robert

Is the alert ID unique across different tasks, or only within a task? Can somebody confirm this?

This is how alert IDs are defined. You can use this templating to make it whatever you want.
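
As a rough illustration of that ID templating, here is a minimal sketch. The default ID template is '{{ .Name }}:{{ .Group }}'; the template below instead builds the ID from the task name and a tag. The measurement, the "host" tag, and the crit condition are assumptions chosen only for illustration, not taken from the original task.

```
// Hypothetical sketch: measurement, tag name and condition are assumptions.
stream
    |from()
        .measurement('transactions_records')
        .groupBy('host')
    |alert()
        // Build the alert ID from the task name and the host tag
        // instead of the default '{{ .Name }}:{{ .Group }}'.
        .id('{{ .TaskName }}/{{ index .Tags "host" }}')
        .crit(lambda: "responsecode" == 123)
        .stateChangesOnly()
```

Whether two tasks that produce the same ID end up sharing alert state is exactly the open question in this thread, so treat the template as a way to control what the ID looks like rather than as a confirmed reset mechanism.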


A bit late to the party…but…

I think you don’t get the OK alerts because you filter on that response code. As soon as the response code changes to OK, your script is no longer processing data.

Remove the response code from the WHERE filter and update the crit condition to trigger on the required response code. It should then raise alerts while in that state and fire off the OK message as soon as it recovers.
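
A minimal sketch of that suggestion, under some assumptions: the database and retention policy of transactions_records, the 5m window, the threshold value, and the alert ID are all hypothetical, and responsecode is assumed to be a numeric field. Instead of filtering in the query, each point is flagged as failed or not and the flags are summed, so the pipeline still produces a value of 0 when transactions arrive without failures.

```
// Hypothetical sketch: database, retention policy, window, threshold and ID are assumptions.
var threshold = 10

batch
    |query('SELECT responsetime, responsecode FROM "telegraf"."two_months"."transactions_records"')
        .period(5m)
        .every(1m)
    // Flag each transaction as failed (1) or not (0) instead of filtering in WHERE.
    |eval(lambda: if("responsecode" == 123, 1, 0))
        .as('failed')
    // Sum the flags; the result is 0 when the window has data but no failures.
    |sum('failed')
        .as('failures')
    |alert()
        .id('transactions_failures')
        .crit(lambda: "failures" > threshold)
        .stateChangesOnly()
        .message('{{ .ID }} is {{ .Level }}')
```

Note that this only helps while some transactions are still being recorded. If the window contains no points at all, the query again returns nothing and the alert node is not evaluated.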
