Manually Resetting an Alert Generated by Kapacitor

Robert_George · September 21, 2020, 12:52pm

Hi,

I have an alert defined in a TICKscript. The script alerts when the number of failures is above a threshold.

The script works fine for alerting on failures. However, I found that the alert never returns to OK once the failures stop.

I have a query similar to the one below in the alert script:

```
SELECT count(responsetime) FROM transactions_records WHERE responsecode = 123 AND $timeFilter
```

So when no failures occur, this query returns no data and the rest of the TICKscript pipeline never executes.

This leaves the alert active for a long time.
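To make the failure mode concrete, here is a minimal, hypothetical sketch of the kind of task described above. The threshold, the window and the database/retention names are placeholders, not taken from the original script. Note that a Kapacitor batch query does not use `$timeFilter`; Kapacitor adds the time range itself based on `.period()` and `.every()`.

```
// Hypothetical reconstruction; threshold, period and names are placeholders.
var threshold = 10

batch
    // Only failed transactions are counted. If there were none in the
    // window, the query returns no rows at all.
    |query('SELECT count(responsetime) AS failures FROM "telegraf"."two_months"."transactions_records" WHERE responsecode = 123')
        .period(5m)
        .every(1m)
    // Reached only when the query returned a row, so the alert node never
    // receives a value it could evaluate as OK after the failures stop.
    |alert()
        .crit(lambda: "failures" > threshold)
        .stateChangesOnly()
```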

To fix this, I saved the alert details into InfluxDB: details such as the alert level and alert ID that can help in resetting the alert.

I then wrote a new TICKscript like the one below:
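A hypothetical sketch of how the original task could save those details. The alert node's `.idTag()`, `.idField()`, `.levelTag()`, `.levelField()` and `.durationField()` properties add the alert data to the points it passes on, and `influxDBOut` writes them to the `activealerts` measurement. The names mirror the reset script; `failures`, `threshold` and the static `alert_type` tag are assumptions standing in for whatever the original task uses.

```
// Hypothetical tail of the ORIGINAL alerting task (not the reset task).
// "failures" and threshold are placeholders for the original computation.
    |alert()
        .crit(lambda: "failures" > threshold)
        .stateChangesOnly()
        .topic('qcalerts_withoutok')
        // Store the alert ID, level and duration alongside the data
        .idTag('id_tag')
        .idField('id')
        .levelTag('level_tag')
        .levelField('level')
        .durationField('duration')
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('two_months')
        .measurement('activealerts')
        // Static tag so the reset query can filter on alert_type = 'app'
        .tag('alert_type', 'app')
```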

```
//template_id alerts_app_alert_reset

var period = 1d
var offset = 30m
var groupBy = ['id_tag', 'env', 'qcinstance', 'host', 'alert_channel', 'alert_name', 'alert_type']
var db = 'telegraf'
var retention = 'two_months'
var topic = 'qcalerts_withoutok'

var alertmessage = 'Alert Manually reset'

var data = batch
    // Read back the last recorded level of every saved 'app' alert
    |query('SELECT last(level) AS last_level, merchant, dashboard, responsecode, txntype FROM ' + db + '.' + retention + '.activealerts WHERE alert_type = \'app\'')
        .period(period)
        .every(1m)
        .groupBy(groupBy)
        .offset(offset)
    // Keep only alerts that are still CRITICAL/WARNING
    |where(lambda: "last_level" != 'OK')
    // Every remaining point has last_level != 'OK', so neither condition
    // below can be true and the alert evaluates to OK
    |alert()
        .crit(lambda: "last_level" == 'OK')
        .warn(lambda: "last_level" == 'OK')
        .stateChangesOnly()
        .message(alertmessage)
        .topic(topic)
        // Reuse the ID that was stored for the original alert
        .id('{{ index .Tags "id_tag" }}')
        .idTag('id_tag')
        .idField('id')
        .levelTag('level_tag')
        .levelField('level')
        .durationField('duration')
    // Drop helper fields before writing the OK state back
    |delete()
        .field('count')
        .field('last_level')
    |influxDBOut()
        .database('telegraf')
        .retentionPolicy('two_months')
        .measurement('activealerts')
```

What I'm not sure about is this: if I use the original alert's ID as the ID of the alert in this script, will it reset the original alert?

Even though this TICKscript sets the alert to the OK level, when a new CRITICAL/WARNING alert is generated, its duration is not 0.

It seems the alert reset by the above TICKscript is treated as a different alert by Kapacitor.

Can anyone help me identify the unique identifier I should use across scripts to reset an alert? Is it the alert ID?

Thanks,
Robert

Robert_George · September 22, 2020, 2:07am

Is the alert ID unique across different tasks, or is it unique within a task? Can somebody confirm this?

Emrys_Landivar · November 2, 2020, 8:27pm

Alert IDs are defined by the `.id()` template on the alert node. You can use this templating to make the ID whatever you want.
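As a sketch of what that templating allows: according to the AlertNode documentation, the default template is `{{ .Name }}:{{ .Group }}` (measurement name plus group-by tags), and `.TaskName` and `.Tags` are also available. The default does not include the task name, so uniqueness across tasks depends entirely on the template. The tag names below are assumptions borrowed from the reset script's groupBy list.

```
    |alert()
        // Default when .id() is not set: '{{ .Name }}:{{ .Group }}'
        // Including the task name makes the ID different per task:
        //   .id('{{ .TaskName }}:{{ .Group }}')
        // Built only from tag values, so any task grouping by these
        // tags (hypothetical choice) produces the same ID:
        .id('{{ index .Tags "alert_name" }}:{{ index .Tags "host" }}:{{ index .Tags "env" }}')
```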

philb · November 3, 2020, 9:06am

A bit late to the party…but…

I think you don't get the OK alerts because you filter for that response code. As soon as the response code is no longer the failure code, your script stops receiving data.

Remove the response code from the WHERE filter and change the crit condition to trigger on the required response code instead. It should then fire alerts while in that state and, as soon as they recover, send the OK message.
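A minimal sketch of that suggestion, assuming `responsecode` is an integer field and 123 is the failure code; the threshold, window and database/retention names are placeholders. Every transaction is read, marked 1 or 0 with the lambda `if()` function, and summed, so a window with traffic but no failures yields 0 and the alert can return to OK.

```
// Sketch only: threshold, window and names are placeholders, and
// responsecode is assumed to be an integer field.
var threshold = 10

batch
    // No response-code filter here, so successful transactions arrive too
    |query('SELECT responsecode FROM "telegraf"."two_months"."transactions_records"')
        .period(5m)
        .every(1m)
    // 1 for a failed transaction, 0 otherwise
    |eval(lambda: if("responsecode" == 123, 1, 0))
        .as('failed')
    // Number of failures in the window; 0 when everything succeeded
    |sum('failed')
        .as('failures')
    |alert()
        .crit(lambda: "failures" > threshold)
        .stateChangesOnly()
        .message('{{ .ID }} is {{ .Level }}: {{ index .Fields "failures" }} failures')
```

If there is no traffic at all in the window, the query still returns nothing, so this only helps while transactions are flowing.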
