We run Influx/Kapacitor on a common server shared by a variety of applications. Hardcoding alert thresholds in the TICKscripts (e.g. var crit = 80) means one size fits all. We'd like to customize some thresholds on a per-app basis using tags, although if you have better suggestions, that would be OK too. Right now a sample TICKscript looks like:
// parameters
var warn = 40
var unit = 30m
var crit = 20
var critHIGH = 10
var critLOW = 30
// Dataframe
var data = stream
    |from()
        .measurement('cpu')
        .groupBy('host')
    |default()
        .tag('APP_CPU_CRITICAL_LEVEL', 'NORMAL')
var alert = data
    |alert()
        .id('cpu-usage')
        .message('error message stuff')
        .info(lambda: "usage_idle" > warn)
        .warn(lambda: "usage_idle" < warn)
        .crit(lambda: "usage_idle" < crit AND '{{ index .Tags "APP_CPU_CRITICAL_LEVEL" }}' == 'NORMAL')
        .crit(lambda: "usage_idle" < critHIGH AND '{{ index .Tags "APP_CPU_CRITICAL_LEVEL" }}' == 'HIGH')
        .crit(lambda: "usage_idle" < critLOW AND '{{ index .Tags "APP_CPU_CRITICAL_LEVEL" }}' == 'LOW')
        .stateChangesOnly(unit)
// alert
alert
    .sensu()
        .source('source stuff')
The default() node is working well, and I can see the tag in the log file. It's also picking up any tags that I set in an application's telegraf.conf.
But the AND '{{ index .Tags "APP_CPU_CRITICAL_LEVEL" }}' == 'xxxxxxx' part is not working. Even after removing all but the NORMAL .crit(), with that added condition on it, the alert does not trigger when I can see measurements that should trigger it.
I've tried it without the {{ index .Tags … }} and that didn't help either.
The syntax of the conditional seems OK from what I can tell: it is simply a string comparison of a tag's value against different string literals.
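One variant that may be worth trying is sketched below. It is untested here and rests on three assumptions about Kapacitor: that lambda expressions resolve a double-quoted name to a tag value as well as a field value (a single-quoted name would just be a string literal), that Go-template syntax such as {{ index .Tags "..." }} is expanded only in properties like .id() and .message() and not inside lambdas, and that repeated .crit() calls replace one another rather than combine. Under those assumptions the three conditions are folded into a single .crit(), reusing the data, warn, crit, critHIGH, critLOW and unit variables from the script above:

```
// Single crit condition; the tag is read with double quotes, like a field.
var alert = data
    |alert()
        .id('cpu-usage')
        .message('error message stuff')
        .info(lambda: "usage_idle" > warn)
        .warn(lambda: "usage_idle" < warn)
        .crit(lambda:
            ("APP_CPU_CRITICAL_LEVEL" == 'NORMAL' AND "usage_idle" < crit) OR
            ("APP_CPU_CRITICAL_LEVEL" == 'HIGH' AND "usage_idle" < critHIGH) OR
            ("APP_CPU_CRITICAL_LEVEL" == 'LOW' AND "usage_idle" < critLOW))
        .stateChangesOnly(unit)
```

If the assumptions hold, each point is checked against the threshold that matches its own APP_CPU_CRITICAL_LEVEL tag, and points that carry no such tag fall through to NORMAL because of the default() node.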
But obviously I’m doing something wrong. Any advice would be appreciated.
Matt