Target Group Healthy hosts tick script data

Hi Team,

We have a TICK script that alerts when the healthy host count changes, but Kapacitor is reporting incorrect data.
It appears to be querying data per availability zone instead of per region. I have attached the part of the TICK script we are using.

Can we get some help with this? It is critical monitoring that our business requires.

Part of the TICK script:

var whereFilter = lambda: ( isPresent("account") AND "account" =~ /^prd$/ AND isPresent("target_group") AND isPresent("availability_zone") AND isPresent("region") AND isPresent("load_balancer") AND isPresent("healthy_host_count_minimum") AND isPresent("healthy_host_count_average") AND "load_balancer" =~ /^.*XXX-XXX-.*/ AND "target_group" =~ /^.*XXX-prd.*/ )

var fieldToEvaluate = lambda: "healthy_host_count_minimum"

//var warnThreshold = 2
var warnThreshold = 6

var critThreshold = 0
//var critThreshold = 3

var period = 15m

//TODO:we can change this to 5m frequency
var every = 1m

var group_By = ['account', 'region', 'load_balancer', 'target_group']

var db = 'telegraf'

var rp = 'autogen'

var idVar = name + ': {{ index .Tags "target_group" }}'

var message = '{{.ID}} is {{.Level}}: {{.Name}} = {{ index .Fields "value" }}'

var idTag = 'alertID'

var levelTag = 'level'

var messageField = 'message'

var durationField = 'duration'

var outputDB = 'chronograf'

var outputRP = 'autogen'

var outputMeasurement = 'alerts'

var triggerType = 'threshold'

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(group_By)
        .where(whereFilter)
    |window()
        .align()
        .period(period)
        .every(every)
        .fillPeriod()
    |max('healthy_host_count_minimum')
        .as('stat')

data
    |alert()
        .warn(lambda: "stat" < warnThreshold AND "stat" != critThreshold)
        .stateChangesOnly()
        .message(subject)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .details(details)
        .email()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

data
    |alert()
        .crit(prd_crit_threshold_lambda)
        .stateChangesOnly()
        .message(subject)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .details(details)
        // .pagerDuty2()
        .email()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

Can we have any references?

Thanks,
karthik.

Hello @mgajjala,
Hmm, I don’t know. Your TICK script looks okay to me. I’ll pass this along to someone who knows the answer. I appreciate your patience.
Thanks!

Thank you @Anaisdg,

Below is the data coming from Kapacitor. Please see “value”: it shows 1, but the correct value is 2.

The problem is intermittent: sometimes we get 2 and sometimes 1.

  • Name = cloudwatch_aws_application_elb
  • Target Group = targetgroup/XXXXXX
  • Region = eu-central-1
  • value = 1
  • Time = 2020-10-10 21:01:00 +0000 UTC
  • Alert Duration = 0s
  • Level = WARNING
  • Action = “Review the ALB hosts count and Auto scaling alarm triggers”

Hi @Anaisdg,

Are there any resolution steps I can try from my end?

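OK, I am assuming that in InfluxDB availability_zone and region are both tags?
If so, you need to aggregate the data across availability_zone. The script groups by region but not by availability_zone, so each group likely contains one point per zone, and max returns the largest single-zone count rather than the regional total. I suggest inserting a sum node before the max node to sum ‘healthy_host_count_minimum’.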
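
A rough sketch of that change, reusing the variables from the script above (db, rp, measurement, group_By, whereFilter, period, every). It assumes Telegraf writes one point per availability zone per minute; the 1m inner window and the 'zone_total' field name are my own placeholders, not something taken from your setup, so adjust them to your collection interval.

```
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(group_By)
        .where(whereFilter)
    // Assumption: one point per availability zone per minute.
    // Sum the per-zone points in each 1m window to get a regional total.
    |window()
        .align()
        .period(1m)
        .every(1m)
    |sum('healthy_host_count_minimum')
        .as('zone_total')
    // Same 15m window as before, now applied to the regional totals.
    |window()
        .align()
        .period(period)
        .every(every)
        .fillPeriod()
    |max('zone_total')
        .as('stat')
```

The alert nodes can stay as they are, since they still read the 'stat' field. If a zone's point arrives late or lands in a neighbouring 1m window, the sum for that minute will be short, so it is worth checking the intermediate values (for example with a log node) before relying on the alert.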
