Kapacitor false alarms on new EC2 instances

We use netdata to feed metrics into InfluxDB and alert via Kapacitor.

Whenever we spin up a new EC2 instance, for example when an autoscaling group scales up, our TICK scripts trigger alerts like “CPU idle is low”. CPU idle really is low while an instance boots, so the alerts aren’t wrong, but they aren’t helpful either, and we’d like to eliminate them.

One option would be to delay the startup of netdata for some period (e.g. 5 minutes). That would make the alerts go away, but we would also lose visibility into these machines during a critical transition.

Kapacitor is not aware of the uptime of the machines it is monitoring. All it knows is that the cpu_idle values tagged with host="ip-172.32.4.29" are lower than the alert threshold.

Is there an elegant solution to this problem?

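// Alert when the windowed mean of idle CPU falls to or below a threshold;
// clear the alert only once it has recovered to the matching reset value.
var threshold_warn = 13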
var threshold_warn_reset = 20
var threshold_crit = 7
var threshold_crit_reset = 13

var metric_identifier = 'cpu_idle'
var metric_description = 'Idle CPU'
var metric_sense = '<'
var period = 120s
var every = 10s
var slack_handler = '/etc/kapacitor/scripts/slack.php'


// Per-host mean of idle CPU over a sliding window, excluding wowza-* instance classes.
var data = stream
  |from()
    .measurement('netdata.system.cpu.idle')
    .where(lambda: !strContains("instanceclass", 'wowza-'))
    .groupBy('host')
  |window()
    .period(period)
    .every(every)
  |mean('value')
    .as('stat')

// Warning and critical state changes go to the Slack handler script.
data
  |alert()
    .id('{{ index .Tags "host"}}/' + string(metric_identifier))
    .stateChangesOnly()
    .message('{{ .Level }},{{ index .Tags "host"}},'
        + string(metric_identifier) + ','
        + string(metric_description) + ','
        + string(metric_sense) + ','
        + '{{ index .Fields "stat" }}' + ','
        + string(period) + ','
        + '{{ if eq .Level "CRITICAL" }}' + string(threshold_crit)
        + '{{ else }}' + string(threshold_warn)
        + '{{ end }}')
    .warn(lambda:      "stat" <= threshold_warn)
    .warnReset(lambda: "stat" >= threshold_warn_reset)
    .crit(lambda:      "stat" <= threshold_crit)
    .critReset(lambda: "stat" >= threshold_crit_reset)
    .exec(slack_handler)
    .log('/var/log/kapacitor/kapacitor.txt')

// Critical state changes additionally page VictorOps.
data
  |alert()
    .id('{{ index .Tags "host"}}/' + string(metric_identifier))
    .message('{{ .Level }} {{ .ID }}: {{ index .Fields "stat" }}')
    .stateChangesOnly()
    .crit(lambda:      "stat" <= threshold_crit)
    .critReset(lambda: "stat" >= threshold_crit_reset)
    .victorOps()
    .routingKey('urgent')
    .log('/var/log/kapacitor/kapacitor.txt')
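
One direction we have considered, since Kapacitor only knows what is in the stream, is to feed host uptime in as a second stream and gate the alert conditions on it. The following is an untested sketch, not a known-good solution. It assumes netdata also exports uptime in seconds under a measurement named 'netdata.system.uptime.uptime' with a 'value' field (the real name depends on the netdata backend configuration), and the 300-second grace period and the variable names uptime_measurement and min_uptime_seconds are made up for illustration. It reuses data, period, every and the thresholds defined above.

```tickscript
// ASSUMPTION: measurement name for host uptime (seconds); adjust to your setup.
var uptime_measurement = 'netdata.system.uptime.uptime'
// HYPOTHETICAL grace period: suppress alerts for hosts up less than 5 minutes.
var min_uptime_seconds = 300.0

// Latest uptime per host, windowed the same way as the CPU data.
var uptime = stream
  |from()
    .measurement(uptime_measurement)
    .groupBy('host')
  |window()
    .period(period)
    .every(every)
  |last('value')
    .as('uptime')

// Join CPU and uptime per host; joined fields are prefixed 'cpu.' and 'up.'.
data
  |join(uptime)
    .as('cpu', 'up')
    .tolerance(every)
  |alert()
    .id('{{ index .Tags "host"}}/' + string(metric_identifier))
    .stateChangesOnly()
    .message('{{ .Level }} {{ .ID }}: {{ index .Fields "cpu.stat" }}')
    // Only enter warn/crit once the host has been up long enough.
    .warn(lambda: "up.uptime" >= min_uptime_seconds AND "cpu.stat" <= threshold_warn)
    .warnReset(lambda: "cpu.stat" >= threshold_warn_reset)
    .crit(lambda: "up.uptime" >= min_uptime_seconds AND "cpu.stat" <= threshold_crit)
    .critReset(lambda: "cpu.stat" >= threshold_crit_reset)
    .log('/var/log/kapacitor/kapacitor.txt')
```

A caveat with this approach: the join only emits points when both streams have data for a host, so a host that stops reporting uptime would silently stop alerting on CPU as well.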