Kapacitor did not send an alert even after thresholds were crossed

Amit_Kumar_Singh · May 22, 2019, 12:36pm

I have a RHEL5 server where Telegraf is installed, and it sends data to an InfluxDB server every 5 minutes.
In Kapacitor we check CPU with every=6m and period=7m,
and it did not send the alert when the thresholds were crossed.
I also ran the InfluxQL query from the Kapacitor log for that time, and the output showed that CPU utilization had crossed the thresholds.
Kapacitor did send alerts for that server before, and it has sent all alerts since.
It missed only this one alert.

Kapacitor version is 1.5.1
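To make the timing described above concrete, here is a minimal Python sketch (not Kapacitor code) that models which Telegraf points each aligned batch query sees and which single point last() returns. It assumes points land exactly on 5-minute marks starting at t=0, that .align() places each query's stop time on a multiple of every, and that each query covers the half-open window [stop - period, stop). Real write times will have some offset and jitter, so treat this as an illustration only.

```python
from typing import List, Optional, Tuple

# Hypothetical timeline model, not Kapacitor code.
# Assumptions: telegraf writes exactly on 5-minute marks starting at t=0,
# .align() puts every query's stop time on a multiple of .every(), and each
# query covers the half-open window [stop - period, stop).
WRITE_INTERVAL = 5  # minutes between telegraf points (from the post)
EVERY = 6           # .every(6m) in the TICKscript
PERIOD = 7          # .period(7m) in the TICKscript
HORIZON = 60        # minutes to simulate

def point_times(horizon: int, interval: int) -> List[int]:
    """Timestamps (minutes) of the points telegraf writes within the horizon."""
    return list(range(0, horizon, interval))

def last_per_window(points: List[int], horizon: int, every: int,
                    period: int) -> List[Tuple[int, int, Optional[int]]]:
    """For each aligned query, return (start, stop, timestamp picked by last())."""
    results = []
    for stop in range(every, horizon + 1, every):
        start = stop - period
        in_window = [t for t in points if start <= t < stop]
        # last() keeps only the newest point in the window; earlier ones are dropped
        results.append((start, stop, in_window[-1] if in_window else None))
    return results

points = point_times(HORIZON, WRITE_INTERVAL)
windows = last_per_window(points, HORIZON, EVERY, PERIOD)
seen = {last for _, _, last in windows if last is not None}
never_evaluated = [t for t in points if t not in seen]

for start, stop, last in windows:
    print(f"window [{start:3d}, {stop:3d}) -> last() returns the point at t={last}")
print("points never returned by last():", never_evaluated)  # [0, 30]
```

Under these assumptions, the 10 queries in an hour return only 10 of the 12 points through last(). The points at t=0 and t=30 (one in every 30 minutes, the least common multiple of 5 and 6) are always followed by a newer point in the same window, so a threshold breach carried only by such a sample would never be compared against warn or crit.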

katy · May 22, 2019, 3:57pm

Amit, which alert handler are you using? Have you tested the handler connection to make sure the alert is configured correctly?

Amit_Kumar_Singh · May 22, 2019, 4:15pm

I am using email, and it is working: I got emails before and after the incident.
I also got alerts at the same time for other metrics on a different server. Only this one alert was not triggered.

katy · May 22, 2019, 4:29pm

Can you send the TICKscript or a screenshot?

Amit_Kumar_Singh · May 22, 2019, 4:45pm

Please find the output of kapacitor show below:

ID: cct_cpu_alert
Error: 
Template: 
Type: batch
Status: enabled
Executing: true
Created: 27 Nov 18 00:03 IST
Modified: 22 Feb 19 11:58 IST
LastEnabled: 22 Feb 19 11:58 IST
Databases Retention Policies: ["cct"."autogen"]
TICKscript:
// database
var database = 'cct'

// measurement from where data is coming
var measurement = 'cpu'

// RP from where data is coming
var RP = 'autogen'

// which influx cluster to use
var clus = 'application'

// durations
var period = 7m

var every = 6m

// alerts
var warn = 20

var crit = 10

var alertName = 'cct_cpu_alert'

var triggerType = 'threshold'

batch
    |query('''SELECT last("usage_idle") as "value" FROM "''' + string(database) + '''"."''' + string(RP) + '''"."''' + string(measurement) + '''" WHERE cpu = 'cpu-total' ''')
        .cluster(clus)
        .period(period)
        .every(every)
        .groupBy(*)
        .align()
    |alert()
        .warn(lambda: "value" < warn)
        .crit(lambda: "value" < crit)
        .stateChangesOnly()
        .message('{{.Level}} CPU Idle as on {{ .Time.Local.Format "2006.01.02 - 15:04:05" }} is {{ index .Fields "value" | printf "%0.2f" }}% in {{ index .Tags "host" }} ')
        .details('''

 <pre>
 ------------------------------------------------------------------
 CLIENT NAME      : XXXXXX
 ENVIRONMENT      : Prod
 DEVICE TYPE      : {{ index .Tags "os" }}
 APPLICATION NAME : {{ index .Tags "app_stack" }}
 HOST NAME        : {{ index .Tags "host" }}
 IP ADDRESS       : {{ index .Tags "ip" }}
 DATE             : {{ .Time.Local.Format "2006.01.02 - 15:04:05" }}
 ITEM NAME        : CPU Idle (%)
 VALUE            : {{ index .Fields "value" | printf "%0.2f" }} %
 SEVERITY         : {{.Level}}
 ------------------------------------------------------------------
 </pre>
	
''')
        .log('/tmp/chronograf/cct_cpu_alert.log')
        .levelTag('level')
        .idTag('id')
        .messageField('message')
        .email()
        .to('amit.singh2@xyz.com')
    |influxDBOut()
        .database('chronograf')
        .retentionPolicy(RP)
        .measurement('alerts')
        .tag('alertName', alertName)

DOT:
digraph cct_cpu_alert {
graph [throughput="0.00 batches/s"];

query1 [avg_exec_time_ns="66.766732ms" batches_queried="145932" errors="0" points_queried="145932" working_cardinality="0" ];
query1 -> alert2 [processed="145932"];

alert2 [alerts_inhibited="0" alerts_triggered="71" avg_exec_time_ns="52.304µs" crits_triggered="6" errors="0" infos_triggered="0" oks_triggered="32" warns_triggered="33" working_cardinality="271" ];
alert2 -> influxdb_out3 [processed="71"];

influxdb_out3 [avg_exec_time_ns="14.009µs" errors="0" points_written="71" working_cardinality="0" write_errors="0" ];
}
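The alert node's behaviour can be sketched in Python (a hypothetical model, not Kapacitor's implementation). It assumes the most severe matching level wins (CRITICAL is checked before WARNING), that each host group starts in the OK state, and that .stateChangesOnly() suppresses any event whose level equals the previous level for that group. Note that the thresholds apply to usage_idle, so WARNING means idle below 20% (CPU busy above 80%) and CRITICAL means idle below 10%. The host names and sample values are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical model of the alert node in cct_cpu_alert, not Kapacitor code:
#   .warn(lambda: "value" < warn)   with warn = 20
#   .crit(lambda: "value" < crit)   with crit = 10
#   .stateChangesOnly()
# Assumptions: CRITICAL is checked before WARNING, every host group starts
# in OK, and state is tracked separately per group (groupBy(*)).
WARN = 20
CRIT = 10

def level(value: float) -> str:
    """Map a usage_idle value to an alert level."""
    if value < CRIT:
        return "CRITICAL"
    if value < WARN:
        return "WARNING"
    return "OK"

def notifications(samples: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Return the (host, level) events that survive stateChangesOnly()."""
    previous: Dict[str, str] = {}
    sent = []
    for host, value in samples:
        current = level(value)
        if current != previous.get(host, "OK"):
            sent.append((host, current))  # level changed: an email would go out
        previous[host] = current
    return sent

# Hypothetical usage_idle samples for two invented hosts, in arrival order
samples = [("web01", 45.0), ("web02", 18.5), ("web01", 15.2),
           ("web02", 17.0), ("web01", 8.9), ("web02", 30.0)]
print(notifications(samples))
# [('web02', 'WARNING'), ('web01', 'WARNING'), ('web01', 'CRITICAL'), ('web02', 'OK')]
```

In this model, web02's second WARNING sample sends nothing because the level did not change. The DOT counters above are consistent with this kind of bookkeeping: alerts_triggered=71 equals crits_triggered (6) + warns_triggered (33) + oks_triggered (32).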

katy · May 22, 2019, 5:55pm

Can you verify that the data crossed the threshold? If you’re using Chronograf, you should be able to use the Alert Builder to verify.

Amit_Kumar_Singh · May 22, 2019, 9:59pm

We can see the query was executed at the same time, and it gave similar output.
