Hello, I’m configuring Kapacitor to send me alerts when a node’s CPU usage, averaged over 5 minutes, reaches a certain percentage, but I don’t know how to make the alert show the percentage used.
I’m using usage_idle to trigger the alert, but I want to send the (100 - idle) value in the alert. How can I do that? My message is this:
{{.Level}}: CPU usage in {{ index .Tags "host"}} is: {{ index .Fields "value" | printf "%.1f" }}
And I want something like:
{{.Level}}: CPU usage in {{ index .Tags "host"}} is: {{ ( 100 - index .Fields "value" ) | printf "%.1f" }}
I’ve tried everything I can think of but without luck.
Thanks in advance.
@Macfresno Can you include the entire tickscript that you’re using? Also, is it safe to assume that you’re using Telegraf as your data source?
@michael As I only need basic rules, I’m using Chronograf’s Kapacitor rules with this config for this rule:
Select:
SELECT mean("usage_idle") AS "mean_usage_idle" FROM "tbh"."autogen"."cpu" WHERE time > now() - 15m GROUP BY time(5m), "host"
Send Alert when usage_idle is Less Than 20 (80 % usage)
Alert message:
{{.Level}}: Uso CPU en {{ index .Tags "host"}} es: {{ index .Fields "value" | printf "%.1f" }}
You are right, I’m using Telegraf as my data source.
Do you have access to the Kapacitor instance where the task is running? If so, I have a couple of requests:
- Can you run
kapacitor list tasks
- For each listed task run
kapacitor show <task id>
and paste the results back here.
ID Type Status Executing Databases and Retention Policies
chronograf-v1-098da2f8-41f8-4dcd-9b2d-2a9bfa0f0894 stream enabled true ["tbh"."autogen"]
chronograf-v1-35030398-bdc4-48ec-8bc7-fb83e9fc22ae stream enabled true ["tbh"."autogen"]
chronograf-v1-af1f2da6-9193-47f9-bd6b-93bcff9d176a stream enabled true ["tbh"."autogen"]
chronograf-v1-e6b9c39f-e6e0-49a7-993d-2970309db583 stream enabled true ["tbh"."autogen"]
And the task that I have problems with is this one:
ID: chronograf-v1-af1f2da6-9193-47f9-bd6b-93bcff9d176a
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 15 Mar 17 20:59 CET
Modified: 15 Mar 17 23:52 CET
LastEnabled: 15 Mar 17 23:52 CET
Databases Retention Policies: ["tbh"."autogen"]
TICKscript:
var db = 'tbh'
var rp = 'autogen'
var measurement = 'cpu'
var groupBy = ['host']
var whereFilter = lambda: TRUE
var period = 5m
var every = 30s
var name = 'CPU Usage'
var idVar = name + ':{{.Group}}'
var message = '{{.Level}}: Uso CPU en {{ index .Tags "host"}} es: {{ index .Fields "value" | printf "%.1f" }}'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'
var crit = 20
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(period)
        .every(every)
        .align()
    |mean('usage_idle')
        .as('value')
var trigger = data
    |alert()
        .crit(lambda: "value" < crit)
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .telegram()
trigger
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)
trigger
    |httpOut('output')
DOT:
digraph chronograf-v1-af1f2da6-9193-47f9-bd6b-93bcff9d176a {
graph [throughput="0.00 points/s"];
stream0 [avg_exec_time_ns="0s" ];
stream0 -> from1 [processed="180771"];
from1 [avg_exec_time_ns="13.035µs" ];
from1 -> window2 [processed="180771"];
window2 [avg_exec_time_ns="36.265µs" ];
window2 -> mean3 [processed="60254"];
mean3 [avg_exec_time_ns="514.622µs" ];
mean3 -> alert4 [processed="60254"];
alert4 [alerts_triggered="2" avg_exec_time_ns="41.064µs" crits_triggered="1" infos_triggered="0" oks_triggered="1" warns_triggered="0" ];
alert4 -> http_out6 [processed="2"];
alert4 -> influxdb_out5 [processed="2"];
http_out6 [avg_exec_time_ns="0s" ];
influxdb_out5 [avg_exec_time_ns="0s" points_written="2" write_errors="0" ];
}
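One possible way to get the used percentage into the message, sketched against the TICKscript above and not tested on this setup: the alert message is a Go template, and Go templates have no arithmetic operators, so an expression like ( 100 - index .Fields "value" ) cannot be evaluated there. The subtraction can instead be done in an eval node placed before the alert, and the resulting field referenced from the template. The field name 'used' is an assumption picked for illustration; .keep() is included so that 'value' stays available to the existing .crit() lambda. Only the message variable and the data pipeline change; the rest of the script stays as it is.

```
// Message now reads the derived 'used' field instead of 'value'.
var message = '{{.Level}}: Uso CPU en {{ index .Tags "host"}} es: {{ index .Fields "used" | printf "%.1f" }}'

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(period)
        .every(every)
        .align()
    |mean('usage_idle')
        .as('value')
    // Derive used CPU from the mean idle value.
    // 100.0 is written as a float so both operands have the same type.
    |eval(lambda: 100.0 - "value")
        .as('used')
        // Keep 'value' as well, since .crit(lambda: "value" < crit) still uses it.
        .keep('value', 'used')
```

With this in place, a window whose mean usage_idle is 15.3 would produce a message ending in "es: 84.7", while the threshold comparison against crit continues to run on the idle value.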