Hi,
I wanted to calculate the average of a field value with respect to two tags, `server` and `name`. `server` stores the IP address of a server, and `name` corresponds to the output names of the IPMI sensor readings. We collect them through Telegraf, so we know that we always get all values for every point in time.
`name` can be `psu1_pout` or `psu2_pout`, and we want to calculate the average power draw of each server. I tried it with the following TICKscript, but it failed. Right now I have no idea why `stateDuration` produces errors. It would be great to get some advice on how to achieve this within a TICKscript.
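For comparison, the averaging step on its own might look like the sketch below. It assumes the `db_cmc`/`autogen` database and the `ipmi_sensor` measurement from the task shown further down; the 1-minute window, the `mean_pout` field name and the `avg_power` endpoint are illustrative choices, not part of the original setup, and the script has not been run against this data. The key choice is grouping by `server` only: with `['server', 'name']`, `psu1_pout` and `psu2_pout` end up in separate groups and each supply is averaged on its own.

```
// Untested sketch: mean of psu1_pout and psu2_pout per server over 1m windows.
// 'mean_pout' and the 'avg_power' endpoint are illustrative names.
stream
    |from()
        .database('db_cmc')
        .retentionPolicy('autogen')
        .measurement('ipmi_sensor')
        .where(lambda: "name" == 'psu1_pout' OR "name" == 'psu2_pout')
        // group by server only, so both PSU readings of a server share one group
        .groupBy('server')
    |window()
        .period(1m)
        .every(1m)
    // one mean per server and window, taken over both PSUs
    |mean('value')
        .as('mean_pout')
    |httpOut('avg_power')
```

Once such a task is defined and enabled, the most recent window results should be readable from Kapacitor's HTTP API under `/kapacitor/v1/tasks/<task-id>/avg_power`.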
Since I was able to get the desired result in the Data Explorer in Chronograf, it should be possible. Nevertheless, I failed and have no idea how to continue, since the documentation on building complex TICKscripts is quite thin. So if anybody has useful links to more documentation or examples, that would also be very helpful.
Or if there are any further recommendations, please share your opinion.
Best Regards,
Stephan
And here is the output of `kapacitor show` for the alert task:
root@kapacitor:/etc/kapacitor/templates# kapacitor show CMC_PSU_Off
ID: CMC_PSU_Off
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 05 Aug 19 16:35 CEST
Modified: 05 Aug 19 16:35 CEST
LastEnabled: 05 Aug 19 16:35 CEST
Databases Retention Policies: ["db_cmc"."autogen"]
TICKscript:
var db = 'db_cmc'
var rp = 'autogen'
var measurement = 'ipmi_sensor'
var groupBy = ['server', 'name']
var whereFilter = lambda: ("name" == 'psu1_pout' OR "name" == 'psu2_pout')
var name = 'CMC PSU off'
var idVar = name + '-{{.Group}}'
var message = '
ID {{.ID}}
Name {{.Name}}
TaskName {{.TaskName}}
Level {{.Level}}
GroupBy {{.Group}}
Tags {{.Tags}}
CMC {{ index .Tags "server" }}
Fault Chassi is off
Time {{.Time}}
'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |stateDuration(lambda: (mean("value") == 0))
        .unit(1m)
        .as('CritDuration')

var trigger = data
    |alert()
        // state duration crit
        .crit(lambda: ("CritDuration" > 5))
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .log('/etc/kapacitor/templates/alert_logs/cmc_psu_off.log')

trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')

DOT:
digraph CMC_PSU_Off {
graph [throughput="0.00 points/s"];
stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="123404010"];
from1 [avg_exec_time_ns="22.172µs" errors="0" working_cardinality="0" ];
from1 -> state_duration2 [processed="2490097"];
state_duration2 [avg_exec_time_ns="48.357µs" errors="2490097" working_cardinality="1136" ];
state_duration2 -> alert3 [processed="0"];
alert3 [alerts_inhibited="0" alerts_triggered="0" avg_exec_time_ns="0s" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="0" ];
alert3 -> http_out6 [processed="0"];
alert3 -> eval4 [processed="0"];
http_out6 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
eval4 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
eval4 -> influxdb_out5 [processed="0"];
influxdb_out5 [avg_exec_time_ns="0s" errors="0" points_written="0" working_cardinality="0" write_errors="0" ];
}
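One possible reading of the DOT output: `state_duration2` reports `errors="2490097"`, exactly the number of points `from1` passed to it, and nothing reaches `alert3`, so every single point seems to fail inside the `stateDuration` lambda. Lambda expressions are evaluated one point at a time, and as far as I know `mean()` is not among the functions available inside them; averaging is done by the `|mean()` node on a windowed batch instead (the exact error text should show up in the Kapacitor log). Under that assumption, a drop-in replacement for the `var data = ...` definition in the task above could compute the mean first and hand the result to `stateDuration`. It reuses the script's own variables; the 1-minute window is an assumed resolution, and naming the mean `value` is a choice made so that the later `eval(lambda: float("value"))` keeps working. This is an untested sketch:

```
// Untested sketch: replaces the 'var data = ...' block of the task above and
// reuses its db, rp, measurement, groupBy and whereFilter variables.
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    // collect 1m of points per group so an aggregate can be computed
    |window()
        .period(1m)
        .every(1m)
    // averaging happens in its own node instead of inside a lambda;
    // naming the result 'value' keeps the later eval(float("value")) working
    |mean('value')
        .as('value')
    // the lambda now only compares a field of the current point;
    // the mean is a float, so it is compared with a float literal
    |stateDuration(lambda: "value" == 0.0)
        .unit(1m)
        .as('CritDuration')
```

Setting `groupBy` to `['server']` instead would average both supplies of a server together, so (assuming the readings are never negative) the state would only be 0 when both PSUs read 0.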