Calculate average of a single field with two tags

SWalter · August 6, 2019, 10:38am

Hi,

I want to calculate the average of a field value across two tags, server and name. server stores the IP address of a server, and name holds the output names of the IPMI sensor readings. We collect them through Telegraf, so we know that we always get all values for each point in time.

name can be psu1_pout or psu2_pout, and we want to calculate the average power draw of each server. I tried the following TICKscript, but it failed. Right now I have no idea why stateDuration produces errors. It would be great to get some advice on how to achieve this within a TICKscript.

Since I was able to get the desired result in the Data Explorer in Chronograf, it should be possible. Nevertheless, I failed and have no idea how to continue, since the documentation is quite thin when it comes to writing complex TICKscripts. If anybody has useful links to more documentation or examples, that would also be very welcome.

Or if there are any further recommendations, please share your opinion.

Best Regards,

Stephan

And now the output from Kapacitor for the alert:

root@kapacitor:/etc/kapacitor/templates# kapacitor show CMC_PSU_Off

ID: CMC_PSU_Off
Error:
Template:
Type: stream
Status: enabled
Executing: true
Created: 05 Aug 19 16:35 CEST
Modified: 05 Aug 19 16:35 CEST
LastEnabled: 05 Aug 19 16:35 CEST
Databases Retention Policies: ["db_cmc"."autogen"]
TICKscript:
var db = 'db_cmc'
var rp = 'autogen'
var measurement = 'ipmi_sensor'
var groupBy = ['server', 'name']
var whereFilter = lambda: ("name" == 'psu1_pout' OR "name" == 'psu2_pout')
var name = 'CMC PSU off'
var idVar = name + '-{{.Group}}'
var message = '
ID {{.ID}}
Name {{.Name}}
TaskName {{.TaskName}}
Level {{.Level}}
GroupBy {{.Group}}
Tags {{.Tags}}
CMC {{ index .Tags "server" }}
Fault Chassi is off
Time {{.Time}}
'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |stateDuration(lambda: (mean("value") == 0))
        .unit(1m)
        .as('CritDuration')

var trigger = data
    |alert()
        // state duration crit
        .crit(lambda: ("CritDuration" > 5))
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .log('/etc/kapacitor/templates/alert_logs/cmc_psu_off.log')

trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')
DOT:
digraph CMC_PSU_Off {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from1 [processed="123404010"];

from1 [avg_exec_time_ns="22.172µs" errors="0" working_cardinality="0" ];
from1 -> state_duration2 [processed="2490097"];

state_duration2 [avg_exec_time_ns="48.357µs" errors="2490097" working_cardinality="1136" ];
state_duration2 -> alert3 [processed="0"];

alert3 [alerts_inhibited="0" alerts_triggered="0" avg_exec_time_ns="0s" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="0" ];
alert3 -> http_out6 [processed="0"];
alert3 -> eval4 [processed="0"];

http_out6 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];

eval4 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
eval4 -> influxdb_out5 [processed="0"];

influxdb_out5 [avg_exec_time_ns="0s" errors="0" points_written="0" working_cardinality="0" write_errors="0" ];
}

Anaisdg · August 6, 2019, 8:29pm

Hello @SWalter,

Thanks for your question. Yeah, Kapacitor is tricky like that. Do you mind first sharing the query that gave you the right result in the Chronograf Data Explorer?

SWalter · August 7, 2019, 8:55am

Hi @Anaisdg

The following query produces the expected result. The graph shows only those servers for which the mean of "value" over the tags 'psu1_pout' and 'psu2_pout' is zero.

SELECT "mean_value" FROM (SELECT mean("value") AS "mean_value" FROM "db_cmc"."autogen"."ipmi_sensor" WHERE time > :dashboardTime: AND ("name"='psu1_pout' OR "name"='psu2_pout') GROUP BY time(:interval:), "server" FILL(null)) WHERE "mean_value"=0 GROUP BY "server"

PS: The forum is unusable with IE11 (version 11.0.960019377): it loads 500 MB for every single character of input and creates such a high load that the system hangs for several seconds. Chrome is fine.

Anaisdg · August 7, 2019, 3:44pm

@SWalter why have you chosen to use Kapacitor for this? Have you tried using a Continuous Query? It might be easier.

SWalter · August 7, 2019, 7:19pm

We have to create alerts. So from my point of view, this should be done through Kapacitor. Or not?

SWalter · August 12, 2019, 9:06am

So no further input?

We have to use Kapacitor, since we also need to send a mail when we detect these events. As far as I know, this is what Kapacitor is built for.

I have looked into Continuous Queries, and they could be helpful elsewhere, but not for this specific problem.

Anaisdg · August 13, 2019, 7:57pm

@SWalter,
Sometimes it makes sense to use CQ instead of using Kapacitor if you’re only performing a few aggregations. You can then use Kapacitor on top to alert on the CQ. You’re right though, the alerting should be done through Kapacitor.

As for your script, I would suggest using a batch task instead, where the query is a subquery. Then do the additional where "mean_value"=0 Group By "server" in the rest of the tickscript.
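
A minimal sketch of what such a batch task could look like is given below. It is untested and rests on a few assumptions: whether the batch query node accepts a subquery is not confirmed in this thread, so the sketch runs the inner aggregate directly and leaves the "mean_value" = 0 check to the alert node. The 5-minute period, the 1-minute schedule, the alert ID and the message are illustrative values, not taken from your setup.

```
// Sketch only, untested. Database, measurement and tag names come from the thread;
// period, schedule, ID and message are assumptions.
batch
    |query('''
        SELECT mean("value") AS "mean_value"
        FROM "db_cmc"."autogen"."ipmi_sensor"
        WHERE ("name" = 'psu1_pout' OR "name" = 'psu2_pout')
    ''')
        // Every minute, query the last 5 minutes of data.
        .period(5m)
        .every(1m)
        // One result per server; both PSU readings go into the same mean.
        .groupBy('server')
    |alert()
        .id('CMC PSU off-{{.Group}}')
        .message('{{.ID}} is {{.Level}}')
        // Counterpart of: where "mean_value"=0 Group By "server"
        .crit(lambda: "mean_value" == 0.0)
        .stateChangesOnly()
```

Because the query groups by server only, the readings of psu1_pout and psu2_pout end up in the same mean. Assuming the power readings are never negative, a mean of 0 over the period means that every reading in it was 0.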

Anaisdg · August 13, 2019, 8:06pm

@SWalter

```
stream
    |from()
        .database('db_cmc')
        .retentionPolicy('autogen')
        .measurement('ipmi_sensor')
        .where(lambda: "name" == 'psu1_pout' OR "name" == 'psu2_pout')
        .groupBy('server')
    |window()
        .period(1m)
        .every(1m)
    |mean('value')
        .as('mean_value')
```

SWalter · August 14, 2019, 8:35am

OK, that seems quite simple. Is the trick the use of the window function, or that you just group by server?

Right now I would say that the window function simply removes the need for the stateDuration function, provided we increase the window to 5 minutes: a mean of 0 over that window then means the same as a stateDuration of 5 minutes.

The drawback would be that, as far as I know, it would no longer be possible to define the critReset with a duration as well.
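
On the question above, both parts matter. Grouping by server alone puts the psu1_pout and psu2_pout series into one group, and the window plus mean node produces the average that a lambda expression cannot compute by itself. As far as I can tell, that is also why the original stateDuration node reported one error per point: lambda expressions are evaluated against a single point, and mean() is not among the functions available inside them. The following sketch is untested; the variable name means, the OkDuration field, the message text and the 5-minute thresholds are assumptions for illustration. It keeps one-minute windows, applies stateDuration to the windowed mean, and uses a second stateDuration node so that critReset can also depend on a duration.

```
// Sketch only, untested. Names from the thread: db_cmc, autogen, ipmi_sensor, server, name.
// Assumed for illustration: means, OkDuration, the message text, the 5-minute thresholds.
var means = stream
    |from()
        .database('db_cmc')
        .retentionPolicy('autogen')
        .measurement('ipmi_sensor')
        .where(lambda: "name" == 'psu1_pout' OR "name" == 'psu2_pout')
        // Group by server only, so both PSU series land in the same group.
        .groupBy('server')
    |window()
        .period(1m)
        .every(1m)
    // One point per server and minute: the mean over both PSU readings.
    |mean('value')
        .as('mean_value')

means
    // Minutes for which the mean has been zero (-1 while the condition is false).
    |stateDuration(lambda: "mean_value" == 0.0)
        .unit(1m)
        .as('CritDuration')
    // Minutes for which the mean has been above zero.
    |stateDuration(lambda: "mean_value" > 0.0)
        .unit(1m)
        .as('OkDuration')
    |alert()
        .id('CMC PSU off-{{.Group}}')
        .message('{{.ID}} is {{.Level}}: mean PSU output is {{ index .Fields "mean_value" }}')
        // Critical once the mean has been zero for 5 minutes.
        .crit(lambda: "CritDuration" >= 5)
        // Leave critical only after 5 minutes of non-zero output.
        .critReset(lambda: "OkDuration" >= 5)
        .stateChangesOnly()
        .log('/etc/kapacitor/templates/alert_logs/cmc_psu_off.log')
```

With this layout the window only does the averaging, while the two duration fields decide when the alert fires and when it resets.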

Continuous Queries are interesting, but they would make our setup more complex: we have separate containers for all TICK components, and right now we never have to do anything inside the InfluxDB container. As far as I understand, this would no longer be the case if we used CQs.

Thank you for your help. I will try to test it today.
