Kapacitor Join is not working with Group clause

#1

I am pretty new to Kapacitor. I am trying to get the failure rate of each application. The same measurement holds both failure and success messages for every application, so I want to compute the percentage of failures per application. I am using the TICKscript below, but as the DOT output further down shows, the join node is not emitting any results.

Any help is much appreciated.

var fail = stream
    |from()
        .measurement('apm_event')
        .where(lambda: "component_category" == 'HTTP_FILTER' AND "status" == 'FAIL')
    |window()
        .period(5m)
        .every(10s)
    |groupBy('appl_id')
    |count('elapse_time')
        .as('elapse_time')

var total = stream
    |from()
        .measurement('apm_event')
        .where(lambda: "component_category" == 'HTTP_FILTER')
    |window()
        .period(5m)
        .every(10s)
    |groupBy('appl_id')
    |count('elapse_time')
        .as('elapse_time')

fail
    |join(total)
        .as('fails', 'totals')
        .on('appl_id')
    // |eval(lambda: 100.0 * float("fails.elapse_time") / float("totals.elapse_time"))
    //     .as('value')
    |alert()
        .crit(lambda: "value" > 0)
        .log('/tmp/alerts4.log')

DOT:
digraph service_failure_alerts {
graph [throughput="48.00 points/s"];

stream0 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
stream0 -> from5 [processed="4190"];
stream0 -> from1 [processed="4190"];

from5 [avg_exec_time_ns="2.86µs" errors="0" working_cardinality="0" ];
from5 -> window6 [processed="2884"];

window6 [avg_exec_time_ns="2.577µs" errors="0" working_cardinality="1" ];
window6 -> groupby7 [processed="4"];

groupby7 [avg_exec_time_ns="0s" errors="0" working_cardinality="46" ];
groupby7 -> count8 [processed="102"];

count8 [avg_exec_time_ns="255.908µs" errors="0" working_cardinality="0" ];
count8 -> join10 [processed="102"];

from1 [avg_exec_time_ns="4.991µs" errors="0" working_cardinality="0" ];
from1 -> window2 [processed="42"];

window2 [avg_exec_time_ns="2.4µs" errors="0" working_cardinality="1" ];
window2 -> groupby3 [processed="3"];

groupby3 [avg_exec_time_ns="0s" errors="0" working_cardinality="3" ];
groupby3 -> count4 [processed="3"];

count4 [avg_exec_time_ns="0s" errors="0" working_cardinality="0" ];
count4 -> join10 [processed="3"];

join10 [avg_exec_time_ns="11.406µs" errors="0" working_cardinality="0" ];
join10 -> alert11 [processed="0"];

alert11 [alerts_triggered="0" avg_exec_time_ns="0s" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="0" ];
}

TICKscript: Calculate difference between two streams but join is failing
#2

I think I might have a similar problem here:

The .groupBy(*) clause is causing the join() to fail and emit no items. If I remove the group (which I can do in this case) the join works as expected.

#3

Hi,
I’ve been facing exactly this situation: trying to calculate the error percentage from the same metric, similar to the script posted by @Gaurav_Gupta.
My first problem was division by zero. I had to rewrite my script to make sure that "totals.elapse_time" is never 0. When it is, Kapacitor behaves very strangely, logs nothing about it, and can even crash completely!
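One way to keep a zero total away from the division, sketched here rather than taken from my actual script, is to drop those points with a where node before the eval. This assumes `fail` and `total` are the two counted streams from the first post and that the joined fields keep the `fails.` and `totals.` prefixes:

```
// Sketch only: `fail` and `total` are assumed to be the counted
// streams defined in the first post of this thread.
fail
    |join(total)
        .as('fails', 'totals')
    // Drop joined points whose total is zero, so the eval node
    // below never sees a zero divisor.
    |where(lambda: "totals.elapse_time" > 0)
    |eval(lambda: 100.0 * float("fails.elapse_time") / float("totals.elapse_time"))
        .as('value')
```

Points filtered out this way simply produce no `value`, so an alert built on top of it stays silent for applications with no traffic in the window.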
Division by zero may not be your problem, though. I also added align and alignGroup on the batch query (I moved to batch after many problems with streams) and a tolerance on the join. The align options make the points line up with the start and end edges of each interval (I guess the stream equivalent would be rounding or truncating the timestamps). The tolerance, set to at most the every interval, lets points whose timestamps fall within that range match each other.
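For the stream script in the first post, the same two ideas might look roughly like the sketch below. It is untested against that data and rests on a few assumptions: that the join emits nothing because the two windows produce slightly different timestamps, that grouping in the from node (instead of after the window) is acceptable, and that 10s is a sensible tolerance because it equals the every interval. The `.on()` call is left out on the assumption that both sides are already grouped by the same tag.

```
// Sketch only: measurement, tags and fields are copied from the first post.
var fail = stream
    |from()
        .measurement('apm_event')
        .where(lambda: "component_category" == 'HTTP_FILTER' AND "status" == 'FAIL')
        // Group before windowing so each appl_id gets its own window.
        .groupBy('appl_id')
    |window()
        .period(5m)
        .every(10s)
        // Align window edges to the `every` interval so both sides
        // emit on the same time boundaries.
        .align()
    |count('elapse_time')
        .as('elapse_time')

var total = stream
    |from()
        .measurement('apm_event')
        .where(lambda: "component_category" == 'HTTP_FILTER')
        .groupBy('appl_id')
    |window()
        .period(5m)
        .every(10s)
        .align()
    |count('elapse_time')
        .as('elapse_time')

fail
    |join(total)
        .as('fails', 'totals')
        // Treat points at most 10s apart as having the same timestamp.
        .tolerance(10s)
```

If the join still emits nothing with this in place, the timestamps are probably not the cause, and the per-node `processed` counts in the DOT output are the next thing to compare.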

However, in my case, now using batch, I have timing problems. I’m seeing strange behaviour in the timestamp of the query versus the timestamp of the event, in some cases with quite a long delay between them. This is what I’m fighting with now…