Help getting the TOP of previously aggregated values

kapacitor
#1

Hi,

I have the below code:

var total_hits =
stream
	|from()
		.database('stats_prv')
		.retentionPolicy('xtgpolicy')
		.measurement('hotel_avail')
		.groupBy('hus', 'hpr')
	|window()
		.period(5m)
		.every(1m)
	|sum('hit')
		.as('hits')

It returns the sum of “hit” grouped by “hus” and “hpr”. Now I want to get the TOP(10) pairs of “hus” and “hpr”, ranked by “hits”.

The problem is that if I do a TOP(10, hits) after the SUM(), it returns the TOP(10) within each “hus” and “hpr” group, which isn’t what I want. I want only 10 rows: the 10 pairs of “hus” and “hpr” with the most “hits”.

How can I get what I want?

Thanks and regards,
Marcos.

#2

Try this:

var total_hits =
stream
	|from()
		.database('stats_prv')
		.retentionPolicy('xtgpolicy')
		.measurement('hotel_avail')
		.groupBy('hus', 'hpr')
	|window()
		.period(5m)
		.every(1m)
	|sum('hit')
		.as('hits')
	// Change the groupBy to group by nothing, so top() ranks across all groups.
	|groupBy()
	|top(10, 'hits')

#3

Hi @nathaniel,

Sorry, but I don’t think I quite understand how Kapacitor works.

I’ve tried your recommendation and the results are:

ID: TOP12_timeout
Error: 
Template: 
Type: stream
Status: enabled
Executing: true
Created: 24 Apr 17 13:51 UTC
Modified: 24 Apr 17 13:51 UTC
LastEnabled: 24 Apr 17 13:51 UTC
Databases Retention Policies: ["stats_prv"."xtgpolicy"]
TICKscript:
var timeout_errors = stream
    |from()
        .database('stats_prv')
        .retentionPolicy('xtgpolicy')
        .measurement('hotel_avail')
        .where(lambda: "et" == 'Communication_error')
        .groupBy('hus', 'hpr')
    |window()
        .period(5m)
        .every(1m)
    |sum('hit')
        .as('hits')
    |groupBy()
    |top(1, 'hits')
    |alert()
        .slack()
        .channel('#ntf-tgx')
        .iconEmoji(':scream:')
        .username('Kapacitor Alert - TIMEOUT')
        .message('TOP12 {{ .Level }} High timeout rate for {{index .Tags "hus"}} - {{index .Tags "hpr"}}. Timeout rate: {{ index .Fields "rate"}}%. Total hits: {{ index .Fields "hits"}}. Timeout hits {{ index .Fields "fail"}}')
        .crit(lambda: "hits" >= 1)
        .noRecoveries()
        .stateChangesOnly()

DOT:
digraph TOP12_timeout {
graph [throughput="0.00 points/s"];

stream0 [avg_exec_time_ns="0s" ];
stream0 -> from1 [processed="844876"];

from1 [avg_exec_time_ns="13.88µs" ];
from1 -> window2 [processed="30296"];

window2 [avg_exec_time_ns="2.421µs" ];
window2 -> sum3 [processed="660"];

sum3 [avg_exec_time_ns="778.138µs" ];
sum3 -> groupby4 [processed="660"];

groupby4 [avg_exec_time_ns="424ns" ];
groupby4 -> top5 [processed="660"];

top5 [avg_exec_time_ns="546.435999ms" ];
top5 -> alert6 [processed="659"];

alert6 [alerts_triggered="0" avg_exec_time_ns="35.414µs" crits_triggered="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" ];
}

I have two questions:

  1. Taking into account that the field “hit” is always > 0, why is no alert triggered?

  2. According to the documentation:

Only tags that are dimensions in the grouping will be preserved; all other tags are dropped.

If I do a groupBy(), will I still obtain the pairs of “hus” and “hpr” with the most “hits”?

Thanks and best regards,
Marcos.

#4

Can you add a |log() node just before the alert node and share the logs, so we can inspect what is happening?
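
As a rough sketch of where that node might go, here is the task from post #3 with a |log() node inserted just before the alert. The prefix string is an arbitrary label chosen for this example, and the Slack handler settings are left out for brevity.

```
var timeout_errors = stream
    |from()
        .database('stats_prv')
        .retentionPolicy('xtgpolicy')
        .measurement('hotel_avail')
        .where(lambda: "et" == 'Communication_error')
        .groupBy('hus', 'hpr')
    |window()
        .period(5m)
        .every(1m)
    |sum('hit')
        .as('hits')
    |groupBy()
    |top(1, 'hits')
    // Log the data arriving at this point so its fields and tags can be inspected.
    |log()
        // Optional label to make these lines easy to find in the Kapacitor log.
        .prefix('before alert')
    |alert()
        .crit(lambda: "hits" >= 1)
```

The logged output should show which field and tag names actually reach the alert node.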

#5

@nathaniel this is the error:

[TOP12_timeout:alert10] 2017/04/26 07:44:26 E! error evaluating expression for level CRITICAL: no field or tag exists for hits

#6

Ah yes, I should have spotted that earlier. The |top node renames the field to top. You can do this to preserve the name hits:

    |top(1, 'hits')
       .as('hits')
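
For context, a sketch of how the task from post #3 might look with that rename applied. Everything here is taken from the scripts earlier in this thread; the Slack handler settings are omitted for brevity.

```
var timeout_errors = stream
    |from()
        .database('stats_prv')
        .retentionPolicy('xtgpolicy')
        .measurement('hotel_avail')
        .where(lambda: "et" == 'Communication_error')
        .groupBy('hus', 'hpr')
    |window()
        .period(5m)
        .every(1m)
    |sum('hit')
        .as('hits')
    |groupBy()
    // Without .as(), the result field would be named 'top' rather than 'hits'.
    |top(1, 'hits')
        .as('hits')
    |alert()
        // This condition can now find the "hits" field.
        .crit(lambda: "hits" >= 1)
```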

#7

Hi @nathaniel,

After renaming ‘top’ to ‘hits’, alerts are triggered, but now the problem is that doing a “groupBy()” drops all the tags, so I can’t tell which 10 pairs of “hus” and “hpr” have the most “hits”.

This is my message code:

|alert()
        .slack()
        .channel('#ntf-tgx')
        .iconEmoji(':scream:')
        .username('Kapacitor Alert - TIMEOUT')
        .message('TOP12 {{ .Level }} High timeout rate for {{index .Tags "hus"}} - {{index .Tags "hpr"}}.')
        .crit(lambda: "hits" >= 1)
        .noRecoveries()
        .stateChangesOnly()

But I receive the alert below (with neither hus nor hpr):

Thanks!
Marcos.