I’m attempting to create a Kapacitor alert that fires when a script I’m running, which gathers data from an external source, stops making successful queries. I’m new to TICKscript and believe I must be doing something wrong.
Here’s my script:
batch
    |query('''
        SELECT non_negative_derivative(mean("SuccessfulQueryCount"), 10m) as "SuccessRate"
        FROM "telegraf"."autogen"."data-importer"
        WHERE time > now() - 2h
        GROUP BY time(10m)
    ''')
        .period(2h)
        .every(15m)
        .groupBy(time(10m), 'env')
    |mean('SuccessRate')
    |alert()
        .crit(lambda: "SuccessRate" <= 10)
        .critReset(lambda: "SuccessRate" > 50)
        .stateChangesOnly(10m)
        .log('/tmp/test')
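To spell out the alert behaviour I’m aiming for with .crit and .critReset, here is a rough Python sketch. It is not Kapacitor code and may not match how the alert node actually evaluates things; the function names and sample values are made up for illustration, and only the two thresholds come from the script.

```python
# Sketch of the intended alert behaviour only; this is not Kapacitor code.
# Thresholds mirror .crit (<= 10) and .critReset (> 50) from the script.
CRIT_AT_OR_BELOW = 10
RESET_ABOVE = 50


def next_level(current, success_rate):
    """Return the alert level after seeing one SuccessRate value."""
    if current == "CRITICAL":
        # Stay critical until the rate recovers past the reset threshold.
        return "OK" if success_rate > RESET_ABOVE else "CRITICAL"
    return "CRITICAL" if success_rate <= CRIT_AT_OR_BELOW else "OK"


def levels(rates, start="OK"):
    """Walk a series of SuccessRate values and collect the level after each."""
    out = []
    level = start
    for rate in rates:
        level = next_level(level, rate)
        out.append(level)
    return out


if __name__ == "__main__":
    # Made-up values: healthy, failing, partial recovery, full recovery.
    print(levels([80, 5, 30, 60]))
```

The point of the gap between 10 and 50 is that a partial recovery (30 in the sample) should not clear the alert.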
I’ve recorded 148m of data which begins with many successful queries and then ends with 2 hours of unsuccessful queries (I did this by changing the hosts file to make the queries fail). I replayed this to my task and don’t get the expected alerts. In fact, I get 0’s across the board within the DOT:
DOT:
digraph test {
graph [throughput="0.00 batches/s"];
query1 [avg_exec_time_ns="0s" batches_queried="0" errors="0" points_queried="0" working_cardinality="0" ];
query1 -> alert2 [processed="0"];
alert2 [alerts_triggered="0" avg_exec_time_ns="0s" crits_triggered="0" errors="0" infos_triggered="0" oks_triggered="0" warns_triggered="0" working_cardinality="0" ];
}
At this point I’m trying to understand if the problem is with replaying the recording (my test process), or with the script itself. I have a series of questions:
- In the query, the WHERE and GROUP BY clauses seem redundant with .period and .groupBy. Should I drop them from the SELECT query and use only the properties?
- Am I chaining the ‘mean’ to the ‘query’ and ‘alert’ properly? I’m attempting to take the average of the resulting 12 values (120 minute query / 10 minute groups).
- Is there some method for testing TICKscript, specifically the execution of the DAG nodes? I haven’t found anything when scouring the docs, but this is a pain to troubleshoot!
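To make the second question concrete, this is roughly the calculation I’m trying to express, written as a Python sketch instead of TICKscript. The counter values and function names are invented for illustration, and it assumes the 10m buckets line up with the 10m derivative unit so that each rate is just the difference between neighbouring bucket means.

```python
# Sketch of the intended calculation only; sample values are invented.
def non_negative_rates(bucket_means):
    """Differences between consecutive 10m bucket means, dropping negatives.

    With 10m buckets and a 10m derivative unit, the per-unit rate is the
    difference between neighbouring buckets.
    """
    rates = []
    for prev, cur in zip(bucket_means, bucket_means[1:]):
        delta = cur - prev
        if delta >= 0:
            rates.append(delta)
    return rates


def mean(values):
    """Arithmetic mean, or None when there is nothing to average."""
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    # 12 hypothetical bucket means of a counter that stops increasing.
    counter = [100, 160, 220, 280, 340, 400, 400, 400, 400, 400, 400, 400]
    rates = non_negative_rates(counter)
    # Differencing 12 buckets yields at most 11 rates, not 12.
    print(len(rates), mean(rates))
```

The single averaged value is what I expect the alert thresholds to be compared against.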
If you see any other issue with my query or my methodology please let me know.