Hi, so I am currently setting up the whole TICK stack at our company and so far I’m loving it.
One problem I am facing right now is creating Kapacitor rules through Chronograf that span multiple hosts and report the host's name in the alert message.
For example, this is a query which I built using Chronograf:
SELECT "usage_idle" FROM "telegraf"."autogen"."cpu" WHERE time > now() - 15m AND ("host"='host1' OR "host"='host2' …)
This is the templated message:
{{.Time}} cpu usage is >80% over the last 1 minute on {{ index .Tags "host" }}
In the actual message, the {{ index .Tags "host" }} part is just blank. Would a query like that even raise a separate alert for every occurrence per host, or do I need to create a specific alert rule per host? And why is the host tag in the message empty?
Edit 1:
Ok, I fixed it. I had to add the “GROUP BY” clause. The query now is:
SELECT "usage_idle" FROM "telegraf"."autogen"."cpu" WHERE time > now() - 15m AND "cpu"='cpu-total' AND ("host"='host1' OR "host"='host2' …) GROUP BY "host"
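To make the effect of GROUP BY concrete, here is a minimal Python sketch. It is not Kapacitor code, only a toy model of the behaviour the fix suggests: without grouping, all points land in one series that carries no tags, so a lookup like {{ index .Tags "host" }} has nothing to return; with grouping by "host", each series carries its own host tag. The sample points and the function name group_series are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample points, shaped loosely like telegraf cpu data.
points = [
    {"host": "host1", "usage_idle": 12.0},
    {"host": "host2", "usage_idle": 95.0},
    {"host": "host1", "usage_idle": 15.0},
]


def group_series(points, group_by=None):
    """Split points into series keyed by their group-by tags.

    The key is a tuple of (tag, value) pairs; it is empty when no
    group-by tag is given, which models a series without tags.
    """
    series = defaultdict(list)
    for p in points:
        tags = ((group_by, p[group_by]),) if group_by else ()
        series[tags].append(p["usage_idle"])
    return dict(series)


# One series, no tags: nothing to fill the host placeholder with.
ungrouped = group_series(points)

# One series per host, each keyed by its own host tag.
grouped = group_series(points, group_by="host")
```

In this model, each grouped series would be evaluated on its own, which matches the idea of one rule producing a separate alert per host.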
Edit 2:
As it looks now, however, I am unable to say "when the usage is >80% over a minute, raise an alert". Does anybody have an idea how to do that via Chronograf?
My current query: SELECT "usage_idle" FROM "telegraf"."autogen"."cpu" WHERE time > now() - 15m AND "cpu"='cpu-total' AND ("host"='host1' OR "host"='host2' …) GROUP BY "host"
In the rule itself, all I can specify is the change relative to the current state, not a threshold over a period of time.
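For reference, the condition I am after can be sketched in a few lines of Python. This is only a model of the logic, not something Chronograf or Kapacitor generates: it averages usage_idle per host over a one-minute window and flags hosts whose mean idle value is below 20, which I am treating as equivalent to usage above 80%. The sample data, the function name hosts_over_threshold and its parameters are hypothetical. In InfluxQL terms this presumably corresponds to aggregating with something like mean("usage_idle") and GROUP BY time(1m), "host", though I have not confirmed how the rule builder exposes that.

```python
from collections import defaultdict
from statistics import mean


def hosts_over_threshold(points, window_start, window_seconds=60, idle_threshold=20.0):
    """Return the sorted hosts whose mean usage_idle inside the window
    [window_start, window_start + window_seconds) is below idle_threshold."""
    per_host = defaultdict(list)
    for ts, host, usage_idle in points:
        if window_start <= ts < window_start + window_seconds:
            per_host[host].append(usage_idle)
    return sorted(h for h, vals in per_host.items() if mean(vals) < idle_threshold)


# Hypothetical samples: (unix seconds, host, usage_idle in percent).
samples = [
    (0, "host1", 10.0), (30, "host1", 14.0),   # host1 is busy in the first minute
    (0, "host2", 90.0), (30, "host2", 85.0),   # host2 is mostly idle
    (70, "host1", 99.0),                       # outside the first window, ignored
]

alerts = hosts_over_threshold(samples, window_start=0)
```

Averaging over the window rather than checking single points means one short spike does not fire the alert by itself.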