Kapacitor - if value exceeds threshold for specified duration

absolutejam · October 10, 2017, 1:37am

Hi guys,

As I’m diving into Kapacitor more and more, I’m trying to refine my checks and alerts. One thing I’d like to check is whether a value meets a criterion (e.g. exceeds a value, falls within a range) for a duration. This will help me exclude flapping and get a better idea of whether a metric has spiked for an extended period.

Currently I’m doing this by using timeShift a few times and comparing, but it’s messy and I’m sure there must be a better way. Is there a nice, easy way to do this?

One example would be CPU usage > 90% for 1 minute or, even better, using one of the queries Chronograf generated for me: CPU usage is 50% higher than the previous 10-minute period, for at least 5 minutes.

Cheers!

thibodux · October 10, 2017, 11:06am

To get to an exact answer, you need to share a bit more information, but the basics probably involve using a batch job like the example in the QueryNode documentation (InfluxData Documentation Archive).

You would set the period to 10m and every to the frequency at which you want the job to run.

If you are sending the cpu_usage stats every minute, then you could simply count the number of instances over 50% using the WHERE clause. If you are sending the stats more or less frequently, then you would do the counting of values in a later function. Afterward, I would assume you would use AlertNode to see if the counted value exceeds the threshold and then output some message.
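As a rough sketch of the batch approach described above, assuming a Telegraf-style layout: the database "telegraf", retention policy "autogen", measurement "cpu", field "usage_user", tag "host", the threshold of 5 points, and the log path are all assumptions for illustration, not details from this thread.

```
// Assumed names: telegraf.autogen.cpu, field usage_user, tag host.
batch
    |query('''
        SELECT count("usage_user")
        FROM "telegraf"."autogen"."cpu"
        WHERE "usage_user" > 50
    ''')
        // Look at the last 10 minutes of data...
        .period(10m)
        // ...and re-run the query every minute.
        .every(1m)
        // Group by host so the tag survives into the alert.
        .groupBy('host')
    |alert()
        // With one point per minute, 5 matching points is roughly
        // "above 50% for 5 of the last 10 minutes".
        .crit(lambda: "count" >= 5)
        .message('{{ index .Tags "host" }}: {{ index .Fields "count" }} points above 50% in the last 10m')
        .log('/tmp/cpu_count_alerts.log')
```

One thing worth testing with this shape: if no points match the WHERE clause, the query may return nothing for that interval, so check how the alert recovers to OK in your setup.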

Please share more details about your data and its frequency if you want some specific query help.

absolutejam · October 10, 2017, 2:07pm

Thanks for your reply.

I’ve spent a bit of time on this today, trying to figure out the best way to process this. In the end, I settled for a ‘critical rate’ per time window (e.g. alert if 30% of points for CPU usage are above 90% utilisation). Not sure how well this will work out, but I’ll give it a go.

So far I have a script on Hastebin which pretty much does the job. The only issue I’m having is that the final measurement does not retain any of the tags (so, when I alert to Slack, I can’t pass the host’s name). Any suggestions on how I can streamline this a bit?

EDIT: As per influxdata/kapacitor issue #1078 on GitHub (“replay-live lacks tags data”), it looks like the initial recording/query must contain a GROUP BY "tag" statement to retain tags.
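For anyone reading along, one way a ‘critical rate’ per window could be sketched in TICKscript is to turn each point into a 1/0 flag and take the mean of that flag over the window. This is not the script from the link; the measurement 'cpu', field 'usage_user', tag 'host', and the 15m/1m window are assumptions.

```
// Assumed names: measurement 'cpu', field 'usage_user', tag 'host'.
stream
    |from()
        .measurement('cpu')
        // Group by host so the tag is still available when alerting.
        .groupBy('host')
    // Flag each point: 1.0 if above 90% utilisation, otherwise 0.0.
    |eval(lambda: if("usage_user" > 90.0, 1.0, 0.0))
        .as('is_crit')
    |window()
        .period(15m)
        .every(1m)
    // The mean of the flags is the fraction of critical points in the window.
    |mean('is_crit')
        .as('crit_rate')
    |alert()
        // Critical if more than 30% of points in the window were above 90%.
        .crit(lambda: "crit_rate" > 0.3)
        .message('{{ index .Tags "host" }}: critical rate {{ index .Fields "crit_rate" }}')
```

Taking the mean of a flag avoids joining a separate "critical count" and "total count" stream, at the cost of only carrying the rate forward.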

Regards,
James.

absolutejam · October 11, 2017, 10:17am

As an aside, is it possible to utilise the following from an AlertNode?

Available Statistics:
alerts_triggered – Total number of alerts triggered
oks_triggered – Number of OK alerts triggered
infos_triggered – Number of Info alerts triggered
warns_triggered – Number of Warn alerts triggered
crits_triggered – Number of Crit alerts triggered

Because I could use that node to trigger my warns and alerts and use the value later, if I can get the values out somehow.

absolutejam · October 13, 2017, 7:25pm

I’ve seen there’s a StatsNode which I might play with later.

So far my ‘x% of critical alerts in a time window’ check works out nicely to see if a host is persistently providing critical metrics.

alexphillips · May 21, 2018, 6:49pm

@absolutejam Hey, I’m trying to do something very similar to what you are, but the hastebin link no longer works. Can you re-post your script? Or did you find a better way to do this?

absolutejam · August 26, 2018, 8:34pm

Hey @alexphillips, not sure if it’s the right one, but I have https://gitlab.com/absolutejam/tickscripts/blob/master/cpu_crit_rate_15m.tick saved which I think rings a bell.

philb · August 30, 2018, 4:06pm

Would the stateDuration node do what you want?

|stateDuration(lambda: "thing you want to count" > crit)
    .unit(1m)
    .as('critDuration')

Then in your alert node:

.crit(lambda: "critDuration" > 5)

There is a little bit more to it though; see the stateDuration docs.

That’s how I count it anyway.

Hope that helps.
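Put together, a complete task using stateDuration might look something like the sketch below. The measurement 'cpu', field 'usage_user', tag 'host', and the 90% / 5 minute thresholds are assumptions taken from the example in the original question, not a tested script.

```
// Assumed names: measurement 'cpu', field 'usage_user', tag 'host'.
stream
    |from()
        .measurement('cpu')
        // Track each host separately and keep the tag for the message.
        .groupBy('host')
    // Track how long the condition has held continuously;
    // the duration resets once the condition stops holding.
    |stateDuration(lambda: "usage_user" > 90.0)
        // Report the duration in minutes.
        .unit(1m)
        .as('crit_duration')
    |alert()
        // Critical once usage has stayed above 90% for at least 5 minutes.
        .crit(lambda: "crit_duration" >= 5)
        .message('{{ .Level }}: {{ index .Tags "host" }} CPU above 90% for {{ index .Fields "crit_duration" }}m')
```

Because the duration resets as soon as a point falls back under the threshold, short spikes never reach the 5 minute mark, which is what suppresses the flapping.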
