stateDuration - Set duration before returning to "OK" state?

Hello,

I am using the stateDuration node to alert us when the CPU usage on a server has been above a certain level for a certain duration. For example, “WARNING” when greater than 60%, for 5 minutes. This is working fine. However, as soon as the CPU usage drops below 60% again, the alert immediately returns to the “OK” state.

I can see this is the expected behaviour as per the docs - “When a point evaluates as false, the state duration is reset.”

Is it possible to set a duration required before the alert returns to the “OK” state? I.e. “WARNING when CPU usage is greater than 60% for 5 minutes. Return to OK when CPU usage is below 60% for 5 minutes”?

Here is an example of the TICKscript we’re currently using (slightly slimmed down):

stream
    |from()
        .measurement('cpu')
    |where(lambda: ("cpu" == 'cpu-total') AND ("host" == 'ubuntu-xenial'))
    |groupBy('host')
    |stateDuration(lambda: "usage_idle" <= 40)
        .unit(1m)
        .as('warn_duration')
    |stateDuration(lambda: "usage_idle" <= 20)
        .unit(1m)
        .as('crit_duration')
    |alert()
        // Warn after 2 minutes
        .warn(lambda: "warn_duration" >= 2)
        // Crit after 5 minutes
        .crit(lambda: "crit_duration" >= 5)
        // Only alert when an alert is triggered or returns to normal
        .stateChangesOnly()

Thanks in advance for any help. Please let me know if you need any further information :slight_smile:

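I have now found the solution to this. It can be accomplished with the .warnReset() and .critReset() property methods on the alert node, which keep the alert at its current level until the given reset expression evaluates to true.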

Example TICKscript attached showing how this can be used: example-crit-reset-tickscript.txt (1.5 KB)
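
For anyone who cannot open the attachment, here is a rough sketch of how the reset expressions might be combined with the script from the question. It is not a copy of the attached file. The extra stateDuration nodes and the field names 'warn_ok_duration' and 'crit_ok_duration' are my own assumptions for illustration, as is the 5 minute recovery window. It relies on stateDuration emitting -1 while its expression is false, so the reset expressions cannot match until the recovery condition has held for the full window.

```
stream
    |from()
        .measurement('cpu')
    |where(lambda: ("cpu" == 'cpu-total') AND ("host" == 'ubuntu-xenial'))
    |groupBy('host')
    // How long usage has been above each threshold
    |stateDuration(lambda: "usage_idle" <= 40)
        .unit(1m)
        .as('warn_duration')
    |stateDuration(lambda: "usage_idle" <= 20)
        .unit(1m)
        .as('crit_duration')
    // Assumed addition: how long usage has been back below each threshold
    |stateDuration(lambda: "usage_idle" > 40)
        .unit(1m)
        .as('warn_ok_duration')
    |stateDuration(lambda: "usage_idle" > 20)
        .unit(1m)
        .as('crit_ok_duration')
    |alert()
        // Warn after 2 minutes above the warning threshold
        .warn(lambda: "warn_duration" >= 2)
        // Leave WARNING only after 5 minutes back below it (assumed window)
        .warnReset(lambda: "warn_ok_duration" >= 5)
        // Crit after 5 minutes above the critical threshold
        .crit(lambda: "crit_duration" >= 5)
        // Leave CRITICAL only after 5 minutes back below it (assumed window)
        .critReset(lambda: "crit_ok_duration" >= 5)
        // Only alert when an alert is triggered or returns to normal
        .stateChangesOnly()
```

When a reset expression finally matches, the alert level should be re-evaluated against the lower levels, so a CRITICAL alert may drop to WARNING before returning to OK. It is worth checking this against the AlertNode documentation for your Kapacitor version before relying on it.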
