The deadman documentation gives this example:
data
    |deadman(100.0, 10s)
The documentation states that this is equivalent to:
data
    |stats(10s)
        .align()
    |derivative('emitted')
        .unit(10s)
        .nonNegative()
    |alert()
        .id('node \'stream0\' in task \'{{ .TaskName }}\'')
        .message('{{ .ID }} is {{ if eq .Level "OK" }}alive{{ else }}dead{{ end }}: {{ index .Fields "emitted" | printf "%0.3f" }} points/10s.')
        .crit(lambda: "emitted" <= 100.0)
My application involves measurements that should be recorded every 90 minutes. If several consecutive measurements are missing, there is a problem and we need to take action. However, we want to avoid alerts when a single measurement is missed, because that is within the system's tolerance.
So, my question concerns the alert period and the sample rate: does this sampling occur at fixed, aligned intervals?
We decided that 5 hours after the last recorded data point is an appropriate time to wait before alerting. However, if the last data point arrives just after the beginning of a fixed 5-hour interval, then (if I understand correctly) we would effectively have to wait two intervals, nearly 10 hours, before the alert fires. That additional 5-hour wait would be a problem for us.
Ideally, I would like the deadman to take its derivative check over the previous 5 hours, but to perform that check more often.
One option I have been trying is to use different durations for stats(1m) and the derivative's .unit(5h).
However, in my preliminary testing this does not seem to work. I'm curious whether anyone has experience resolving this sort of issue.
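For reference, this is roughly the variant I have been testing, modeled on the expanded deadman example above. It is only a sketch: the zero-rate threshold in the .crit lambda and the simplified .id/.message strings are my placeholders, not taken from the documentation.

```
data
    // emit pipeline stats every minute instead of every 5 hours
    |stats(1m)
        .align()
    // rate of emitted points, scaled to a 5h unit
    |derivative('emitted')
        .unit(5h)
        .nonNegative()
    |alert()
        .id('deadman in task \'{{ .TaskName }}\'')
        .message('{{ .ID }} is {{ if eq .Level "OK" }}alive{{ else }}dead{{ end }}')
        // placeholder threshold: alert when no points were emitted
        .crit(lambda: "emitted" <= 0.0)
```

My hope was that this would evaluate a 5-hour rate every minute, giving a sliding check rather than a fixed 5-hour interval.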