Stats(time) vs Unit(time) in Deadman alert

In the Deadman documentation this example deadman is given:

data
    |deadman(100.0, 10s)

The documentation says this is equivalent to:

data
    |stats(10s)
        .align()
    |derivative('emitted')
        .unit(10s)
        .nonNegative()
    |alert()
        .id('node \'stream0\' in task \'{{ .TaskName }}\'')
        .message('{{ .ID }} is {{ if eq .Level "OK" }}alive{{ else }}dead{{ end }}: {{ index .Fields "emitted" | printf "%0.3f" }} points/10s.')
        .crit(lambda: "emitted" <= 100.0)

My application is related to measurements that should be recorded every 90 minutes. If several of these measurements are missing in a row, there is a problem and we need to take action.

However, we want to avoid getting alerts if we miss a single measurement, because that is within the tolerance of the system.

So, my question is related to the period of the alert, and the sample rate. Does this sample occur at fixed intervals?

We decided that the appropriate time to wait before alerting is 5 hours from the last recorded data point. But if our last data point arrives right after the start of a fixed 5-hour interval, then, if I understand correctly, we would effectively have to wait almost two full intervals before we get our alert. This additional wait of nearly 5 hours would be a problem for us.
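The worst case described above can be worked through numerically. The following is a minimal Python sketch, not Kapacitor code. It assumes the check runs only once at the end of each aligned 5-hour window and alerts when a window contains no points (a threshold of 0). That threshold and the function name `fixed_window_alert_delay` are assumptions made for illustration.

```python
from datetime import timedelta

WINDOW = timedelta(hours=5)  # alert period from the question


def fixed_window_alert_delay(offset_into_window: timedelta,
                             window: timedelta = WINDOW) -> timedelta:
    """Time from the last data point to the first alert.

    Assumes (hypothetically) that points are counted once per aligned window
    and that an alert fires when a window closes with zero points.
    """
    # The window holding the last point still counts one point, so it passes.
    remaining_in_current = window - offset_into_window
    # The next window is the first empty one; the alert fires when it closes.
    return remaining_in_current + window


# Last point arrives 1 minute after a window opens.
print(fixed_window_alert_delay(timedelta(minutes=1)))  # 9:59:00
```

Under these assumptions the alert arrives almost 10 hours after the last point, which matches the "two intervals" concern.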

Ideally, what I would like is for the deadman to take its derivative check over the previous 5 hours, but to run this check more often.

One option I have been trying is to use different durations for the two nodes: stats(1m) for the sampling and unit(5h) on the derivative.

However, in my preliminary testing, this does not seem to work. I’m curious if anyone has experience resolving this sort of issue.
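One possible explanation, offered as an assumption rather than confirmed behavior: if the derivative is computed between consecutive stats samples and only scaled by unit, then stats(1m) with unit(5h) extrapolates each 1-minute change to a 5-hour rate instead of counting points over the previous 5 hours. A small Python sketch of that arithmetic (the function name `scaled_derivative` is hypothetical):

```python
def scaled_derivative(prev_count: int, cur_count: int,
                      elapsed_s: float, unit_s: float) -> float:
    """Change between two consecutive samples, rescaled to `unit_s` seconds.

    This models the assumed behavior only; it is not taken from Kapacitor.
    """
    return (cur_count - prev_count) * unit_s / elapsed_s


ONE_MINUTE = 60.0
FIVE_HOURS = 5 * 3600.0

# A minute in which no new point was emitted.
print(scaled_derivative(10, 10, ONE_MINUTE, FIVE_HOURS))  # 0.0
# A minute in which exactly one point was emitted.
print(scaled_derivative(10, 11, ONE_MINUTE, FIVE_HOURS))  # 300.0
```

If this model holds, the value is 0 for every minute without a new point, so a condition such as "emitted" <= 100.0 would be true almost all the time, regardless of what happened over the previous 5 hours.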

Did you find any answer for this?

It’s also bugging me.

Hello @greenenvy
Are you willing to use Flux tasks instead? You could run the processing and the check more frequently, write the result to a bucket, and then query that bucket for the actual alert.
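The idea of checking more often against a look-back window can be sketched in plain Python. This is not Flux and not the task from the blog post; the function names, the 10-minute check interval, and the timestamps are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta


def is_dead(last_seen: datetime, now: datetime,
            max_silence: timedelta = timedelta(hours=5)) -> bool:
    """True when no point has arrived for at least `max_silence`."""
    return now - last_seen >= max_silence


def first_alert_time(last_seen: datetime, first_check: datetime,
                     every: timedelta, max_silence: timedelta) -> datetime:
    """Step through periodic checks and return the first one that alerts."""
    check_time = first_check
    while not is_dead(last_seen, check_time, max_silence):
        check_time += every
    return check_time


last_point = datetime(2024, 1, 1, 0, 1)   # hypothetical last measurement
start = datetime(2024, 1, 1, 0, 0)        # hypothetical first scheduled check
alert_at = first_alert_time(last_point, start,
                            every=timedelta(minutes=10),
                            max_silence=timedelta(hours=5))
print(alert_at)               # 2024-01-01 05:10:00
print(alert_at - last_point)  # 5:09:00
```

With this scheme the delay after the last point is at most the look-back window plus one check interval, instead of nearly two full windows.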

Thank you for your suggestion. I hadn’t really explored Flux tasks as I should have, because I understood that Kapacitor is still needed for stream processing, so I kept doing the deadman alerting there. As for the original question, I did not find a way to do more precise checking in Kapacitor, but when you mentioned Flux tasks, I read your blog post and was able to do exactly what I wanted. Thank you very much.

@greenenvy I’m so happy you were successful. Thanks for letting me know!