Templating a stateDuration alert task

I filed this as a bug report (Issue 1757) when I noticed it the other day. I’m not sure it will be treated as a bug, but I think it should be, or at least that it should be recognized as a serious limitation of the stateDuration node.

The issue is that the stateDuration node only emits data when it receives data, even when the duration field it adds to each emitted metric should keep increasing. From reading some of the implementation, I understand that all the stateDuration node does is record the start time when the state becomes true. The current design has no obvious mechanism for emitting metrics without incoming data. Ideally, as I describe in my bug report, while the state is true the stateDuration node should re-emit the latest metric every time the added ‘duration’ field should increase, based on the start time and the unit() setting.
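To make the difference concrete, here is a rough Python model of the two behaviors. It is a hypothetical sketch, not Kapacitor’s code: it assumes regularly timestamped boolean points, a duration of (time − start) / unit while the state is true, and −1 while it is false (my reading of the documented convention). The names `state_duration`, `reemit` and `first_alert_time` are made up for illustration.

```python
# Hypothetical, simplified model of stateDuration as described above.
# This is NOT Kapacitor's implementation, only an illustration.

def state_duration(points, unit=1, reemit=False):
    """Return the (time, duration) pairs emitted for a stream of states.

    points: list of (time_seconds, state_bool), sorted by time.
    duration is (time - start) / unit while the state is true, -1 otherwise.
    reemit=False: current behavior, emit only when a point arrives.
    reemit=True:  proposed behavior, also re-emit the latest point every
                  `unit` seconds while the state is true, up to the next point
                  (nothing is emitted after the final point in this model).
    """
    out = []
    start = None  # time at which the current run of true states began
    for i, (t, state) in enumerate(points):
        if not state:
            start = None
            out.append((t, -1))
            continue
        if start is None:
            start = t
        out.append((t, (t - start) / unit))
        if reemit and i + 1 < len(points):
            # Synthetic ticks between this point and the next incoming one.
            tick = t + unit
            while tick < points[i + 1][0]:
                out.append((tick, (tick - start) / unit))
                tick += unit
    return out


def first_alert_time(emitted, threshold):
    """Time of the first emitted point whose duration reaches `threshold`."""
    return next((t for t, d in emitted if d >= threshold), None)


pts = [(0, True), (60, True), (120, False)]  # one point per minute
print(first_alert_time(state_duration(pts), 15))               # 60
print(first_alert_time(state_duration(pts, reemit=True), 15))  # 15
```

With one point per minute, the on-arrival model can only report durations of 0 and 60, so a 15-second threshold is first crossed at t = 60; the re-emitting model crosses it at t = 15.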

This is needed for alerts to behave as expected when data arrives much less often than the alert’s resolution. For example, if data may arrive as rarely as once a minute, but I want to alert when the state has held for 15 seconds, I can’t do that. I would have to raise the data rate well above the resolution of my stateDuration alert, which is redundant and wasteful for values like booleans.
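The arithmetic behind this example, under simple assumptions of my own: points arrive every Δ seconds, t₀ is the time of the first true point (where stateDuration starts counting), the state stays true, and the alert fires on the first emitted point whose duration reaches the threshold θ > 0. The symbols are mine, not Kapacitor’s.

```latex
t_{\text{alert}} = t_0 + \left\lceil \frac{\theta}{\Delta} \right\rceil \Delta,
\qquad
\Delta = 60\,\text{s},\ \theta = 15\,\text{s}
\;\Rightarrow\;
t_{\text{alert}} = t_0 + \lceil 0.25 \rceil \cdot 60\,\text{s} = t_0 + 60\,\text{s}
```

So the alert fires 45 s later than intended. If the node re-emitted every unit u = 1 s, Δ would effectively become u and the same formula gives t₀ + 15 s.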

I have found at least two other forum threads that raise this:

  • How to use stateDuration?
  • Calculating event duration from boolean values? https://community.influxdata.com/t/calculating-event-duration-from-boolean-values/2440

I’d like to hear arguments for or against adding this to Kapacitor’s design over on the bug report, but I’m posting here for help finding a workaround for my alert template.

Is the best option simply to write a batch task that runs much more often than the resolution the alert needs to detect? Or is there a better way to get the best of both worlds: the efficiency of a stream task with the behavior of a batch task?
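One way to picture the batch workaround, as a hypothetical model rather than a real task definition: a job that wakes up every `period` seconds, carries the last reported state forward, and alerts once that state has been true for `threshold` seconds. `first_alert_by_polling`, `period`, `threshold` and `horizon` are names invented for this sketch.

```python
import bisect

# Hypothetical model of a periodic (batch-style) check; not Kapacitor code.
# Assumes the last reported state holds until the next point arrives.

def first_alert_by_polling(points, threshold, period, horizon):
    """Return the first poll time (0, period, 2*period, ...) at which the
    state has been true for at least `threshold` seconds, or None if that
    does not happen by `horizon`.

    points: list of (time_seconds, state_bool), sorted by time.
    """
    times = [t for t, _ in points]
    now = 0
    while now <= horizon:
        # Index of the latest point at or before `now`.
        i = bisect.bisect_right(times, now) - 1
        if i >= 0 and points[i][1]:
            # Walk back to the first point of the current true run.
            j = i
            while j > 0 and points[j - 1][1]:
                j -= 1
            if now - points[j][0] >= threshold:
                return now
        now += period
    return None


pts = [(0, True), (60, True), (120, False)]  # one point per minute
print(first_alert_by_polling(pts, 15, 5, 300))   # 15
print(first_alert_by_polling(pts, 15, 10, 300))  # 20
```

Under these assumptions the alert lands less than one polling period after the threshold is reached, independent of how rarely points arrive; the price is running the check every period whether or not anything changed.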

The link that I couldn’t include: