TL;DR: When the data from the batch query comes in as 0 (no data), the TICKscript stops emitting.
I have a TICKscript that I want to run every 15 minutes and that should alert me if my app crashes more than 5 times per hour. The problem is that if there are 0 crash events, the script stops running.
TICKscript:
var data = batch
    |query('SELECT sum("value") AS "sum_value" FROM "primary"."autogen"."restarts" WHERE time > now() - 1h AND "pool"=\'nodeweb\' ORDER BY time DESC LIMIT 1')
        // Query 1 hour of restart data from now
        .period(1h)
        // run this query every 15min (every doesn't work?!?)
        // .every(15m)
        // run this cron every 15 min
        .cron('*/15 * * * *')
        // when no data is coming in fill in zero (so the warning levels can reset)
        .fill(0)
        // grouping by time will separate data into buckets. Order by time desc (see query) will ensure that newest data is what is alerted on.
        // LIMIT 1 ensures we have 1 bucket to alert against.
        .groupBy('pool', time(1h))
    // debug logging
    |log()
        .prefix('status-0')
        .level('ERROR')

var trigger = data
    |alert()
        // critical warning if sum_value goes too high
        .crit(lambda: "sum_value" > crit)
        // reset to lower level
        .critReset(lambda: "sum_value" < 10)
        // warn if restarts are more than 5 per hour
        .warn(lambda: "sum_value" > 5)
        // fire on state changes only ('normal' to 'critical' etc), but still fire every 30min
        .stateChangesOnly(30m)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .messageField(messageField)
        .durationField(durationField)
        // log alert to local log file...
        .log('/tmp/alerts.log')
        .slack()
        .channel('#engineers')
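The script above references several variables (crit, message, idVar, idTag, messageField, durationField) whose declarations are not shown. For anyone trying to reproduce this, a minimal sketch of what they might look like follows. Every value here is an assumption made for illustration, not the actual configuration of this task:

```
// Hypothetical declarations: all values below are assumed, not taken from the real task.
// Critical threshold for sum_value (assumed)
var crit = 10
// Alert message template (assumed)
var message = '{{ .ID }} is {{ .Level }}'
// Alert ID template, one ID per group (assumed)
var idVar = 'restarts:{{ .Group }}'
// Tag and field names attached to the alert data (assumed)
var idTag = 'alertID'
var messageField = 'message'
var durationField = 'duration'
```

These would need to sit above the `var data = batch` line so they are defined before the alert node uses them.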
The logs that show it stops when sum_value=0:
ts=2018-01-20T02:45:00.005Z lvl=error msg="begin batch" service=kapacitor task_master=main task=prod_restart_alert node=log2 prefix=status-0 name=restarts group=pool=pnodeweb tag_pool=nodeweb time=2018-01-20T02:00:00Z
ts=2018-01-20T02:45:00.005Z lvl=error msg="batch point" service=kapacitor task_master=main task=prod_restart_alert node=log2 prefix=status-0 name=restarts group=pool=nodeweb tag_pool=nodeweb field_sum_value=2 time=2018-01-20T02:00:00Z
ts=2018-01-20T02:45:00.005Z lvl=error msg="end batch" service=kapacitor task_master=main task=prod_restart_alert node=log2 prefix=status-0 name=restarts group=pool=nodeweb tag_pool=pnodeweb time=2018-01-20T02:00:00Z
ts=2018-01-20T03:00:00.019Z lvl=error msg="begin batch" service=kapacitor task_master=main task=prod_restart_alert node=log2 prefix=status-0 name=restarts group=pool=nodeweb tag_pool=nodeweb time=2018-01-20T02:00:00Z
ts=2018-01-20T03:00:00.020Z lvl=error msg="batch point" service=kapacitor task_master=main task=prod_restart_alert node=log2 prefix=status-0 name=restarts group=pool=nodeweb tag_pool=nodeweb field_sum_value=2 time=2018-01-20T02:00:00Z
ts=2018-01-20T03:00:00.020Z lvl=error msg="end batch" service=kapacitor task_master=main task=prod_restart_alert node=log2 prefix=status-0 name=restarts group=pool=nodeweb tag_pool=nodeweb time=2018-01-20T02:00:00Z
ts=2018-01-20T03:15:00.008Z lvl=error msg="begin batch" service=kapacitor task_master=main task=prod_restart_alert node=log2 prefix=status-0 name=restarts group=pool=nodeweb tag_pool=nodeweb time=2018-01-20T03:00:00Z
ts=2018-01-20T03:15:00.009Z lvl=error msg="batch point" service=kapacitor task_master=main task=prod_restart_alert node=log2 prefix=status-0 name=restarts group=pool=nodeweb tag_pool=nodeweb field_sum_value=0 time=2018-01-20T03:00:00Z
ts=2018-01-20T03:15:00.009Z lvl=error msg="end batch" service=kapacitor task_master=main task=prod_restart_alert node=log2 prefix=status-0 name=restarts group=pool=nodeweb tag_pool=nodeweb time=2018-01-20T03:00:00Z
After this, the task stops and no more task logs appear.
Any thoughts?