We use Telegraf to monitor our servers and network devices; Telegraf sends data every 5 minutes. We now want to get an alert whenever one of our servers has not sent data for 15 minutes. Below is the TICKscript I am using to achieve that.

var data = batch
    |query(''' Select last(usage_user) from prod_hosts.autogen.cpu where cpu = 'cpu-total' ''')
        .period(15m)
        .every(5m)
        .groupBy(*)

data
    |deadman(0.0, 15m)
        .stateChangesOnly()
        .id('{{ index .Tags "node" }}')
        .message('Server {{ .ID }} is OFFLINE')
        .messageField('message')
        .email()
        .to('XXXXXX')
        .log('/tmp/chronograf/deadman.log')

When I use kapacitor watch to check this task, it runs fine, and when I run the InfluxQL query directly against the database, the result omits the server on which I stopped the Telegraf service.
However, no alerts are triggered. My initial thought is that it has something to do with the time values in deadman and period.
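
For reference, as far as I can tell from the Kapacitor documentation, deadman(0.0, 15m) is shorthand for roughly the chain below, so the alert depends on how many points the node emits in each 15-minute window. This is only my reading of the docs, not something I have verified against this task; `data` is the variable from the script above, and I left out the alert handlers.

```
// Rough equivalent of `data |deadman(0.0, 15m)` (assumption based on the docs)
data
    // Emit internal statistics about the `data` node every 15 minutes
    |stats(15m)
        .align()
    // Turn the running `emitted` counter into points per 15 minutes
    |derivative('emitted')
        .unit(15m)
        .nonNegative()
    // Go critical when no more than 0 points were emitted in the window
    |alert()
        .crit(lambda: "emitted" <= 0.0)
```

If that reading is correct, the 15m passed to deadman is a counting interval of its own and is separate from the .period(15m) of the batch query, which may be where my timing goes wrong.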