Kapacitor to alert when data stops for more than 36h and create a dashboard

I have been working with Kapacitor to alert when we stop receiving data. I tried using stream but had memory and stability issues, I think because the period was so long, so I switched to batch using the TICKscript below.

The alerting is working, but when the data is written back to InfluxDB for the dashboard, the timestamp is that of the last measurement. That timestamp doesn't change when the alert goes from warn to crit, so the dashboard is unable to show the correct status.

Does anyone have any idea if it's possible to do what I want?

// One second in nanoseconds
var nano_second = 1000000000

var nano_hour = 60 * 60 * nano_second

// 36 hours
var warn = 36 * nano_hour

// 48 hours
var crit = 48 * nano_hour

var period = 72h

var every = 1m

// Batch query: latest "count" per host/chain/type.
// Note: the query node needs .period() and .every() to run;
// the vars above are applied here.
var data = batch
    |query('''SELECT last("count") AS "value" FROM "trp"."autogen"."integration"
              WHERE ("type" = 'member' OR "type" = 'subscription') ''')
        .period(period)
        .every(every)
        .groupBy('host', 'rem_chain_id', 'type')

// Thresholds: age of the most recent point, in nanoseconds.
// The eval result must be named with .as() so the alert lambdas
// can reference "diff_value", and the alert properties belong on
// an |alert() node.
var alert = data
    |eval(lambda: unixNano(now()) - unixNano("time"))
        .as('diff_value')
    |alert()
        .id('{{ index .Tags "host"}}-integration-{{ index .Tags "type"}}-{{ index .Tags "rem_chain_id"}}')
        .message('{{ index .Tags "host"}} - INTEGRATION {{ .Level }} - {{ index .Tags "rem_chain_id"}} - {{ index .Tags "type" }}')
        .warn(lambda: "diff_value" > warn)
        .crit(lambda: "diff_value" > crit)


Hi,
running a query every minute over a 72-hour period does indeed seem resource intensive. The period should be much lower if you want to alert when data stops arriving.
Maybe you can use stream again, but with a period of seconds or minutes?

Please also check out Kapacitor's deadman switch for alerting on low throughput.
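For reference, a stream-based deadman alert might look roughly like this. This is only a sketch: the database, retention policy, measurement, and the 10m interval are assumptions carried over from the batch query above, and you would tune the interval to your expected data rate.

```tickscript
// Sketch: alert when fewer than 1 point arrives per 10m interval,
// grouped the same way as the batch query above (assumed names).
stream
    |from()
        .database('trp')
        .retentionPolicy('autogen')
        .measurement('integration')
        .groupBy('host', 'rem_chain_id', 'type')
    // deadman(threshold, interval): CRITICAL when throughput
    // drops to <= 0.0 points per 10m window
    |deadman(0.0, 10m)
        .id('{{ index .Tags "host"}}-integration-deadman')
        .message('{{ .ID }} is {{ .Level }}')
```

Because deadman fires on the absence of points in a short window, the alert's own timestamp tracks when the condition changes, which may also sidestep the stale-timestamp problem on the dashboard.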