Kapacitor and Processing Alerting on Sparse Data

mselby · December 18, 2018, 11:13pm

As always, any and all help is greatly appreciated.

I have a somewhat strange situation where I need to do some alerting on
measurements that send data to Influx on an irregular basis. I am
ending up with a situation where the absence of data points is something
that should be considered OK. I am having trouble figuring out how to
create the correct TICKscript to do what I need.

I know that sparse data is not what Kapacitor is best at but am still
trying to use it to get this job done.

There was a Kapacitor issue raised
("[feature request] Improve handling of data gaps", Issue #1039,
influxdata/kapacitor on GitHub) similar to this, with a few
pseudo-solutions offered, and I am trying to implement one of those
solutions.

We have a set of jobs that run on a periodic basis and take a variable
amount of time to run when they do. When the jobs hit a failure
condition they emit a metric to Influx. We end up having a time series
like this.

app.foobarbaz.failures,name=app1 value=1 01-01-2018@01:30
app.foobarbaz.failures,name=app1 value=3 01-01-2018@02:00
app.foobarbaz.failures,name=app1 value=1 01-01-2018@02:30
app.foobarbaz.failures,name=app1 value=2 01-01-2018@03:30
app.foobarbaz.failures,name=app1 value=4 01-01-2018@05:30

app.foobarbaz.failures,name=app1 value=4 01-01-2018@10:30
app.foobarbaz.failures,name=app1 value=4 01-01-2018@11:30
app.foobarbaz.failures,name=app1 value=4 01-01-2018@12:00
app.foobarbaz.failures,name=app1 value=4 01-01-2018@12:30
app.foobarbaz.failures,name=app1 value=4 01-01-2018@13:00

app.foobarbaz.failures,name=app1 value=1 01-02-2018@01:30
app.foobarbaz.failures,name=app1 value=3 01-02-2018@02:00
app.foobarbaz.failures,name=app1 value=1 01-02-2018@02:30
app.foobarbaz.failures,name=app1 value=2 01-02-2018@03:30
app.foobarbaz.failures,name=app1 value=4 01-02-2018@05:30

Every hour we look back at the last 6 hours of “value” for this
measurement, and if the sum over that time is > x we alert.

The problem is that we can go more than 6 hours without any points being
sent into the system. So if we end up in a failure state and no failures
happen for more than 6 hours, we stay in a failure state because
Kapacitor has no points to process. Default and Fill nodes are not
useful here because they only work when there are data points to deal
with.
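To make the stuck state concrete, here is a minimal Python sketch of the behaviour described above. It is a simplified model and not how Kapacitor works internally: it assumes one evaluation per hour over the previous 6 hours, and that a window with no points produces no evaluation at all. The function names and the hours-as-numbers encoding of the sample series are made up for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (hours since midnight of day 1, value)

# First day of the sample series above; the next point arrives at hour 25.5.
POINTS: List[Point] = [
    (1.5, 1.0), (2.0, 3.0), (2.5, 1.0), (3.5, 2.0), (5.5, 4.0),
    (10.5, 4.0), (11.5, 4.0), (12.0, 4.0), (12.5, 4.0), (13.0, 4.0),
]


def window_sums(points: List[Point], period: float, every: float,
                end: float) -> List[Tuple[float, Optional[float]]]:
    """Sum values in (tick - period, tick] at each tick; None if the window is empty."""
    results = []
    tick = every
    while tick <= end:
        values = [v for (t, v) in points if tick - period < t <= tick]
        results.append((tick, sum(values) if values else None))
        tick += every
    return results


def alert_states(sums: List[Tuple[float, Optional[float]]],
                 crit: float) -> List[Tuple[float, str]]:
    """Track the alert level; an empty window is never evaluated, so the level sticks."""
    state = "OK"
    out = []
    for tick, total in sums:
        if total is not None:
            state = "CRITICAL" if total > crit else "OK"
        out.append((tick, state))
    return out


if __name__ == "__main__":
    for tick, state in alert_states(window_sums(POINTS, 6.0, 1.0, 25.0), 5.0):
        print(f"hour {tick:4.1f}: {state}")
```

In this model the last window that contains any points is the one ending at hour 18 (sum 8, so CRITICAL). From hour 19 onward every window is empty, nothing is evaluated, and the level stays CRITICAL until the next failure arrives the following day. Appending a single zero-valued point at hour 19 is enough for that window to evaluate to 0 and return to OK.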

In our case it would also be OK to insert a metric with value=0 to
catalyze the system. I could use a secondary TICKscript and set up a
deadman node feeding an InfluxDBOut node, but I really do not want to
have to deal with 2 tasks for every job.
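For reference, inserting a value=0 point from outside Kapacitor could look roughly like the Python sketch below. It assumes an InfluxDB 1.x HTTP `/write` endpoint and something like cron to run it on a schedule; the helper names and the base URL are placeholders for illustration, and tag values are assumed to contain no spaces or commas (those would need line-protocol escaping).

```python
import urllib.parse
import urllib.request


def heartbeat_line(measurement: str, name: str, timestamp_s: int) -> str:
    """Build one line-protocol point with value=0 for the given app name."""
    return f"{measurement},name={name} value=0 {timestamp_s}"


def write_point(base_url: str, db: str, rp: str, line: str) -> int:
    """POST one line-protocol point to an InfluxDB 1.x /write endpoint."""
    # precision=s because heartbeat_line() uses second-resolution timestamps.
    query = urllib.parse.urlencode({"db": db, "rp": rp, "precision": "s"})
    request = urllib.request.Request(
        f"{base_url}/write?{query}",
        data=line.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # InfluxDB 1.x answers 204 on a successful write
```

A scheduled job could then call `write_point('http://localhost:8086', 'mydb', 'autogen', heartbeat_line('app.foobarbaz.failures', 'app1', int(time.time())))` (with `import time`) once per hour. This still means one extra moving part per job, so it only trades the second Kapacitor task for an external one.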

Anyone have suggestions on how I might handle this?

This is the TICKscript we have. It works fine as long as points keep coming into the system.

var db          = 'mydb'
var rp          = 'autogen'
var measurement = 'app.foobarbaz.failures'
var groupBy     = []
var whereFilter = lambda: ("name" == 'app1')
var period      = 6h
var every       = 1h
var crit        = 5

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(period)
        .every(every)
        .align()
    |sum('value')
        .as('value')

var trigger = data
    |alert()
        .crit(lambda: "value" > crit)
        .stateChangesOnly()
        .log('/tmp/kapacitor/fbb.log')
