Linking custom Kapacitor tasks to Chronograf

The current TICK stack requires you to “template” your Kapacitor tasks through Chronograf if you want the alerts to show up on the Chronograf status page (or in the history). However, Kapacitor tasks generated by Chronograf have long, messy names that do not describe what they actually monitor (e.g. chronograf-v1-35931a9b-b5de-4430-b18c-1ae17528a049).

My current custom tasks have names like cpu_usage, ram_usage and so on (and, for readability's sake, they are actually nicely formatted).

What I want to do is link custom-written Kapacitor tasks to Chronograf. I first tried mimicking the pipeline of a Chronograf-generated Kapacitor task in one of my own custom tasks. Then I found out that there must be some special kind of relation between Chronograf’s generated tasks and Kapacitor. I guess Chronograf-generated tasks are also stored somewhere inside Chronograf’s BoltDB.

Is there a way to link custom-written Kapacitor tasks to Chronograf’s status page and history log?

Btw: If you template the task through Chronograf first and then overwrite it with a custom TICKscript (kapacitor define chronograf-v1-e52857d9-908d-44bf-8c3b-ba39374928d7 cpu_usage.tick), the alert rule in Chronograf gets renamed to the ID and becomes unusable through the web interface (becoming unusable is expected, the name change is not).

@luca-moser Can you share the snippets of the TICKscripts you tried to use to link them with Chronograf?

There is nothing special about the Kapacitor+Chronograf link. If your alerts write their state into the same InfluxDB database that Chronograf uses, it will just work™.

Hi @nathaniel, thanks for your help.

How does Chronograf know the name of the alert rule/task if the task’s name in Kapacitor is just the above mentioned ID? Chronograf must hold a reference to the actual name somewhere, which is not inside the task stored in Kapacitor? (maybe I’m missing something)

Here is the script which tries to write the alerts to the Chronograf series.

// database values
var db = 'telegraf'
var rp = 'autogen'
var groupBy = ['host']

// what to measure
var measurement = 'cpu'

// only monitor for those hosts
var hostsFilter = 'ch-cockpit-db,ch-cockpit-mail,ch-cockpit-rm,ch-cockpit-web,srzh06,srzh08,srzh28'

// message to send
var message = '{{ .Level }} - {{ index .Tags "host" }} - {{ .Time }}:
cpu usage over the last 30 seconds was {{ index .Fields "floored_stat" }}%.'

// alert output database (chronograf)
var taskName = 'cpu_usage'
var outDB = 'chronograf'
var outRP = 'autogen'
var outMeasurement = 'alerts'
var triggerType = 'threshold'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'

// thresholds
var infoLvl = 75
var warnLvl = 85
var critLvl = 95

// dataframe
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .where(lambda: strContains(hostsFilter, "host") AND "cpu" == 'cpu-total')
        .groupBy(groupBy)
    |eval(lambda: 100.0 - "usage_idle")
        .as('cpu_used')
    |window()
        .period(30s)
        .every(10s)
    |mean('cpu_used')
        .as('stat')
    |eval(lambda: floor("stat"))
        .as('floored_stat')
        .keep()

// threshold
var trigger = data
    |alert()
        .info(lambda: "stat" > infoLvl)
        .warn(lambda: "stat" > warnLvl)
        .crit(lambda: "stat" > critLvl)
        .stateChangesOnly()
        .message(message)
        .id('{{ index .Tags "host" }}/host/cpu_used')
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .telegram()

// write alert into chronograf database
trigger
    |influxDBOut()
        .create()
        .database(outDB)
        .retentionPolicy(outRP)
        .measurement(outMeasurement)
        .tag('alertName', taskName)
        .tag('triggerType', triggerType)

// make results available through http
trigger
    |httpOut('output')

@nathaniel
Ok, I fixed it. According to https://github.com/influxdata/chronograf/blob/master/ui/src/alerts/apis/index.js, the measured value must be present in the alert stored inside Chronograf’s InfluxDB database, under a field named “value”. I’ve renamed the “stat” field to “value” and now it works :slight_smile:
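
For reference, the rename described above would look roughly like the fragment below. It is a sketch based on the script posted earlier, not a tested task: it reuses that script's variables (db, rp, measurement, hostsFilter, groupBy and the threshold levels) and only changes the field name from 'stat' to 'value'. The remaining alert properties and the influxDBOut node are assumed to stay as they were.

```
// same pipeline as before, but the aggregated field is now called 'value'
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .where(lambda: strContains(hostsFilter, "host") AND "cpu" == 'cpu-total')
        .groupBy(groupBy)
    |eval(lambda: 100.0 - "usage_idle")
        .as('cpu_used')
    |window()
        .period(30s)
        .every(10s)
    |mean('cpu_used')
        .as('value')
    |eval(lambda: floor("value"))
        .as('floored_stat')
        .keep()

// thresholds now compare against 'value'
var trigger = data
    |alert()
        .info(lambda: "value" > infoLvl)
        .warn(lambda: "value" > warnLvl)
        .crit(lambda: "value" > critLvl)
        .stateChangesOnly()
```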


@nathaniel

Do you have an idea how I could set the alertName tag to a value built from the string 'container_down - ' plus "container_name", where "container_name" is a tag from the measurement?

Currently I’m getting the following error:

invalid TICKscript: Failed to handle 1 argument: name "container_name" is undefined.

I don’t completely understand why the tag "container_name" falls out of scope inside the third node, or how I can carry it over from the initial FromNode.

This is my current script:

// Luca Moser: 22.06.2017
// this script monitors whether the given containers are down.

// source database
var db = 'telegraf'
var rp = 'autogen'
var groupBy = ['container_name']

// what to measure
var measurement = 'docker_container_cpu'

// containers
var containers = 'dmsbeta_server_1,dmsbeta_sessions_1,dmsbeta_storage_1'

// telegram message
var message = '{{ .Level }} - {{ index .Tags "container_name" }} - {{ .Time }}:
couldnt measure any data for 1 minute, is the container down?'

// alert output database (chronograf)
var taskName = 'container_down - '
var idVar = taskName + ':{{ index .Tags "container_name" }}'
var outDB = 'chronograf'
var outRP = 'autogen'
var outMeasurement = 'alerts'
var triggerType = 'deadman'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'

// period
var period = 20s

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .where(lambda: strContains(containers, "container_name"))
        .groupBy(groupBy)

var trigger = data
    |deadman(0.0, period)
        .stateChangesOnly()
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .message(message)
        .telegram()

// write alert into chronograf database
trigger
    |eval(lambda: "emitted")
        .as('value')
        .keep('value', messageField, durationField)
    |influxDBOut()
        .create()
        .database(outDB)
        .retentionPolicy(outRP)
        .measurement(outMeasurement)
        .tag('alertName', 'container_down - ' + string("container_name"))
        .tag('triggerType', triggerType)

// make results available through http
trigger
    |httpOut('output')

The .tag property of the InfluxDBOut node is only used for adding static tags to the data.

To add tags that are a function of the data, use the eval node with the .tags property; see https://docs.influxdata.com/kapacitor/v1.3/nodes/eval_node/#tags.
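
Applied to the script above, that could look something like the following. This is a minimal, untested sketch: it reuses the variables and the trigger node from the posted script, and it assumes the "container_name" tag is still on the points at this stage because the stream is grouped by it. The second lambda builds the name as a string, and .tags turns that result into a tag instead of a field.

```
// write alert into chronograf database
// 'alertName' is computed per point and converted to a tag by .tags
trigger
    |eval(lambda: "emitted", lambda: 'container_down - ' + "container_name")
        .as('value', 'alertName')
        .tags('alertName')
        .keep('value', messageField, durationField)
    |influxDBOut()
        .create()
        .database(outDB)
        .retentionPolicy(outRP)
        .measurement(outMeasurement)
        .tag('triggerType', triggerType)
```

The static triggerType tag can stay on the InfluxDBOut node, since it does not depend on the data.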


Thanks, that worked wonderfully! :slight_smile: