Task check - one message with multiple hosts in alert

I am still quite new to Influx, so I am still getting my head around the concepts and how to think the right way about writing scripts. I want to write a deadman check for my endpoints (servers). Looking through the forums and tutorials, I came up with this script, which catches one or more servers that have not responded in the past hour window:

import "influxdata/influxdb/monitor"
import "experimental"
import "slack"
option v = {bucket: "telegraf"}

option task = {name: "Manual Deadman Check", every: 2m}

mydata =
    from(bucket: "telegraf")
        |> range(start: -60m)
        |> filter(fn: (r) => r._measurement == "system" and r._field == "uptime")
        // ignore hosts that come and go
        |> filter(fn: (r) => r["host"] !~ /^ps.+/)
        |> monitor.deadman(t: experimental.subDuration(from: now(), d: 2m))
        |> filter(fn: (r) => r["dead"] == true)
        |> yield(name: "myvar")
        |> findRecord(fn: (key) => true, idx: 0)

if exists mydata.host then
    slack.message(
        url: "https://mattermost..blah.blah.blah",
        token: "",
        channel: "",
        text: "Danger: Deadman Check last checkin from ${mydata.host} was ${mydata._time}",
        color: "danger",
    )
else
    0

Small detail: is there a better way to express a do-nothing else condition than just a zero?

The main problem I can't quite figure out is how to list ALL the hosts that are marked dead in the webhook call. Right now, this script will just list the last one in the resulting table.
I know there are no loops per se. I am thinking I perhaps need a map() statement and then accumulate into a single string? But I don't see any way to append the various host names to a string. Or am I thinking about this the wrong way?

@mdtancsa The following uses reduce() to concatenate the dead hosts into a single string and send the string as part of the message to Slack:

import "influxdata/influxdb/monitor"
import "experimental"
import "slack"

option task = {name: "Manual Deadman Check", every: 2m}

deadHosts =
    from(bucket: "telegraf")
        |> range(start: -60m)
        |> filter(fn: (r) => r._measurement == "system" and r._field == "uptime")
        // ignore hosts that come and go
        |> filter(fn: (r) => r["host"] !~ /^ps.+/)
        |> monitor.deadman(t: experimental.subDuration(from: now(), d: 2m))
        |> filter(fn: (r) => r["dead"] == true)
        |> last()
        |> group()
        |> unique(column: "host")

sendAlerts = (tables=<-) => {
    _deadHostCount = (tables |> count() |> findRecord(fn: (key) => true, idx: 0))._value
    _deadHostList =
        tables
            |> reduce(
                identity: {hosts: ""},
                fn: (r, accumulator) =>
                    ({
                        hosts:
                            accumulator.hosts + "*Host*: ${r.host}, *Last Reported*: ${r._time}\n",
                    }),
            )

    _output =
        if _deadHostCount > 0 then
            _deadHostList
                |> map(
                    fn: (r) =>
                        ({r with response:
                                slack.message(
                                    url: "https://mattermost..blah.blah.blah",
                                    token: "",
                                    channel: "",
                                    text: "*Danger:* Deadman Check - the following host(s) not reporting:\n\n${r.hosts}",
                                    color: "danger",
                                ),
                        }),
                )
        else
            _deadHostList

    return _output
}

deadHosts |> sendAlerts()
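(A side note, not part of the original reply: while experimenting, you can preview which hosts the check flags before enabling the Slack call by yielding the stream and inspecting it in the Data Explorer. The yield name here is arbitrary.)

```flux
// Sketch for experimenting: yield the dead-host stream so it can be
// inspected before wiring up alert delivery.
deadHosts
    |> yield(name: "dead_hosts_preview")
```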

Thank you so much @scott! I will review and experiment with it. Like I said, I am still trying to get my head around the “right” way to process my data versus thinking in SQL terms.
One quick question: why the underscores in the variable names? I take it it is purely a convention marking them as local in scope?

Yeah, it’s just a convention. It doesn’t change anything functionally. Just a convention I like to use for scoped variables.


@scott one problem I ran into with the above script: if there are no dead hosts, I get an error

 error calling function "sendAlerts" @54:14-54:26: unsupported binary expression invalid > int

But as long as there is at least one, it works!

Try adding onEmpty: "keep" to each of the filter() calls. count() should then return 0 on empty tables:

import "influxdata/influxdb/monitor"
import "experimental"
import "slack"

option task = {name: "Manual Deadman Check", every: 2m}

deadHosts =
    from(bucket: "telegraf")
        |> range(start: -60m)
        |> filter(fn: (r) => r._measurement == "system" and r._field == "uptime", onEmpty: "keep")
        // ignore hosts that come and go
        |> filter(fn: (r) => r["host"] !~ /^ps.+/, onEmpty: "keep")
        |> monitor.deadman(t: experimental.subDuration(from: now(), d: 2m))
        |> filter(fn: (r) => r["dead"] == true, onEmpty: "keep")
        |> group()
        |> unique(column: "host")

sendAlerts = (tables=<-) => {
    _deadHostCount = (tables |> count() |> findRecord(fn: (key) => true, idx: 0))._value
    _deadHostList =
        tables
            |> reduce(
                identity: {hosts: ""},
                fn: (r, accumulator) =>
                    ({
                        hosts:
                            accumulator.hosts + "*Host*: ${r.host}, *Last Reported*: ${r._time}\n",
                    }),
            )

    _output =
        if _deadHostCount > 0 then
            _deadHostList
                |> map(
                    fn: (r) =>
                        ({r with response:
                                slack.message(
                                    url: "https://mattermost..blah.blah.blah",
                                    token: "",
                                    channel: "",
                                    text: "*Danger:* Deadman Check - the following host(s) not reporting:\n\n${r.hosts}",
                                    color: "danger",
                                ),
                        }),
                )
        else
            _deadHostList

    return _output
}

deadHosts |> sendAlerts()
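(For context on why this fix works, a minimal illustration not from the thread: without onEmpty: "keep", filter() drops tables that end up with no rows, so the downstream count() emits no record at all, findRecord() returns a null record, and comparing that null _value with > produces the "invalid > int" error. With onEmpty: "keep", the empty table is preserved and count() returns 0.)

```flux
// Illustrative only: a filter that matches nothing.
// Without onEmpty: "keep", the empty table is dropped and count()
// emits no rows; with it, the table survives and count() returns 0.
from(bucket: "telegraf")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "nonexistent", onEmpty: "keep")
    |> count()
```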

Thanks very much @scott, that indeed works! I really appreciate the code. I am at the stage where I am still struggling to “think the Influx way”, but it's starting to make a little more sense to me.