Kapacitor atypical join (or some other "way") - network io alerts

kapacitor

#1

I am trying to create a kapacitor job that will alert when the total bytes sent + recv on any network interface that we are tracking with the telegraf system/net plugin reaches a certain percent of the overall bandwidth of an interface.

The telegraf system/net input gives us the data rate but can not tell us the speed of the interface which we need to calculate the percent of the current usage

We have an influxdb measurement in the telegraf database that contains the max link speed of each interface. This table contains a single point for each interface that we overwrite with a duplicate timestamp. we update a value for each point that marks the last update for internal freshness housekeeping. These values almost never change and we felt this was the most efficient way to track this in influx.

The interface table looks like this

> select * from "net_int_speed" limit 7
name: net_int_speed
time       dc  env  host        int  speed updated
----       --  ---  ----        ---  ----- -------
1000000000 ch1 prod hddn001.ch1 eth2 10000 1520880644
1000000000 ch1 prod util001.ch1 eth0 10000 1520881650
1000000000 ch1 prod util002.ch1 eth0 10000 1520878215
1000000000 ch1 prod util003.ch1 eth0 10000 1520878185
1000000000 sv3 dev  mydb801.sv3 eth2 10000 1520881500
1000000000 sv3 prod cicd001.sv3 eth2 10000 1520881605
1000000000 sv3 prod cicd002.sv3 eth2 10000 1520881485

So now I have the beginnings of my TICK which gives me a stream of the all interfaces and their mean data rate over a 2 hour period

// net_io_percent

// DEFINE: kapacitor define net_io_percent -type batch -tick net_io_percent.tick -dbrp telegraf.2years
// ENABLE: kapacitor enable net_io_percent

// Parameters 
var warn = 0.75
var crit = 0.85
var period = 2h
var every = 2h

// Dataframe
var data_rate = batch
  |query('''
         SELECT non_negative_derivative(max("bytes_sent"), 1s) + non_negative_derivative(max("bytes_recv"), 1s) AS data_rate
         FROM "telegraf"."2years"."net"
         WHERE time < now()
         GROUP BY time(15s)
         ''')
    .period(period)
    .every(every)
    .groupBy(time(15s),'host','interface')
  |mean('data_rate')
    .as('data_rate')
  |groupBy('host','interface')

What I struggling with is how to extract the link speed from the measurement so that I can use that in an alert stanza that would look something like this

// Thresholds
var alert = data
  |alert()
    .id('{{ index .Tags "host"}}/{{ index .Tags "interface"}}/net_io_percent')
    .message('{{ .ID }}:{{ index .Fields "data_rate" }}')
    .warn(lambda: "data_rate" / "max" > warn)
    .crit(lambda: "data_rate" / "max" > crit)

I need to get “max” to be the vaule from the net_int_speed measurement that matches the host interface combo in the data stream

I have a feeling I may need to write a UDF for this but am hoping there may be a native way to do this with a node.property combo.

All and any help is much appreciated


#2

A better way to do this would be to use the sideload feature of Kapacitor.