Kapacitor/morgoth alert on derivative being 0

#1

I'm pulling data into InfluxDB via sFlow from an interface that is down, so it writes a steady stream of 0s into the database. This is as designed. In Kapacitor I take that data and calculate a derivative to get bits per second, which, also as designed, comes out to 0 bps. Morgoth is flagging that as an anomaly even though 100% of the data points are 0. Here is a copy of the TICKscript and one of the offending log lines I get. Am I doing something wrong? I'd have thought that with 100% of the data points falling in exactly the same place, I'd be well within a sigma of 3.


stream
    |from()
        .measurement('sflow')
        .groupBy(*)
    |window()
        .periodCount(60)
        .everyCount(60)
    |derivative('traffic_out')
        .unit(1s)
        .nonNegative()
        .as('dOut')
    @morgoth()
        .field('dOut')
        .scoreField('anomalyScore')
        .sigma(3.0)
    |alert()
        .details('error')
        .crit(lambda: "anomalyScore" > 0.9)
        .log('/tmp/dev-morgoth-alerts.log')

{"id":"sflow:hostname=host,interface=Ethernet2/3","message":"sflow:hostname=host,interface=Ethernet2/3 is CRITICAL","details":"error","time":"2017-07-25T06:28:20.234573182Z","duration":0,"level":"CRITICAL","data":{"series":[{"name":"sflow","tags":{"hostname":"host","interface":"Ethernet2/3"},"columns":["time","anomalyScore","dOut","traffic_in","traffic_out"],"values":[["2017-07-25T06:28:20.234573182Z",0.95,0,0,0],["2017-07-25T06:28:30.23478003Z",0.95,0,0,0],["2017-07-25T06:28:40.235267198Z",0.95,0,0,0],["2017-07-25T06:28:50.23514611Z",0.95,0,0,0],["2017-07-25T06:29:00.235319166Z",0.95,0,0,0],["2017-07-25T06:29:10.696931966Z",0.95,0,0,0],["2017-07-25T06:29:20.235758974Z",0.95,0,0,0],["2017-07-25T06:29:30.236343166Z",0.95,0,0,0],["2017-07-25T06:29:40.236131198Z",0.95,0,0,0],["2017-07-25T06:29:50.236312958Z",0.95,0,0,0],["2017-07-25T06:30:00.23649907Z",0.95,0,0,0],["2017-07-25T06:30:10.834432126Z",0.95,0,0,0],["2017-07-25T06:30:20.23741811Z",0.95,0,0,0],["2017-07-25T06:30:30.237005182Z",0.95,0,0,0],["2017-07-25T06:30:40.23740915Z",0.95,0,0,0],["2017-07-25T06:30:50.237414014Z",0.95,0,0,0],["2017-07-25T06:31:00.237575038Z",0.95,0,0,0],["2017-07-25T06:31:10.932032126Z",0.95,0,0,0],["2017-07-25T06:31:20.237975934Z",0.95,0,0,0],["2017-07-25T06:31:30.238144126Z",0.95,0,0,0],["2017-07-25T06:31:40.238301054Z",0.95,0,0,0],["2017-07-25T06:31:50.238853246Z",0.95,0,0,0],["2017-07-25T06:32:00.23873395Z",0.95,0,0,0],["2017-07-25T06:32:11.042032254Z",0.95,0,0,0],["2017-07-25T06:32:20.239256958Z",0.95,0,0,0],["2017-07-25T06:32:30.241026942Z",0.95,0,0,0],["2017-07-25T06:32:40.239454078Z",0.95,0,0,0],["2017-07-25T06:32:50.239624062Z",0.95,0,0,0],["2017-07-25T06:33:00.239965054Z",0.95,0,0,0],["2017-07-25T06:33:11.13663219Z",0.95,0,0,0],["2017-07-25T06:33:20.240210046Z",0.95,0,0,0],["2017-07-25T06:33:30.240670078Z",0.95,0,0,0],["2017-07-25T06:33:40.392131966Z",0.95,0,0,0],["2017-07-25T06:33:50.24073715Z",0.95,0,0,0],["2017-07-25T06:34:01.17103219Z",0.95,0,0,0],["2017-07-25T06:34:10.342572158Z",0.95,0,0,0],["2017-07-25T06:34:20.505032062Z",0.95,0,0,0],["2017-07-25T06:34:30.677032062Z",0.95,0,0,0],["2017-07-25T06:34:40.925032062Z",0.95,0,0,0],["2017-07-25T06:34:51.091031934Z",0.95,0,0,0],["2017-07-25T06:35:00.342132094Z",0.95,0,0,0],["2017-07-25T06:35:10.359462014Z",0.95,0,0,0],["2017-07-25T06:35:20.380931966Z",0.95,0,0,0],["2017-07-25T06:35:30.393932158Z",0.95,0,0,0],["2017-07-25T06:35:40.470132094Z",0.95,0,0,0],["2017-07-25T06:35:50.626131838Z",0.95,0,0,0],["2017-07-25T06:36:00.450132094Z",0.95,0,0,0],["2017-07-25T06:36:10.364552062Z",0.95,0,0,0],["2017-07-25T06:36:20.495332222Z",0.95,0,0,0],["2017-07-25T06:36:30.368862078Z",0.95,0,0,0],["2017-07-25T06:36:40.522031998Z",0.95,0,0,0],["2017-07-25T06:36:50.550132094Z",0.95,0,0,0],["2017-07-25T06:37:00.567031934Z",0.95,0,0,0],["2017-07-25T06:37:10.382642046Z",0.95,0,0,0],["2017-07-25T06:37:20.618231934Z",0.95,0,0,0],["2017-07-25T06:37:30.393982078Z",0.95,0,0,0],["2017-07-25T06:37:40.686031998Z",0.95,0,0,0],["2017-07-25T06:37:50.65843187Z",0.95,0,0,0],["2017-07-25T06:38:00.721332094Z",0.95,0,0,0]]}]}}

#2

@fatpelt Morgoth isn’t comparing the window with itself but rather the windows that it has seen before.

So if Morgoth previously got a window where the link was up then the mean and stddev of that window would be non-zero.

Then, when it sees this window where all the data is 0, it sees that as more than 3 sigma away from the previous non-zero window and flags it as anomalous. But Morgoth learns what is normal as it sees more windows: as long as the zero windows are more common than your minSupport value (default 5%), it will eventually learn that they are normal and stop flagging them as anomalous.

I started a PR a while back to give more insight into this behavior here https://github.com/nathanielc/morgoth/pull/46

I’ll see if I can wrap it up this weekend so you can try it out.
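
For readers following along, the window-versus-history comparison described in this post can be sketched roughly as below. This is a simplified illustration in Python, not Morgoth's actual code: the class names `SigmaFingerprint` and `WindowHistory`, the `matches` rule, and the plain counting scheme are all assumptions made for the example. Only the sigma of 3 and the 5% minSupport default come from the thread.

```python
from statistics import mean, pstdev


class SigmaFingerprint:
    """Hypothetical summary of one window: its mean and standard deviation."""

    def __init__(self, values, deviations=3.0):
        self.mean = mean(values)
        self.std = pstdev(values)
        self.deviations = deviations

    def matches(self, other):
        # Assumed rule: another window matches when its mean lies within
        # `deviations` standard deviations of this window's mean.
        return abs(self.mean - other.mean) <= self.deviations * self.std


class WindowHistory:
    """Hypothetical store that counts how often each kind of window was seen."""

    def __init__(self, min_support=0.05):
        self.min_support = min_support
        self.seen = []  # list of [fingerprint, count] pairs
        self.total = 0

    def is_anomalous(self, values):
        fp = SigmaFingerprint(values)
        self.total += 1
        for entry in self.seen:
            if entry[0].matches(fp):
                entry[1] += 1
                # Anomalous only while this kind of window is still rare.
                return entry[1] / self.total < self.min_support
        # Never seen before: record it and judge it by its support so far.
        self.seen.append([fp, 1])
        return 1 / self.total < self.min_support


history = WindowHistory()
link_up = [100, 102, 98, 101, 99]
link_down = [0, 0, 0, 0, 0]

for _ in range(30):
    history.is_anomalous(link_up)

first_zero_window = history.is_anomalous(link_down)
second_zero_window = history.is_anomalous(link_down)
```

In this sketch, after 30 "link up" windows the first all-zero window is flagged because its support is 1/31, which is under 5%. The second one is not flagged, because 2/32 is above 5%. A history that has only ever contained zero windows would never flag them under these assumptions.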

#3

That makes sense, yes. However, this interface has been admin down for many moons, certainly longer than I've been running Kapacitor, so the derivative has always been zero.

pfelt@influx:/tmp$ influx -database stats -execute "select max(traffic_out),min(traffic_out) from sflow where hostname = 'host' and interface = 'Ethernet2/3'"
name: sflow
time max min
---- --- ---
0    0   0

pfelt@influx:/tmp$

#4

@fatpelt Hmmm, maybe Morgoth has a divide by zero bug then. I’ll take a look.

#5

@fatpelt I found an issue with the sigma fingerprinter that shows up when all values are zero. The fix is here: https://github.com/nathanielc/morgoth/pull/52
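
The thread does not say what the bug was, so the following is only one plausible way an all-zero window could trip a sigma comparison, shown as a Python sketch. It is an assumption for illustration, not a description of what the linked PR changes, and both function names are made up. When every value in a window is 0, the standard deviation is 0, so the "3 sigma" threshold is also 0 and the result depends entirely on whether the comparison is strict.

```python
def within_sigma_strict(mean_a, std_a, mean_b, deviations=3.0):
    # Strict comparison: with std_a == 0 the threshold is 0, and 0 < 0 is
    # False, so two identical constant windows never match each other.
    return abs(mean_a - mean_b) < deviations * std_a


def within_sigma_inclusive(mean_a, std_a, mean_b, deviations=3.0):
    # Inclusive comparison: identical constant windows match even when the
    # standard deviation is 0, while any differing mean is still rejected.
    return abs(mean_a - mean_b) <= deviations * std_a


# Two all-zero windows: mean 0, standard deviation 0.
strict_result = within_sigma_strict(0.0, 0.0, 0.0)
inclusive_result = within_sigma_inclusive(0.0, 0.0, 0.0)
```

Under the strict version every all-zero window would look new, which would be consistent with the constant 0.95 anomaly score in the log line from the first post, but that link is a guess.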

#6

There must have been another commit between where I was and where I am now, because I'm now getting this error:

2017/07/28 01:43:09 E! Agent for connection 5 terminated with error: read error: failed to register metrics for group: "hostname=host,interface=Ethernet2/2": window count metric: duplicate metrics collector registration attempted


I tried changing my groupBy from * to 'hostname', 'interface' to see if that was the cause. Sadly, I installed via "go get", and because I'm new to Go I have no idea which commit I was on before. Any tips on good ways to help bisect this?


pfelt@influx:~/$ influx -database stats -execute "show series from sflow where hostname='host' and interface='Ethernet2/2'"
key
---
sflow,hostname=host,interface=Ethernet2/2

pfelt@influx:~/$


#7

Use this commit: 63efe5f8324c3ff17066867363c0431b9a08ecfb. It has just the fix you need without any of the new metrics code.

#8

@nathaniel I'm facing the same issue. Let me know if you need more info.

Morgoth binary (v0.3.1) downloaded from github.com

Error: morgoth3: failed to register metrics for group: "cid=2,cpu=cpu-total,host=p0-c2-xyz.com,region=us-west-2,role=proxy": window count metric: duplicate metrics collector registration attempted

#9

I found a lot of useful information here, thanks. I never thought that forums could be this helpful. I decided to search for info while I sit at home searching for rx coupons because of my health and money issues. Do you people mind if I ask you some questions? I have trouble understanding some of these things. Thanks a lot!

#10

@Bountiturill No, feel free to ask. Also, the master branch of Morgoth should have the issue fixed. I'll probably cut a new release soon.

#11

Opened an issue on GitHub. Waiting for a fix :)