My Kapacitor server has 30 GB of RAM and monitors about 200 hosts. A memory leak forces it to be restarted every 7 days. I have tried troubleshooting through the logs but found nothing.
I suspect this alert is causing the memory leak. Could you take a look?
var db = 'telegraf'
var rp = 'autogen'
var measurement = 'system'
var groupBy = ['customer', 'environment', 'project', 'host']
var whereFilter = lambda: TRUE
var name = 'os_cpu_loadaverage_high'
var idVar = name + '-{{.Group}}'
var message = 'CPU Load average on host {{ index .Tags "host" }} is {{ index .Fields "value" | printf "%0.2f" }} %'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var triggerType = 'threshold'
var info = 75
var warn = 90

var data1 = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(1m)
        .every(30s)
        .align()
    |mean('load5')
        .as('mean_load5')

var data2 = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |window()
        .period(1m)
        .every(30s)
        .align()
    |mean('n_cpus')
        .as('mean_cpus')

var data = data1
    |join(data2)
        .as('m1', 'm2')
    |eval(lambda: "m1.mean_load5" / "m2.mean_cpus" * 100.0)
        .as('value')

var trigger = data
    |alert()
        .warn(lambda: "value" < warn)
        .info(lambda: "value" > info)
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .stateChangesOnly(30s)
        .log('/tmp/alerts.log')
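
For comparison, the same percentage could probably be computed in a single pipeline, without the second stream and the join. The following is only an untested sketch: it divides load5 by n_cpus per point and then averages over the window, which matches the script above as long as n_cpus does not change within a window. The float() conversions are an assumption about the field types (n_cpus is assumed to arrive as an integer). Whether this variant changes memory usage at all is not established here.

```
var db = 'telegraf'
var rp = 'autogen'
var measurement = 'system'
var groupBy = ['customer', 'environment', 'project', 'host']
var whereFilter = lambda: TRUE

// Sketch only: one stream, no join.
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    // Per-point load as a percentage of the CPU count.
    // float() is assumed to be needed because n_cpus is assumed to be an integer field.
    |eval(lambda: float("load5") / float("n_cpus") * 100.0)
        .as('value')
    |window()
        .period(1m)
        .every(30s)
        .align()
    // Average the per-point percentage over each 1m window.
    |mean('value')
        .as('value')
```

The alert node from the script above could then be attached to data unchanged, since the field is still called value.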
Currently it triggers about 30 times per minute on my series data (about 200 hosts). But how would that make Kapacitor leak memory?
Also, I looked at the Kapacitor guide and found this in the frequently asked questions:
So it says I must fix my stream script by adding a window node to it? Do I understand that correctly?