We are trying to configure an alert using Kapacitor.
Kapacitor alert script:
var warn = 25
var crit = 50
var period = 60s
var every = 10s

// Dataframe
var data = batch
    |query(''' SELECT used_percent AS stat FROM "infra-monitoring-production"."autogen"."disk" where ( host =~ /postgres/ OR host=~ /-db-/ OR host=~ /-pg-/) AND path=~ /\/var\/lib\/postgresql/ ''')
        .period(period)
        .every(every)
        .groupBy('host')

// Thresholds
var alert = data
    |alert()
        .id('{{ index .Tags "host"}}/disk_total')
        .message('{{ .ID }}:{{ index .Fields "stat" }}')
        .warn(lambda: "stat" >= warn)
        .crit(lambda: "stat" >= crit)

// Alert
alert
    .post('http://some-webhook-url')
After defining and enabling the alert, we get the following error in the Kapacitor log:
ts=2019-01-08T19:59:10.007+07:00 lvl=debug msg="alert triggered" service=kapacitor task_master=main task=foobar1-alert node=alert5 level=CRITICAL id="node 'batch0' in task 'foobar1-alert'" event_message="node 'batch0' in task 'foobar1-alert' is dead: %!f(MISSING) points/10s." data="&{stats map[] [time emitted] [[2019-01-08 12:59:10 +0000 UTC 0]]}"
However, when we check InfluxDB, the data is present with the correct measurement and field values:
> select * from "disk" where host='p-foobar-postgres-a-01' order by time DESC limit 10
name: disk
time                device free       fstype host                   inodes_free inodes_total inodes_used mode path                total       used       used_percent
----                ------ ----       ------ ----                   ----------- ------------ ----------- ---- ----                -----       ----       ------------
1546952590000000000 sdc    3415011328 ext4   p-foobar-postgres-a-01 655347      655360       13               /var/lib/postgresql 10434699264 6466039808 65.43878499365353
1546952590000000000 sda1   4035452928 ext4   p-foobar-postgres-a-01 1177494     1280000      102506           /                   10340831232 6288601088 60.91212888129082
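As a sanity check on the filter and thresholds, here is a minimal Python sketch (not Kapacitor code) that mirrors the query's host and path regexes and the warn/crit levels, and applies them to the two rows shown above. The function names `matches_filter` and `alert_level` are made up for illustration, and the sketch assumes these simple patterns behave the same in Python's `re` as in InfluxQL. It says nothing about the time window Kapacitor queries.

```python
import re

# Thresholds copied from the TICKscript above.
WARN = 25
CRIT = 50

# Regexes copied from the WHERE clause of the batch query.
HOST_PATTERNS = [re.compile(r"postgres"), re.compile(r"-db-"), re.compile(r"-pg-")]
PATH_PATTERN = re.compile(r"/var/lib/postgresql")


def matches_filter(host, path):
    """Return True if a row would pass the query's host and path conditions."""
    host_ok = any(p.search(host) for p in HOST_PATTERNS)
    return host_ok and PATH_PATTERN.search(path) is not None


def alert_level(stat):
    """Map a used_percent value to the level the thresholds imply."""
    if stat >= CRIT:
        return "CRITICAL"
    if stat >= WARN:
        return "WARNING"
    return "OK"


# The two rows from the InfluxDB output: (host, path, used_percent).
rows = [
    ("p-foobar-postgres-a-01", "/var/lib/postgresql", 65.43878499365353),
    ("p-foobar-postgres-a-01", "/", 60.91212888129082),
]

for host, path, stat in rows:
    if matches_filter(host, path):
        print(host, path, alert_level(stat))
```

Under these assumptions only the /var/lib/postgresql row passes the filter, and its value is above the crit threshold, so the stored data itself looks like it should be picked up by the script.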
The strange thing is that the same Kapacitor script works for other measurements and fields, such as CPU or mem, but not for disk.
Any help from the community would be appreciated.