I’m writing a SMART alert for storage devices. The Telegraf plugin gathers the data, but if a device is heavily damaged, the plugin may fail to gather all the needed values (fields).
I have a set of expressions to check all those conditions (unable to gather SMART data, SMART is not enabled, whether the metrics are OK).
Every time I get mangled data (due to failing disks) I get the ‘very bad’ alert, as planned, but this causes the other expressions to write an error into the Kapacitor log because there are no fields to process.
The code in question:
var raw_data = stream
    |from()
        .measurement('smart_device')

var rw_errors = raw_data
    |from()
        .where(lambda: "read_error_rate" != 0 OR "write_error_rate" != 0)

var smart_enabled = raw_data
    |from()
        .where(lambda: "enabled" != 'Enabled')

var health_not_ok = raw_data
    |from()
        .where(lambda: "health_ok" == FALSE)

var exit_status = raw_data
    |from()
        .where(lambda: "exit_status" != 0)
// Critical when the read error rate is non-zero.
rw_errors
    |alert()
        .crit(lambda: "read_error_rate" > 0)
        .id('read_error_rate')
        .message('Read error rate for /dev/{{ index .Tags "device" }} at {{ index .Tags "host" }} is non-zero ({{ index .Fields "read_error_rate" }})')
        .topic('<< kapacitor_device_smart_topic >>')
// Critical when the write error rate is non-zero.
rw_errors
    |alert()
        .crit(lambda: "write_error_rate" > 0)
        .id('write_error_rate')
        .topic('<< kapacitor_device_smart_topic >>')
        .message('Write error rate for /dev/{{ index .Tags "device" }} at {{ index .Tags "host" }} is non-zero ({{ index .Fields "write_error_rate" }})')
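// Critical when the drive reports a failed health check.
// The condition is == FALSE to match the filter on health_not_ok; with != FALSE it could never fire.
health_not_ok
    |alert()
        .crit(lambda: "health_ok" == FALSE)
        .id('drive_health_status')
        .message('/dev/{{ index .Tags "device" }} at {{ index .Tags "host" }} is failing!')
        .topic('<< kapacitor_device_smart_topic >>')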
// Critical when smartctl itself exits with an error.
exit_status
    |alert()
        .crit(lambda: "exit_status" != 0)
        .id('smartctl_exit_status')
        .message('smartctl returned a non-zero exit code ({{ index .Fields "exit_status" }}) for /dev/{{ index .Tags "device" }} at {{ index .Tags "host" }}')
        .topic('<< kapacitor_device_smart_topic >>')
// Critical when SMART is not enabled on the device.
smart_enabled
    |alert()
        .crit(lambda: "enabled" != 'Enabled')
        .id('no_smart_for_device')
        .message('Unable to gather SMART data for /dev/{{ index .Tags "device" }} at {{ index .Tags "host" }}')
        .topic('<< kapacitor_device_smart_topic >>')
If a record has no read_error_rate field, Kapacitor logs an error:
ts=2018-08-14T14:14:20.716Z lvl=error msg="failed to evaluate WHERE expression" service=kapacitor task_master=main task=device_smart node=alert6 err="left reference value \"read_error_rate\" is missing value"
Is there any way to tell Kapacitor to ‘stop processing the other expressions’? Like ‘return’, or ‘stop’, or something like that…
Or, put another way: is there a way to check whether a field is present?
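To make the second question concrete, this is roughly the kind of guard I have in mind. It is only a sketch: it assumes the isPresent() lambda function (listed for Kapacitor 1.5.1 and later) is available in the running version, and the name rw_errors_guarded is made up for illustration.

```
// Sketch only: assumes isPresent() exists in the running Kapacitor version.
// rw_errors_guarded is a hypothetical name, not part of the task above.
var rw_errors_guarded = raw_data
    // Drop points that lack either field, so later nodes never try to read a missing value.
    // Note: this also drops points where only one of the two fields is missing.
    |where(lambda: isPresent("read_error_rate") AND isPresent("write_error_rate"))
    // Only now compare the values.
    |where(lambda: "read_error_rate" != 0 OR "write_error_rate" != 0)
```

The presence check sits in its own where node so that the comparison is never evaluated for a point with missing fields, regardless of how Kapacitor evaluates AND/OR inside a single lambda.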
Thanks!