I’m trying to alert when I receive nginx errors. I have the following script, which works, except that the alert never resets when the error count drops to zero.
var data = batch
    |query('SELECT * FROM "telegraf"."autogen"."nginx_access_log"')
        .period(5m)
        .every(5m)
        .groupBy(*)
    // Keep only points that have a request and a 5xx status
    |where(lambda: isPresent("@fields_request") AND int("@fields_status") > 499)
    // Promote request and status to tags so they can be grouped on and used in the message
    |eval(lambda: "@fields_request").as('request').tags('request').keep()
    |eval(lambda: "@fields_status").as('status').tags('status').keep()
    |groupBy(*)
    |count('status').as('status_count')
var alert = data
    |alert()
        .message('{{ index .Tags "host" }} is {{ .Level }}: ({{ index .Fields "status_count" }}) HTTP {{ index .Tags "status" }} {{ index .Tags "request" }}')
        .warn(lambda: "status_count" > 0)
I understand that count won’t return zero if no data is received, so the alert never sees a value that would move it back to OK. I also tried a suggested reverse deadman script, but that dropped all the tags I use in my message and always produced a “CRITICAL” level rather than a warning.
Anyone have any suggestions?