I am looking for a way to push alerts and their state back into InfluxDB (into a different database). From that database I would be able to see a timeline of each task's alert states and build some dashboards around them.
I tried to do something as follows:
> data_combined
> |alert()
> .stateChangesOnly()
> .message(string(alert_header) + '
> Host \'{{ index .Tags "host" }}\'
> Normalized \'load5\' is at an average of {{ index .Fields "_value_normalized" | printf "%0.2f" }} over a window of ' +string(system_load5_window_period)+ '
> Expecting < {{ if eq .Level "CRITICAL" }}' + string(system_load5_critical_level) + '{{ else }}' + string(system_load5_warn_level) + '{{ end }}
> Raw load5 average is {{ index .Fields "_mean" | printf "%0.2f" }} over {{ index .Fields "_n_cpus"}} cpus
> <' + string(system_load5_graph_url) + '|Grafana>')
> .warn(lambda: "_value_normalized" > system_load5_warn_level)
> .crit(lambda: "_value_normalized" > system_load5_critical_level)
> .slack()
> .channel(slack_channel)
> |log()
> |influxDBOut()
> .database('ops_alerts')
> .retentionPolicy('default')
> .measurement('{{.TaskName}}')
> .tag('stack', stack)
> .tag('deployment', deployment)
> .tag('state', '{{.Level}}')
There are a few issues with my approach, specifically around the use of the influxDBOut() node:
- I get all the data points that pass through the AlertNode, most of which I don’t really care about. I think eval() could filter them out for me.
- The “{{.TaskName}}” template in the .measurement property is not expanded; it ends up in the database verbatim.
- This stores the state as a tag, but I would prefer a field: the “state” (i.e. the current alert node level) as an integer, since that is easier to graph. I tried a few things with lambda, but it looks like “.Level” is only available in the context of .message.
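For reference, here is an untested sketch of roughly what I am aiming for. It assumes the AlertNode's `.levelField()` property writes the level as a string field on each point, and that the lambda `if()` function can map that string to an integer; I have not confirmed either on my Kapacitor version. `task_name` is a hypothetical variable I made up to stand in for the template that does not expand, and the 0/1/2 mapping is my own choice.

```
// Untested sketch, not a confirmed solution.
// `task_name` is a hypothetical variable replacing '{{.TaskName}}'.
var task_name = 'system_load5'

data_combined
    |alert()
        .stateChangesOnly()
        .warn(lambda: "_value_normalized" > system_load5_warn_level)
        .crit(lambda: "_value_normalized" > system_load5_critical_level)
        // Assumption: adds a string field 'level' (e.g. 'OK', 'WARNING', 'CRITICAL').
        .levelField('level')
        .slack()
        .channel(slack_channel)
    // Map the level string to an integer: CRITICAL -> 2, WARNING -> 1, anything else -> 0.
    |eval(lambda: if("level" == 'CRITICAL', 2, if("level" == 'WARNING', 1, 0)))
        .as('state')
        // Keep only the new 'state' field and drop the other fields.
        .keep('state')
    |influxDBOut()
        .database('ops_alerts')
        .retentionPolicy('default')
        .measurement(task_name)
        .tag('stack', stack)
        .tag('deployment', deployment)
```

If this works the way I hope, `ops_alerts` would contain one integer `state` field per point, tagged with `stack` and `deployment`, instead of every field from the alert pipeline.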
Does anyone have any ideas on how to do this?
Thanks