Alerts not working

philipb · May 22, 2021, 4:03am

We’re currently testing Influx Cloud, but we’re unable to get alerts to work. No markers/thresholds are shown, and no alerts were generated when we went above the threshold. What’s wrong?

Anaisdg · May 24, 2021, 7:38pm

Hello @philipb,
Can you please try including an offset on your check?
Most of the time this is due to a read/write conflict. You want to give the task some time to query data that might arrive a little late before it writes statuses to the _monitoring bucket.
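For reference, the offset lives in the `option task` record of the task that backs the check. This is a minimal sketch, assuming a check that runs every 15s; the name and the 5s offset are illustrative values, not a recommendation:

```flux
// Task options for the task behind a check (illustrative values).
// `every` is how often the check runs; `offset` delays each run so that
// late-arriving points are already written when the query executes.
option task = {name: "Name this Check", every: 15s, offset: 5s}
```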

You can also verify that the underlying task of your check is running successfully. Copy the task ID from the alerts page (it is shown beside your check name), then query the “_tasks” system bucket, filter for that task ID, and look at the execution status and metadata of the task runs.
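As a rough sketch of that query: I'm assuming here that the system bucket is named `_tasks` and that run records sit in a `runs` measurement with `taskID` and `status` tags, so please verify the names against your own bucket. The task ID is a placeholder.

```flux
// List recent runs of one task from the system bucket.
// "0000000000000000" is a placeholder; paste your check's task ID instead.
from(bucket: "_tasks")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "runs")
    |> filter(fn: (r) => r.taskID == "0000000000000000")
```

The `status` tag on the returned rows should tell you whether each run succeeded or failed.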

Finally, I encourage you to take a look at this blog that should help you understand everything about the InfluxDB checks and notifications system. If something needs more detail or you have any feedback, please let me know so I can edit/incorporate it.

Thank you!

philipb · May 25, 2021, 3:12am

Thanks. I managed to get the alert to work, finally. The problem was I hadn’t created a “Notification Rule”. This wasn’t very intuitive at all, and there was nothing indicating that I had to add a notification rule.

Still, I don’t understand the purpose of the marker toggles. I tested adjusting the offset value, but that didn’t help. There are still no “markers”.

Anaisdg · May 25, 2021, 4:51pm

Hello @philipb,
That sounds like a bug. There should be markers that indicate the value of your threshold for each level you set. Can you please submit an issue here:

I agree. I think it would be better if there were numbering to indicate that you must complete all three steps in order to send an alert:
1. configure check
2. configure notification endpoint
3. configure notification rule

Anaisdg · May 25, 2021, 5:00pm

I created this feature request based on your experience. Feel free to comment on it: Add numbering or arrows on alerts page to guide the user to complete all 3 configurations · Issue #1505 · influxdata/ui · GitHub

philipb · May 30, 2021, 6:30am

@Anaisdg So I’m still having issues with alerts not working. After looking at the “View History” view I noticed that I see a lot of “level: unknown”. Any idea what would cause this?

If it helps: I initially defined thresholds for every level (ok, info, warn, crit). To test, I’m now only using ok and crit. I’ll have to see if that helps.

philipb · May 30, 2021, 6:47am

I’m not getting status unknown now. But still no alerts…

No alerts received.

Also there seems to be a long delay before the level changes. It took 2 minutes before the level changed to crit. And it took about 10 minutes for the level to go back to ok. Is that normal?

This is really frustrating and is definitely starting to make us consider moving to a different solution, rather than deciding to pay for Influx Cloud.

philipb · May 30, 2021, 6:47am

Another screenshot (since I’m only allowed to embed 1)

Anaisdg · June 1, 2021, 3:19pm

Hello @philipb,
The delay between level changes depends on your data, on where you’ve defined the levels, and on how often your check runs. The same goes for when the level returns to ok.
If your check runs every 10s and your data exceeds the crit threshold, your level should be crit at most 10s later. If your data then falls below the crit threshold 5 min later, your level should be ok or unknown at about 5 min + 10s.
If you’re getting a status of unknown, it’s because you haven’t defined what level to assign at that value.
Please take a look at this issue:

github.com/influxdata/influxdb

Check status defaults to `ok` instead of `unknown`

opened 08:59PM - 29 Oct 19 UTC

closed 07:46PM - 12 Nov 19 UTC

ebb-tide

A threshold check with a single threshold defined in the UI as:
```
{level: "C…RIT", value: 80, type: "greater"}
```
creates a check which produces a task:
```
package main
import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"

data = from(bucket: "data")
    |> range(start: -15s)
    |> filter(fn: (r) => (r._measurement == "cpu"))
    |> filter(fn: (r) => (r._field == "usage_user"))
    |> filter(fn: (r) => (r.cpu == "cpu0"))
    |> aggregateWindow(every: 15s, fn: mean, createEmpty: false)

option task = {name: "Name this Check", every: 15s, offset: 0s}

check = {
    _check_id: "04b24ecaa80fd000",
    _check_name: "Name this Check",
    _type: "threshold",
    tags: {},
}
crit = (r) => (r.usage_user > 80.0)
messageFn = (r) => ("Check: ${ r._check_name } is: ${ r._level }")

data
    |> v1.fieldsAsCols()
    |> monitor.check(data: check, messageFn: messageFn, crit: crit)
```
should produce a status with level `crit` if value is greater than 80, and `unknown` if value is less than 80. Currently produces status with level `ok` if value is less than 80.

![Screen Shot 2019-10-29 at 3 11 48 PM](https://user-images.githubusercontent.com/10937678/67813316-7cd95b00-fa5e-11e9-9f3a-a060a0f22e1e.png)

Specifically this response:

If I set CRIT > 80 and OK < 40, what level do I get for a value of 50?
You would get unknown. If you get ok in that case, then it’s definitely a bug (but a different bug).
If you set CRIT > 80 but do not set an OK condition what level should you get for a 50?
If we say it should be an unknown then you would not get recoveries when the value drops below 80 and users would have to define both CRIT > 80 and OK <=80 in order to get recoveries.
If we say it should be ok then you would get recoveries when the value drops below 80 without requiring the user to explicitly state an ok condition.
If we say that unknowns trigger recoveries then ok and unknown are the exact same thing and there is no purpose in having both.
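
To make the first case concrete, here is a sketch adapted from the task in the issue above, with an explicit `ok` condition added. The thresholds (CRIT > 80, OK < 40) are the ones from the question. The bucket, measurement, and check ID are carried over from the issue and stand in for your own.

```flux
import "influxdata/influxdb/monitor"
import "influxdata/influxdb/v1"

option task = {name: "Name this Check", every: 15s, offset: 0s}

check = {_check_id: "04b24ecaa80fd000", _check_name: "Name this Check", _type: "threshold", tags: {}}

// CRIT above 80, OK below 40. A value such as 50 matches neither predicate,
// so it is assigned the level `unknown`.
crit = (r) => r.usage_user > 80.0
ok = (r) => r.usage_user < 40.0
messageFn = (r) => "Check: ${r._check_name} is: ${r._level}"

from(bucket: "data")
    |> range(start: -15s)
    |> filter(fn: (r) => r._measurement == "cpu")
    |> filter(fn: (r) => r._field == "usage_user")
    |> filter(fn: (r) => r.cpu == "cpu0")
    |> aggregateWindow(every: 15s, fn: mean, createEmpty: false)
    |> v1.fieldsAsCols()
    |> monitor.check(data: check, messageFn: messageFn, crit: crit, ok: ok)
```

Changing `ok` to `r.usage_user <= 80.0` closes the gap, so every value is assigned either `crit` or `ok`.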

You can change your notification rule so that when the status changes from any to crit you get a notification. This way when your data goes from unknown to crit you’ll get an alert. I’m guessing you were expecting some notifications with that behavior and it’s contributing to your perceived delay.
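
If you want to see which statuses such a rule would match, you can run roughly the same filter yourself. This is a minimal sketch using `monitor.from()` and `monitor.stateChanges()`; the check name and time range are placeholders. A rule generated by the UI contains more than this (the endpoint setup and a `monitor.notify()` call), so treat it as an inspection query rather than the generated task.

```flux
import "influxdata/influxdb/monitor"

// Statuses written by checks over the last hour, restricted to one check,
// then reduced to the rows where the level changed from anything to crit.
monitor.from(start: -1h, fn: (r) => r._check_name == "Name this Check")
    |> monitor.stateChanges(fromLevel: "any", toLevel: "crit")
```

If this returns no rows for the period you tested, the rule had no state change to fire on.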

Does that help?
Also, out of curiosity what is your use case?
