We’re currently testing Influx Cloud, but we’re unable to get alerts to work. No markers/thresholds are shown, and no alerts were generated when we went above the threshold. What’s wrong?
Hello @philipb,
Can you please try including an offset on your check?
Most of the time this is due to a read/write conflict. You want to give the task some time to pick up data that might be arriving a little late before it writes statuses to the _monitoring bucket.
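To illustrate why the offset matters, here is a toy Python model (not InfluxDB code). It assumes a check that runs at the end of each window plus an optional offset, and that the check only sees points that have already arrived by then. The function name and the sample timings are invented for the example.

```python
def points_seen(points, window_start, window_stop, offset=0):
    """Return values of points in [window_start, window_stop) that have
    arrived by the time the check runs (window_stop + offset)."""
    run_time = window_stop + offset
    return [
        value
        for event_time, arrival_time, value in points
        if window_start <= event_time < window_stop and arrival_time <= run_time
    ]


# (event_time, arrival_time, value) in seconds.
# The second point is stamped at 9s but only arrives at 12s.
points = [(2, 2, 10.0), (9, 12, 95.0)]

print(points_seen(points, 0, 10))            # [10.0] -> late point is missed
print(points_seen(points, 0, 10, offset=5))  # [10.0, 95.0] -> late point is included
```

Without the offset, the run at 10s never sees the 95.0 value, so no crit status is written for that window.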
You can also verify that the underlying task of your check is running successfully. Copy the task ID from the alerts page (it is beside your check name), then query the “_tasks” system bucket, filter for that task ID, and look at the execution status and metadata of the task runs.
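As a rough sketch of that lookup, the Python snippet below only builds the Flux query text, which you can paste into the Data Explorer. It assumes the system bucket is named _tasks and that run records live in a "runs" measurement tagged with taskID, so please verify those names against your own bucket. The helper name and the placeholder task ID are made up for illustration.

```python
def build_task_runs_query(task_id: str, lookback: str = "-1h") -> str:
    """Return a Flux query listing recent runs of one task (a sketch)."""
    # Assumption: runs are stored in the "runs" measurement and tagged
    # with "taskID"; check this against what your _tasks bucket contains.
    return "\n".join([
        'from(bucket: "_tasks")',
        f"  |> range(start: {lookback})",
        '  |> filter(fn: (r) => r._measurement == "runs")',
        f'  |> filter(fn: (r) => r.taskID == "{task_id}")',
    ])


if __name__ == "__main__":
    # "0000000000000000" is a placeholder, not a real task ID.
    print(build_task_runs_query("0000000000000000"))
```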
Finally, I encourage you to take a look at this blog that should help you understand everything about the InfluxDB checks and notifications system. If something needs more detail or you have any feedback, please let me know so I can edit/incorporate it.
Thank you!
Thanks. I managed to get the alert to work, finally. The problem was I hadn’t created a “Notification Rule”. This wasn’t very intuitive at all, and there was nothing indicating that I had to add a notification rule.
Still, I don’t understand the purpose of the marker toggles. I tested adjusting the offset value, but that didn’t help. There are still no “markers”.
Hello @philipb,
That sounds like a bug. There should be markers that indicate the value of your threshold for each level you set. Can you please submit an issue here:
I agree. I think it would be better if there was numbering to indicate that you must complete all three steps in order to send an alert:
1. configure check
2. configure notification endpoint
3. configure notification rule
I created this feature request based on your experience. Feel free to comment: “Add numbering or arrows on alerts page to guide the user to complete all 3 configurations” (Issue #1505, influxdata/ui on GitHub).
@Anaisdg So I’m still having issues with alerts not working. After looking at the “View History” view I noticed that I see a lot of “level: unknown”. Any idea what would cause this?
In case it helps: I initially defined thresholds for every level (ok, info, warn, crit). To test, I’m now only using ok and crit. I’ll have to see if that helps.
I’m not getting status unknown now. But still no alerts…
No alerts received.
Also there seems to be a long delay before the level changes. It took 2 minutes before the level changed to crit. And it took about 10 minutes for the level to go back to ok. Is that normal?
This is really frustrating and is definitely starting to make us consider moving to a different solution, rather than deciding to pay for Influx Cloud.
Hello @philipb,
The delay between level changes depends on your data, on where you’ve defined the levels, and on how often your check runs. The same applies to how long it takes for the level to go back to ok.
If your check is running every 10s and your data exceeds the crit threshold then your level should be crit 10s after. If your data then falls below the crit threshold 5 min later, then your level should be ok or unknown at 5 min + 10s.
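Here is a small Python sketch of that timing, just to make the arithmetic concrete. It assumes the check runs on an aligned schedule (offset, every + offset, 2 * every + offset, ...) and that a status can only change when the check runs. The function name is hypothetical, and any aggregation window on the check would add to this.

```python
import math


def first_check_at_or_after(event_time: float, every: float, offset: float = 0.0) -> float:
    """Time (seconds) of the first check run at or after event_time."""
    runs_elapsed = math.ceil(max(event_time - offset, 0) / every)
    return runs_elapsed * every + offset


# Check runs every 10s. Data crosses the crit threshold at t=3s,
# then drops back below it at t=303s (about 5 minutes later).
print(first_check_at_or_after(3, every=10))    # 10.0
print(first_check_at_or_after(303, every=10))  # 310.0
```

In this model the delay is never more than one check interval (plus the offset), so a delay of several minutes points at the data or the level definitions rather than the schedule.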
If you’re getting a status of unknown it’s because you haven’t defined what level to assign at that value.
Please take a look at this issue:
Specifically this response:
if i set CRIT > 80 and OK < 40, what level do i get for a value of 50?
You would get unknown. If you get ok in that case then it’s definitely a bug (but a different bug).
If you set CRIT > 80 but do not set an OK condition, what level should you get for a 50?
If we say it should be an unknown, then you would not get recoveries when the value drops below 80, and users would have to define both CRIT > 80 and OK <= 80 in order to get recoveries.
If we say it should be ok, then you would get recoveries when the value drops below 80 without requiring the user to explicitly state an ok condition.
If we say that unknowns trigger recoveries, then ok and unknown are the exact same thing and there is no purpose in having both.
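To make the quoted behavior concrete, here is a toy Python model of level assignment. It is not the InfluxDB implementation. It assumes each level has a condition, conditions are tried in the order given, and a value that matches none of them gets unknown. The function and variable names are invented for the example.

```python
def assign_level(value, rules):
    """rules: list of (level, predicate) pairs, tried in order.
    Returns "unknown" when no predicate matches."""
    for level, predicate in rules:
        if predicate(value):
            return level
    return "unknown"


# CRIT > 80 and OK < 40 leaves a gap between 40 and 80.
rules_with_gap = [("crit", lambda v: v > 80), ("ok", lambda v: v < 40)]
# CRIT > 80 and OK <= 80 covers every value.
rules_no_gap = [("crit", lambda v: v > 80), ("ok", lambda v: v <= 80)]

print(assign_level(50, rules_with_gap))  # unknown
print(assign_level(50, rules_no_gap))    # ok
print(assign_level(95, rules_no_gap))    # crit
```

So if you want recoveries, the ok condition has to cover everything that is not crit (or warn, info).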
You can change your notification rule so that when the status changes from any to crit you get a notification. This way when your data goes from unknown to crit you’ll get an alert. I’m guessing you were expecting some notifications with that behavior and it’s contributing to your perceived delay.
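A minimal Python sketch of how such a rule behaves, assuming a rule only fires on a status change and that “any” matches every previous level. This is a simplified model for illustration, not InfluxDB’s actual notification code, and the names are hypothetical.

```python
ANY = "any"


def rule_matches(previous, current, from_level=ANY, to_level="crit"):
    """True when the status change previous -> current satisfies the rule."""
    if previous == current:
        return False  # no change, nothing to notify about
    from_ok = from_level == ANY or previous == from_level
    return from_ok and current == to_level


print(rule_matches("unknown", "crit"))                   # True: any -> crit
print(rule_matches("unknown", "crit", from_level="ok"))  # False: ok -> crit only
```

A rule defined as ok to crit stays silent when the level goes from unknown to crit, which would look like a missing alert.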
Does that help?
Also, out of curiosity what is your use case?