Notification rules for different checks

Hi there,

I’m following the doc here: Monitor data and send alerts | InfluxDB OSS 2.0 Documentation

Question about point #3:

Since the statuses measurement contains statuses from a series of checks, which check is evaluated when the notification rule runs? I don’t see a setting that specifies which check goes with which notification rule.

For example, how do I set up two pairs of checks/notifications: an hourly notification with an hourly check for CPU usage, and another pair, maybe every two hours, for MEM usage?

Thanks

Hello @Nam_Giang,
Sorry for the delay.
Have you seen this: InfluxDB’s Checks and Notifications System | InfluxData?
I think it could answer a lot of the questions you might have about InfluxDB’s checks and notifications system. If it’s lacking or confusing, please let me know so I can edit it.

A check is just a specialized task under the hood.
A notification rule uses the monitor.from() function to read statuses, filters them by the name of the check you created, and then builds and sends notifications from the matching statuses.

import "influxdata/influxdb/monitor"

statuses = monitor["from"](start: -10s, fn: (r) => r["_check_name"] == "CPU Check")
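To make the filtering concrete, here is a minimal Python model (not InfluxDB code) of what that query does: status rows from several checks live side by side, and the predicate keeps only the rows whose _check_name matches the check the rule is bound to. The helpers `monitor_from` and `statuses_for_check`, and the sample rows and field values, are hypothetical.

```python
from typing import Callable, Dict, List

# A status row modeled as a plain dict of column name -> value.
# Illustrative model only; this is not the InfluxDB client API.
Status = Dict[str, object]


def monitor_from(rows: List[Status], fn: Callable[[Status], bool]) -> List[Status]:
    """Hypothetical stand-in for monitor.from(): keep rows where fn(row) is true."""
    return [r for r in rows if fn(r)]


def statuses_for_check(rows: List[Status], check_name: str) -> List[Status]:
    """Return only the statuses written by the named check."""
    return monitor_from(rows, lambda r: r["_check_name"] == check_name)


# Hypothetical statuses from two checks, stored together.
rows = [
    {"_check_name": "CPU Check", "_level": "CRIT", "usage_user": 62.0},
    {"_check_name": "MEM Check", "_level": "WARN", "used_percent": 74.0},
    {"_check_name": "CPU Check", "_level": "OK", "usage_user": 12.0},
]

cpu = statuses_for_check(rows, "CPU Check")
print([r["_level"] for r in cpu])  # ['CRIT', 'OK']
```

The MEM row never reaches the CPU rule because the predicate, not the measurement, is what separates one check's statuses from another's.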

Thanks. Just to confirm: there’s no way to configure what I need from the Alerts page in the InfluxDB UI, right? I need to create a task for it manually, which is fine.

Hello @Nam_Giang,
If you’re just creating a threshold check on CPU usage every hour, another on memory usage every 2 hours, and the corresponding notification rules, you can absolutely create that through the UI.

Thanks Anaisdg,

Can you show me how? As I understand it, I need two checks, one on CPU and one on memory, each with its own criticality levels (CRIT, WARN, etc.). For example (hypothetically), if CPU is > 50%, set the status to CRITICAL; if memory is > 70%, set the status to WARNING.
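A rough Python model of what the two hypothetical threshold checks compute: each check compares its own metric against its own thresholds and emits a status tagged with its own check name. The threshold lists, the `level_for`/`make_status` helpers, and the generic `value` field are assumptions for illustration; real threshold checks are configured in the UI and write their statuses to the _monitoring bucket.

```python
from typing import Dict, List, Tuple

# (level, limit) pairs, checked from most to least severe.
# The 50% CPU / 70% MEM limits mirror the hypothetical example above.
Thresholds = List[Tuple[str, float]]

CPU_THRESHOLDS: Thresholds = [("CRIT", 50.0)]
MEM_THRESHOLDS: Thresholds = [("WARN", 70.0)]


def level_for(value: float, thresholds: Thresholds, default: str = "OK") -> str:
    """Return the first level whose 'greater than' limit the value exceeds."""
    for level, limit in thresholds:
        if value > limit:
            return level
    return default


def make_status(check_name: str, value: float, thresholds: Thresholds) -> Dict[str, object]:
    """Build a hypothetical status row; the check name is what tells checks apart."""
    return {"_check_name": check_name, "_level": level_for(value, thresholds), "value": value}


print(make_status("CPU Check", 62.0, CPU_THRESHOLDS))  # level CRIT
print(make_status("MEM Check", 74.0, MEM_THRESHOLDS))  # level WARN
```

The comparison is strictly greater than, matching the "> 50%" wording, so a value of exactly 50.0 stays OK.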

But the UI doesn’t let me distinguish statuses when I create the notification rule; they all appear simply as “status”. How do I specify which check’s statuses a notification rule applies to?

Hello @Nam_Giang,
Can you please take a moment to read through the InfluxDB Checks and Notifications System blog post linked above? I believe it answers all your questions.
TL;DR:
A status is the output of a check, stored as time series data with the following schema:

  • _time : the time at which the check was executed.
  • _check_id : each check is assigned a check ID. Check IDs can be used for debugging and rerunning failed checks — a tag.
  • _check_name : the name of your check (“CPU Check” for example) — a tag.
  • _level : either “CRIT”, “WARN”, “OK”, or “INFO” for UI checks (or configurable for custom checks) — a tag.
  • _source_measurement : the measurement of your source data (“cpu” for example) — a tag.
  • _measurement : the measurement of the status data (“statuses”).
  • _type : the type of check (“threshold” for example) — a tag.
  • _field : the field keys.
    • _message : the status message.
    • _source_timestamp : the timestamp of the source field being checked, in nanosecond precision.
    • ${your source field} : the source field that is being checked.
    • dead : an additional field specific to deadman checks. A boolean that represents whether the deadman signal has met the deadman check conditions.
  • custom tags added to the query output during check configuration.
  • any additional tags included in the query, resulting from the v1.fieldsAsCols function. To view all of these tags, hover over a point in your query in the Data Explorer.
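The schema above splits a status into tags (which notification rules can filter on) and fields (which carry the message and checked values). The Python dataclass below is a hypothetical model of one status point following that list; the class and attribute names are illustrative, and the measurement name and the deadman-specific dead field are left out.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StatusRow:
    """Illustrative model of one status point, following the schema listed above."""
    time_ns: int              # _time
    check_id: str             # _check_id (tag)
    check_name: str           # _check_name (tag)
    level: str                # _level (tag)
    source_measurement: str   # _source_measurement (tag)
    check_type: str           # _type (tag)
    message: str              # _message (field)
    source_timestamp: int     # _source_timestamp (field, ns precision)
    source_fields: Dict[str, float] = field(default_factory=dict)  # ${your source field}
    extra_tags: Dict[str, str] = field(default_factory=dict)       # custom / query tags

    def tags(self) -> Dict[str, str]:
        """Tag columns: the values a notification rule can filter on."""
        t = {
            "_check_id": self.check_id,
            "_check_name": self.check_name,
            "_level": self.level,
            "_source_measurement": self.source_measurement,
            "_type": self.check_type,
        }
        t.update(self.extra_tags)
        return t

    def fields(self) -> Dict[str, object]:
        """Field columns: the message, source timestamp, and checked values."""
        f: Dict[str, object] = {"_message": self.message, "_source_timestamp": self.source_timestamp}
        f.update(self.source_fields)
        return f


# Hypothetical status written by a CPU threshold check.
row = StatusRow(
    time_ns=1_600_000_000_000_000_000,
    check_id="hypothetical-id",
    check_name="CPU Check",
    level="CRIT",
    source_measurement="cpu",
    check_type="threshold",
    message="CPU usage is high",
    source_timestamp=1_599_999_990_000_000_000,
    source_fields={"usage_user": 62.0},
    extra_tags={"host": "server01"},
)
print(row.tags()["_check_name"])  # CPU Check
```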

When you create a notification rule, you can include a message like the following in your alert:

Notification Rule: ${ r._notification_rule_name } triggered by check: ${ r._check_name }: ${ r._message }

This helps you identify which rule is alerting and which check triggered it. You can of course add other information to your notification rule, as described in the blog post.
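In InfluxDB, Flux performs the ${ r.column } interpolation itself. As a hedged illustration of what the template produces, this Python sketch substitutes each placeholder with the matching column from a status row using a regular expression; the `render` helper and the row values are hypothetical.

```python
import re
from typing import Dict

# Matches placeholders such as ${ r._check_name } and captures the column name.
_PLACEHOLDER = re.compile(r"\$\{\s*r\.(\w+)\s*\}")


def render(template: str, r: Dict[str, str]) -> str:
    """Replace each ${ r.<column> } placeholder with that column's value from row r."""
    return _PLACEHOLDER.sub(lambda m: str(r[m.group(1)]), template)


template = (
    "Notification Rule: ${ r._notification_rule_name } "
    "triggered by check: ${ r._check_name }: ${ r._message }"
)

# Hypothetical row as seen by the notification rule.
row = {
    "_notification_rule_name": "CPU Rule",
    "_check_name": "CPU Check",
    "_message": "CPU usage is above 50%",
}

print(render(template, row))
# Notification Rule: CPU Rule triggered by check: CPU Check: CPU usage is above 50%
```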

Thank you very much for your patience; please bear with me.

I now fully understand how statuses are generated by checks. I looked at the _monitoring bucket and saw that each status is tagged with _check_name, so I’m fine with checks.

My confusion is in the Notification Rules part.

I’m creating a notification rule, and in the following screenshot I don’t see how this rule can be bound to the CPU statuses generated by my CPU Check (i.e. the statuses tagged with _check_name=“CPU Check”):

In particular, how can I be sure that this rule will only evaluate the statuses with the tag _check_name = “CPU Check”?

Is there a way to see the Flux code generated for the underlying task of such a notification rule?

Hi Anaisdg, I hesitated to ask again (maybe I’m missing something obvious), but could you help me one last time? Thanks

I think I figured it out: the Conditions line in the screenshot above should read:

When ANY status
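A hypothetical Python model of that realization: a rule's level condition on its own matches statuses from any check, and only an additional filter on a tag such as _check_name narrows it to one check (consistent with the monitor.from() filter shown earlier in the thread). The `Rule` class and its fields are illustrative, not the InfluxDB API.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Rule:
    """Hypothetical model of a notification rule: a level condition plus optional tag filters."""
    name: str
    level: str                                               # e.g. "CRIT"
    tag_filters: Dict[str, str] = field(default_factory=dict)

    def matches(self, status: Dict[str, str]) -> bool:
        # The level condition alone applies to statuses from ANY check...
        if status.get("_level") != self.level:
            return False
        # ...so tag filters (e.g. on _check_name) are what bind it to one check.
        return all(status.get(k) == v for k, v in self.tag_filters.items())


statuses = [
    {"_check_name": "CPU Check", "_level": "CRIT"},
    {"_check_name": "MEM Check", "_level": "CRIT"},
]

any_check = Rule("Any CRIT", "CRIT")
cpu_only = Rule("CPU CRIT", "CRIT", {"_check_name": "CPU Check"})

print([s["_check_name"] for s in statuses if any_check.matches(s)])  # ['CPU Check', 'MEM Check']
print([s["_check_name"] for s in statuses if cpu_only.matches(s)])   # ['CPU Check']
```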

Closing this issue!