InfluxDB Checks and Notifications Best Practices

pgoh · May 27, 2022, 3:27am

Hi. For the last couple of months, I’ve been using InfluxDB OSS to work on a proof of concept / prototype, and I am now starting to work towards a solution design. We are using OSS because on-prem is a requirement for our use case.

Background:
The solution will have a service that collects sensor metrics configured by end users and writes them to InfluxDB. End users would define their thresholds via an application front-end, with the rules stored in a separate SQL database. The application service would create the corresponding Check in InfluxDB using the Checks API (https://github.com/influxdata/influxdb-client-csharp/tree/master/Client#monitoring–alerting). System admins and devs would have access to the InfluxDB UI, but otherwise a user would interact with the system through a separate application.

I have read:
https://www.influxdata.com/blog/influxdbs-checks-and-notifications-system/
and watched "Checks and Notifications in Action" on YouTube.

Both were useful for an overall understanding of how checks work and how alerts/checks could be used in a real-world scenario. However, I hope the community can help with these questions:

Implementation - for a dynamic number of metrics and threshold rules, would we define one alert for each rule, or is there a pattern for running checks against a set of different metrics and thresholds within a single alert task?
Scale / Performance - at a rough estimate, there could be on the order of 50 checks running at different frequencies and with different thresholds. Are there any limits on the number of tasks / checks before we might run into performance issues?
Retention Policy - we would like to be able to examine the history of threshold checks, but these are stored in the _monitoring bucket, which has a 7-day retention policy. Is there a way to extend this, or should we follow the practice of writing the data we need from the statuses measurement in _monitoring to a bucket with a longer retention policy?

I look forward to advice from those more experienced, and I hope I will be able to contribute to the community in the future.

Bigman74066 · July 26, 2022, 8:13am

Hi @pgoh,

No answer unfortunately. But I’m basically trying to do the same thing.

Building alerts in the InfluxDB UI is too complicated for end users.
My solution will need dynamic thresholds too.
I posted a question on Stack Overflow: "InfluxDB checks with dynamic thresholds".

Did you make any progress?

Anaisdg · August 2, 2022, 7:10pm

Hello @pgoh,

You could create an invokable script with parameters.
Call that script with the http package in Flux instead of cURL.
TL;DR InfluxDB Tech Tips: API Invokable Scripts in InfluxDB Cloud | InfluxData
http.post() function | Flux 0.x Documentation

You don’t need to create a check and notification rule. You can do these simultaneously. See this:

and this:

So essentially you could turn the two resources directly above into invokable scripts that check your data and send a notification, but with the filters and thresholds parameterized. Then you would create tasks that each pass different filters and thresholds to the invokable script, invoking it with the http.post function inside the task. Does that make sense? If not, where can I help?
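
As a rough sketch of the parameterization idea, here is how the invoke request for one threshold rule might be built from the application service side. It is Python rather than Flux purely for illustration and uses only the standard library. The `/api/v2/scripts/<id>/invoke` path and the headers follow the example task further down in this post; the host, the script ID, the token, and the parameter names (`measurement`, `field`, `threshold`) are hypothetical. Invokable scripts are documented as an InfluxDB Cloud feature, so availability on OSS is an assumption to verify.

```python
import json
import urllib.request


def build_invoke_request(base_url, script_id, token, params):
    """Build (but do not send) a POST request that invokes a script with params."""
    url = f"{base_url.rstrip('/')}/api/v2/scripts/{script_id}/invoke"
    body = json.dumps({"params": params}).encode("utf-8")
    headers = {
        "Authorization": f"Token {token}",
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Hypothetical rule, e.g. one row loaded from the application's SQL rules table.
rule = {"measurement": "temperature", "field": "value", "threshold": 30.0}

request = build_invoke_request(
    "https://influx.example.com",  # hypothetical host
    "0000000000000000",            # hypothetical script ID
    "my-token",                    # hypothetical API token
    rule,
)
# Sending is left out on purpose; urllib.request.urlopen(request) would perform the call.
```

Keeping one script and varying only the `params` record means a new user-defined rule needs no new Flux code, only a new set of parameters.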

On scale, I’m not sure. Are you using InfluxDB Cloud or OSS? If it’s OSS, what hardware? For Cloud at least, we have customers with tens of thousands of tasks. The usual problem is that tasks run into the timeout limit for task execution, so you can run a very large number of tasks as long as each one is fairly short.
On retention: yes, you could include logic in the scripts above to write whatever metadata you want about your check to another bucket.
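
One possible way to keep check history beyond the `_monitoring` retention period is a small task that periodically copies the `statuses` measurement into a bucket with its own retention. The sketch below (Python, standard library only) renders such a Flux task and wraps it in a JSON body for `POST /api/v2/tasks`. The destination bucket `monitoring_history`, the task name, the interval, and the org ID are hypothetical, and the destination bucket is assumed to exist already.

```python
import json


def render_copy_task(dest_bucket, every="1h", name="copy_statuses"):
    """Return Flux source for a task that copies check statuses to dest_bucket."""
    return "\n".join([
        f'option task = {{name: "{name}", every: {every}}}',
        "",
        'from(bucket: "_monitoring")',
        "    |> range(start: -task.every)",
        '    |> filter(fn: (r) => r._measurement == "statuses")',
        f'    |> to(bucket: "{dest_bucket}")',
    ])


def create_task_body(org_id, flux):
    """Return the JSON body for POST /api/v2/tasks (orgID and flux fields)."""
    return json.dumps({"orgID": org_id, "flux": flux})


flux = render_copy_task("monitoring_history")
body = create_task_body("0000000000000000", flux)  # hypothetical org ID
print(flux)
```

Copying one `task.every` window per run keeps each run short, which fits the advice above about task timeouts.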

Here’s an example of calling an invokable script in a task:

github.com/InfluxCommunity/late_data/blob/ecb7fd03dc3c7459add7f2438c589419f1b2616e/water_level_checksum.flux#L19

// Excerpt starting at line 19 of the linked file. The earlier lines are not shown;
// this code relies on the date, json, requests, and secrets packages and on the
// task option being defined there.

// Size of the window to aggregate
every = task.every

// Longest we are willing to wait for late data
late_window = 1h

// API token read from the secret store
token = secrets.get(key: "SELF_TOKEN")

// invokeScript calls a Flux script with the given start and stop
// parameters to recompute the window.
invokeScript = (start, stop) =>
    requests.post(
        // We have hardcoded the script ID here
        url: "https://eastus-1.azure.cloud2.influxdata.com/api/v2/scripts/095fabd404108000/invoke",
        headers: ["Authorization": "Token ${token}", "Accept": "application/json", "Content-Type": "application/json"],
        body: json.encode(v: {params: {start: string(v: start), stop: string(v: stop)}}),
    )

// Only query windows that span a full minute
start = date.truncate(t: -late_window, unit: every)
stop = date.truncate(t: now(), unit: every)

And this is the invokable script it calls:

github.com/InfluxCommunity/late_data/blob/main/water_level_process.flux

// Compute the mean for the window
from(bucket: "water_level_raw")
    |> range(start: params.start, stop: params.stop)
    |> mean()
    |> to(bucket: "water_level_mean_1h", timeColumn: "_stop")
    |> yield(name: "means")


// Compute and store new checksum for this window
from(bucket: "water_level_raw")
    |> range(start: params.start, stop: params.stop)
    |> group(columns: ["_measurement", "_field", "_stop"])
    |> count()
    |> to(bucket: "water_level_checksum", timeColumn: "_stop")
    |> yield(name: "checksums")
