Create an alert to check that the number of running instances is not less than 3

Hello, I have the following data coming into InfluxDB, and I want an alert notification when the number of instances that are up and running is less than 3.

| _measurement | _field | _value | cnpg.io/instanceName |
|---|---|---|---|
| cnpg_collector_up | gauge | 0 or 1 (0: down, 1: up) | instance-1, …, instance-n |

For this, I thought about using a threshold check that sums the instances that are up over the last minute and then sends a warning notification if the sum is less than 3. But I am having a hard time writing the query for the threshold check. The UI is quite limiting (for example, it does not let me use sum or group operations), and when I create the check via the JavaScript API, it fails without giving much information: the UI just shows "Last Run Status: Completed(failed)" with no details.

The only errors in the logs are:
ts=2023-04-28T14:33:15.017598Z lvl=info msg="Error exhausting result iterator" log_id=0hP2dMN0000 service=task-executor error="unknown column \"_source_measurement\"" name=wide-to19
ts=2023-04-28T14:33:15.022706Z lvl=debug msg="Execution failed" log_id=0hP2dMN0000 service=task-executor error="could not execute task run: unknown column \"_source_measurement\"" taskID=0b1e0c524ee96000

And here is the query I wrote to get the number of instances that are up:

from(bucket: "telegraf")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "cnpg_collector_up")
    |> filter(fn: (r) => r["_field"] == "gauge")
    |> group(columns: ["cnpg.io/instanceName", "_field", "_measurement"], mode: "by")
    |> aggregateWindow(every: 1m, fn: mean)
    |> drop(columns: ["cnpg.io/instanceName"])
    |> group(columns: ["_time", "_field", "_measurement"])
    |> sum()
    |> group()

So I wonder: is this the right approach, and what is failing here? Do checks not support more complex queries?
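One plausible explanation for the `unknown column "_source_measurement"` error, which I have not verified against this setup: the task generated for a threshold check appears to pipe the query result through `v1.fieldsAsCols()` (a pivot on `_field`) before calling `monitor.check()`. `pivot()` drops columns that are not in the row key, column key, or group key, and the final `group()` leaves `_measurement` out of the group key, so `monitor.check()` would have no `_measurement` to copy into `_source_measurement`. A sketch of the same query that keeps `_measurement` in the group key:

```flux
// Same query as above except for the last line. Keeping _measurement in the
// group key means a later pivot (such as v1.fieldsAsCols()) does not drop it.
from(bucket: "telegraf")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "cnpg_collector_up")
    |> filter(fn: (r) => r["_field"] == "gauge")
    |> group(columns: ["cnpg.io/instanceName", "_field", "_measurement"], mode: "by")
    |> aggregateWindow(every: 1m, fn: mean)
    |> drop(columns: ["cnpg.io/instanceName"])
    |> group(columns: ["_time", "_field", "_measurement"])
    |> sum()
    // Previously: |> group()
    |> group(columns: ["_measurement", "_field"])
```

After the pivot, the summed count would appear in a `gauge` column, so the threshold condition would compare `gauge` against 3.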

Hello @dogan,
Did you create a separate threshold check through the UI and then query that?
I'd recommend creating one task that does it all.
Something like:

import "array"
import "slack"

// Runs every minute and looks at the last minute of data.
option task = {name: "Alert on instances", every: 1m}

// Sends a Slack message when fewer than `threshold` instances are up.
// Returns the HTTP status code from Slack, or 0 when no message is sent.
alert = (eventValue, threshold) =>
    (if eventValue < threshold then slack.message(
        url: "https://hooks.slack.com/services/####/####/####",
        // Placeholder channel name; slack.message() requires a channel argument.
        channel: "#alerts",
        text: "Only ${string(v: eventValue)} instances are up (threshold: ${string(v: threshold)}).",
        color: "warning",
    ) else 0)

// Number of instances whose latest gauge value in the window is 1.
// Tasks have no v.timeRangeStart/v.timeRangeStop, so the range uses task.every.
data = from(bucket: "telegraf")
    |> range(start: -task.every)
    |> filter(fn: (r) => r["_measurement"] == "cnpg_collector_up")
    |> filter(fn: (r) => r["_field"] == "gauge")
    |> group(columns: ["cnpg.io/instanceName"])
    |> last()
    |> group()
    |> sum()

// Fallback row so the total is 0 (and the alert fires) when no instance reported.
// 0.0 rather than 0 so the column type matches the float gauge values when merged.
data_0 = array.from(rows: [{_value: 0.0}])
events = union(tables: [data_0, data])
    |> group()
    |> sum()
    |> findRecord(fn: (key) => true, idx: 0)
eventTotal = events._value

// Yields a small table so the task has at least one output stream.
data_0
    |> yield(name: "ignore")

alert(eventValue: eventTotal, threshold: 3.0)

It’s taken from

Hi Anaisdg,
Thanks for the informative answer.

I created a threshold check via the UI, but it did not let me write the query. I then created another one programmatically using ChecksAPI from @influxdata/influxdb-client-apis, and that one fails with the generic error message shown in the UI.
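If the goal is to keep using notification rules and endpoints rather than calling Slack from the task, one option is a task that calls `monitor.check()` itself, which writes statuses to the `_monitoring` bucket where notification rules read them. A minimal sketch under that assumption; the check ID and name are hypothetical placeholders, and I have not verified how existing notification rules treat statuses whose `_check_id` does not belong to a check created through the UI or API.

```flux
import "influxdata/influxdb/monitor"

option task = {name: "CNPG instances up (custom check)", every: 1m}

// Hypothetical check metadata; monitor.check() attaches it to each status it writes.
check = {
    _check_id: "0000000000000001",
    _check_name: "CNPG instances up",
    _type: "custom",
    tags: {},
}

from(bucket: "telegraf")
    |> range(start: -task.every)
    |> filter(fn: (r) => r._measurement == "cnpg_collector_up" and r._field == "gauge")
    // Latest gauge value per instance (1 = up, 0 = down).
    |> group(columns: ["cnpg.io/instanceName"])
    |> last()
    // Sum across instances. _measurement stays in the group key so
    // monitor.check() can copy it into _source_measurement; _start and _stop
    // stay so the summed row can be given a timestamp.
    |> group(columns: ["_measurement", "_field", "_start", "_stop"])
    |> sum()
    |> duplicate(column: "_stop", as: "_time")
    // If no instance reported at all, there are no rows and no status is written.
    |> monitor.check(
        data: check,
        messageFn: (r) => "${string(v: r._value)} CNPG instances are up",
        crit: (r) => r._value < 3.0,
        ok: (r) => r._value >= 3.0,
    )
```

Unlike the Slack task above, this version has no fallback row, so a total outage produces no status at all; a deadman check on the same measurement could cover that case.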

I am a bit confused by your recommendation here. Do you recommend implementing my own alerting function instead of using NotificationEndpointsAPI and NotificationRulesAPI? Is there an example of how to send an HTTP alert from a task run in the same format that NotificationEndpointsAPI uses?
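For the HTTP question: in the task from the answer, the Slack-based `alert` function could be swapped for one that posts JSON with Flux's `http.post()`. A minimal sketch; the URL and the payload field names are hypothetical, and the body is not necessarily the format that an HTTP endpoint created through NotificationEndpointsAPI sends.

```flux
import "http"
import "json"

// Hypothetical receiver URL; replace it with your own HTTP endpoint.
alertURL = "https://alerts.example.com/cnpg"

// Same shape as the Slack-based alert(): posts a JSON body when fewer than
// `threshold` instances are up. Returns the HTTP status code, or 0 if nothing is sent.
alert = (eventValue, threshold) =>
    if eventValue < threshold then
        http.post(
            url: alertURL,
            headers: {"Content-Type": "application/json"},
            // Illustrative payload; these field names are assumptions.
            data: json.encode(v: {level: "warn", instancesUp: eventValue, threshold: threshold}),
        )
    else
        0

// Example call with a made-up count; inside the task, pass eventTotal instead.
alert(eventValue: 2.0, threshold: 3.0)
```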