Ratio, Time Value Divided by Maximum (Value)

Hi,

Pulling my hair out a bit here - close, but just can’t quite get it over the goal line :frowning_face:. I have a time series with a tag (name) => I want to count the current number of distinct names and compare it to the “long term” maximum … so I know if data has gone missing recently (and raise an alert if needed). The two (working) queries are below (ORDER and LIMIT are just to keep the output short),

> select count("count") from (select count(perusage) from usage group by "name", time(20m)) group by time(20m) ORDER BY DESC LIMIT 10
name: usage
time                 count
----                 -----
2022-05-25T22:00:00Z 4
2022-05-25T21:40:00Z 4
2022-05-25T21:20:00Z 4
2022-05-25T21:00:00Z 4
2022-05-25T20:40:00Z 4
2022-05-25T20:20:00Z 4
2022-05-25T20:00:00Z 4
2022-05-25T19:40:00Z 2
2022-05-25T19:20:00Z 2
2022-05-25T19:00:00Z 2

Perfect! Exactly what I want. And the long term maximum value is,

> select max("count") from (select count("count") from (select count(perusage) from usage group by "name", time(20m)) group by time(20m))
name: usage
time                 max
----                 ---
2022-05-11T02:00:00Z 7

So, just looking to get the output of the first query (vs. time), divided by the result of the second (i.e. divide by 7). But it’s not working for me … perhaps because the number of elements in the two is not the same. Any thoughts?

Thanks!

InfluxQL does not have JOINs, so you can’t achieve this kind of result with it.
In the Influx world that’s possible with Flux (it can’t really be compared to SQL, and I’m not very knowledgeable about it).

That said, there are several ways to build an alert given your conditions, but the best way depends on the tools available (e.g. Grafana has an alerting functionality).
What kind of tool are you using, if any?

That makes sense, thanks! I will alert using Grafana … is there a way using that tool?

This is gonna be kind of long, but I’m unable to find fewer words to explain it.

I suggest you look at the Grafana docs here, as they are changing their alerting functionality; have a look at them and decide which one to use.

I’ll put here a sample of what is now called the Legacy Grafana Alerts.

The first step is to decide how to check if something is wrong:
1 - Find the key → in your case “name”
2 - Choose a metric to identify when something is wrong → this depends on your specific case
3 - Build a query with the desired precision
4 - Configure the alert rule

I’ll show you my case.
1 - Key
My key is given by 3 tags (company, host, telegraf_instance).
2 - Detection
I’ll use the data from Telegraf’s internal monitoring to detect gathering issues (more info here).
3 - Building the Query
The query below returns the number of points written by a Telegraf agent:

SELECT
	non_negative_difference(last("metrics_written")) AS "Points Written"
FROM "standard"."internal_write"
WHERE $timeFilter
GROUP BY
	time(1h)
	,"host"
	,"telegraf_instance"
	,"company"
fill(0)

When plotted, the result is something like this (unreadable due to the number of agents, but that’s not an issue as it doesn’t have to be human-readable):

Notes:

  • I keep the Grafana time filter WHERE $timeFilter; the alert itself will manage it
  • I return 0 when there is no data - fill(0) - as I actually like to see the “drop” in the chart

Practically, when a host is not sending data I’ll have something like this:

4 - Alert Rule
For this one I suggest you look at the docs; if you go with the Legacy Alerting you should have something like this:

I’ll give a quick explanation

  • Evaluate Every - Alert rule running frequency
  • For - This sets an optional duration threshold: the alert will actually fire only after the alerting rule has been “firing” for this amount of time; below this threshold the status will be “pending” (there are 3 statuses: OK, Pending, Alert)
  • Conditions - Choose Query and time range, then define an aggregation and compare the value to one of your choosing. Pay attention to the time range

The rule you see above does the following:

  • Given the number of points received by each agent, grouped by 1h
  1. Use 96h as time range time >= now()-96h → last 4 days of data
  2. Get the last value in the series
  3. Compare it to 500; if lower, trigger an alert
  4. This rule will run every 6h
  5. No alert will be triggered until the rule has been firing for more than 12h (this avoids alerting on temporary problems)

As you probably know, when querying your data, if a “key” (tag) is not contained in the time range it won’t appear at all. The same holds for alerting: if a series does not exist, you can’t alert on it.

i.e. if you want to check every 10m:

  • do not get only the last 10m, because if a “key” is missing it won’t even appear, and if it does not exist it can’t be compared to anything.
  • do get at least the last 20 minutes; this way the key will still exist, and if it has no values in the last 10 minutes its value will be null or 0 (if you use fill).

I suggest you use a much greater multiple of your alerting window; in my case I check every 6hrs but get the latest 96hrs of data, meaning that my alert will keep firing for up to 16 executions before going back to “OK”, once the “key” falls completely out of range.
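To make this concrete with your own data, here is a minimal sketch of the “query a wider window than you alert on” idea, reusing the usage measurement and perusage field from your first post (untested, and the 60m look-back is an arbitrary choice):

-- the check runs every 20m, but the look-back is 60m so a "name" that
-- only recently stopped reporting still shows up in the result, with
-- fill(0) turning its empty recent bucket(s) into an explicit 0
SELECT
	count(perusage)
FROM usage
WHERE time >= now() - 60m
GROUP BY
	time(20m)
	,"name"
fill(0)

Inside Grafana you would keep WHERE $timeFilter instead of the hard-coded range and set the look-back in the alert condition, as described above.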

Hope this helps


First off - thanks! Really appreciate you taking the time to capture this.

No worries at all about the length - again, I know this took time to capture, sincerely appreciated!

I do have some Grafana alerts set up - Legacy type, like you note. I haven’t figured out the new approach yet … LOL. What you capture here makes sense, and matches my “other” alerts. The problem in this particular case => I want the comparison value to also come from a query. So the value of 500 in the example above … I want that to be dynamic, coming from a query as well (one that gives me the maximum value over time).

Make sense?

Thanks again.

Makes sense, but it can’t be done, as the threshold does not allow expressions.
You might be able to put it all together in a single “row” with the Merge transformation, then perform the calculation with Add field from calculation, but that does not solve the static threshold…
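For example, the two queries feeding such a panel could simply be your original ones with explicit aliases, so the Merge / Add field from calculation (binary operation) transformations have two clearly named fields to divide - a sketch, untested, with the alias names being arbitrary:

-- Query A: distinct "name" count per 20m window (alias "current" is arbitrary)
SELECT count("count") AS "current"
FROM (SELECT count(perusage) FROM usage GROUP BY "name", time(20m))
GROUP BY time(20m)

-- Query B: long-term maximum of that count (alias "long_term_max" is arbitrary)
SELECT max("count") AS "long_term_max"
FROM (SELECT count("count") FROM (SELECT count(perusage) FROM usage GROUP BY "name", time(20m)) GROUP BY time(20m))

Whether the transformation can actually line the two results up depends on the timestamps, so treat this purely as a starting point.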

If you have well-known “keys” and need different thresholds, you can create a separate alert for each “key”, each with its own threshold.

Thanks! Let me check on that. I’m also wondering about using a Continuous Query to generate a data “output” that stores this ratio as a function of time?
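Something like the sketch below is what I have in mind - untested, with “mydb”, the CQ name, and the target measurement being placeholders - although from the above I gather the CQ could only pre-compute the per-name counts, and the division by the long-term maximum would still have to happen in Grafana or Flux since InfluxQL has no joins:

-- "mydb", "cq_usage_per_name" and "usage_per_name_20m" are placeholder names;
-- the CQ stores the per-"name" point count for every 20m window into a new
-- measurement, so later queries (or a Grafana panel) can read it cheaply
CREATE CONTINUOUS QUERY "cq_usage_per_name" ON "mydb"
BEGIN
	SELECT count(perusage) AS "points"
	INTO "usage_per_name_20m"
	FROM usage
	GROUP BY "name", time(20m)
END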

Thanks!