Stop Alerting for datapoints which will not send again

Dear Influx Community,

this is my first post here.
I am reaching out to you with a certain problem:

On my InfluxDB I have an alerting system: if specific datapoints send only “0” for a certain time, an alert is generated.
It can happen that a datapoint is shut down and will not send values greater than “0” again. My Alerting System will forever believe that it has to send alerts.

Do you have any idea how to tackle this problem?

Best
Felix

Is it not possible to remove the endpoint from your alert script?

If you’re not specifying an endpoint and are just monitoring them all with one alert, you could use the .stateChangesOnly() property of the alert node. That will only generate a new alert when the given state changes. If the endpoint is never coming back online, it will stay in a constant “triggered” state but send out no more alerts for that endpoint. If the endpoint came back online in the future, you would just get an “OK” message, and no further alert unless the endpoint went offline again.
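As a rough sketch of what that could look like in a hand-written TICK script (not the actual task from this thread): the measurement name 'station_data', the tag key 'station', the field "value" and the log path are all made up for illustration, and the field is assumed to be a float.

```
// Sketch only: 'station_data', 'station', "value" and the log path are hypothetical.
stream
    |from()
        .measurement('station_data')
        // One alert state per station, so each station changes state on its own.
        .groupBy('station')
    |alert()
        // Assumes "value" is a float field.
        .crit(lambda: "value" == 0.0)
        // Emit an event only when the alert level changes (OK -> CRITICAL or back).
        .stateChangesOnly()
        .log('/tmp/station_alerts.log')
```

With the groupBy in place, a station that stays at 0 forever should produce one CRITICAL event and then stay quiet, while the other stations keep alerting independently. The "0 for a certain time" part is left out here to keep the sketch short.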

The state change behaviour is described in the Kapacitor alert node documentation.

If you could remove the endpoint from your alert script completely though, that’s probably better.

Hi Philb,
thanks for your fast response!
How is it possible to remove an endpoint from the alert script? Can you go into detail?

For more information:

I have a measurement with a tag. Every value of this tag is a station. Every station sends several values, which are in my fields section. Now a station stops existing, but I cannot remove it from my measurement because I would lose historical data. So how can I exclude this station from the alert script?

Hope this helps
Best
Felix

Hi @fe11x, no problem. Sorry for the delay.

It depends on your script. Assuming that you used Chronograf to generate the TICK script, you might have selected the endpoints you want to monitor. That would add a part to your script, something like

var whereFilter = lambda: ("endpoint" == 'endpoint1' OR "endpoint" == 'endpoint2')

If you click to edit the TICK script, you pick the measurement and you can see the available tags in there. Selecting values in the “endpoint” or “station” tag adds them to the filter above; to remove an endpoint you just need to unselect it.

Sorry, I don’t have access to any test data at the moment to post a screenshot.

That works OK if you know when an endpoint will come online and you plan to add it. However, if you don’t know when this will happen, it can be tedious to manage and eventually cumbersome if you have endpoints coming on at random intervals.

If you don’t select endpoints, you’ll get a value like this for the whereFilter

var whereFilter = lambda: TRUE

That will make your script process all endpoints that appear in the measurement. The issue there is that, as you’ve seen, once an endpoint goes offline it would keep sending out alerts. In this case, I’d use the stateChangesOnly() property mentioned in my previous post to stop the repeated alerts.

Then you would only hear about that endpoint again if it came back online: you would get an OK message, and if the endpoint disappeared again you would get another alert.

Thank you so much! That’s actually an easier solution than I expected!

One last question:
should it also work the other way round?

like this:

var whereFilter = lambda: ("endpoint" != 'endpoint1' AND "endpoint" != 'endpoint2')

Because I have a lot of endpoints and I want to exclude just a few of them.

BR Felix

No problem @fe11x

That should work if you only want to exclude a few of them. The script should process all data points coming in, other than the ones you exclude. So endpoints 3, 4, 5, 6 and so on should still alert. Obviously, the more endpoints you want to exclude, the more you would add to the filter. It could become cumbersome after a while.
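For reference, in a hand-edited TICK script the exclusion filter would plug into the from() node roughly like this. It is only a fragment, and the measurement name is a placeholder, not something taken from your setup.

```
// Fragment only: 'station_data' is a hypothetical measurement name.
// Keep every point whose "endpoint" tag is neither endpoint1 nor endpoint2.
var whereFilter = lambda: ("endpoint" != 'endpoint1' AND "endpoint" != 'endpoint2')

var data = stream
    |from()
        .measurement('station_data')
        .where(whereFilter)
        .groupBy('endpoint')
```

The AND matters here: a point has to differ from every excluded value to pass the filter.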

Glad to help

Phil

Thanks for the info philb.
I tested that now. It seems that there is a bug in Chronograf: when using “!=” in the alert script, it only accepts OR, not AND.
And obviously var whereFilter = lambda: ("endpoint" != 'endpoint1' OR "endpoint" != 'endpoint2') doesn’t make any sense, since it is true for every endpoint. For the negation I would need AND.

But I can also use “==” instead of “!=”. I just have to tick 200 tags :wink:

Would you be so kind as to check whether you see the same bug?

BR Felix

Hey Felix, I can’t say that I have encountered that. It may be a bug though.

To be honest, I use Chronograf to generate the base script, then copy it into an editor like VS Code. I find it works better and you can easily use the available Kapacitor nodes.

|stateDuration and |stateCount are useful nodes if you want to trigger the alert after a set period, but I don’t think Chronograf will accept these nodes in the TICK script. It never used to be possible.

After generating the script in Chronograf and copying the contents, I disable and delete the task, edit my TICK script as needed, then upload it to my server.
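A minimal sketch of the stateDuration approach, assuming a float field called "value", a tag called 'station' and a measurement called 'station_data' (all placeholders), and a 10 minute threshold picked purely as an example:

```
// Sketch only: measurement, tag, field, threshold and log path are hypothetical.
stream
    |from()
        .measurement('station_data')
        .groupBy('station')
    // Adds a "state_duration" field: how long "value" has been 0, in minutes.
    // While the condition is false, the field is -1.
    |stateDuration(lambda: "value" == 0.0)
        .unit(1m)
    |alert()
        // Go critical once the station has reported 0 for 10 minutes or more.
        .crit(lambda: "state_duration" >= 10)
        .stateChangesOnly()
        .log('/tmp/station_alerts.log')
```

|stateCount works the same way but counts consecutive points instead of elapsed time, writing a "state_count" field.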

From there you can define your alerts using the Kapacitor CLI.

Assuming you’re using Linux, it would be something like

sudo kapacitor define TaskName -type stream -tick /path_to_tick_file -dbrp DatabaseName.RetentionPolicy

Then to enable it

sudo kapacitor enable TaskName

You should be able to use != (you can, I do it myself). You can also use regex in the filter with =~.
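If the list of exclusions grows, a negated regex match might be shorter than a chain of != comparisons. This is just a sketch using the placeholder endpoint names from earlier in the thread; !~ is the "does not match" counterpart of =~ in Kapacitor lambda expressions.

```
// Sketch only: endpoint1 and endpoint2 are placeholder tag values.
// Keep every point whose "endpoint" tag is NOT one of the listed values.
// The ^ and $ anchors stop 'endpoint1' from also excluding 'endpoint10'.
var whereFilter = lambda: "endpoint" !~ /^(endpoint1|endpoint2)$/
```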

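I’m not sure whether AND will work in Chronograf. With == it would expect a data point where the tag value is equal to both, which never matches, but with != it is the combination you need to exclude several endpoints. Give it a try though.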

Kapacitor does support a sideload node; you might be able to leverage that to provide a list of endpoints you want to monitor.
Admittedly it isn’t ideal when you have 200 endpoints to select.

Another option could be to use Kapacitor templates

You would still need to specify the endpoints in a variable file though.