Alert notifications for percentage-based thresholds (warning at 80%, critical at 90%) in InfluxDB

#1

Hi,

I am using the Telegraf agent and InfluxDB 1.2.2 in my setup, integrated with Grafana 5.3.1.

I am planning to set up an 80% warning threshold and a 90% critical threshold on total physical memory, which is 16 GB on each of 10 remote nodes. More info: http://docs.grafana.org/alerting/notifications/

For example, if physical memory (RAM) usage on a remote node reaches 80% of 16 GB, which is 12.8 GB, a warning alert notification should be triggered, and if it crosses 90%, which is 14.4 GB, a critical alert notification should be triggered.

Does InfluxDB 1.2.2 support percentage-based thresholds (80% warning, 90% critical) in its SQL-like query format, or do I need to upgrade to version 1.6?

More info: https://docs.influxdata.com/influxdb/v1.6/guides/calculating_percentages/

I look forward to hearing from you. Thanks in advance.

Best Regards,

Kaushal

#2

Hi,

Checking in again to see if someone can help with my earlier post in this forum.

Best Regards,

Kaushal

#3

Are these Linux nodes? Telegraf collects memory stats through the [inputs.mem] plugin and writes them to a measurement called ‘mem’.

That measurement has the field ‘used_percent’. You should be able to use it in Grafana to graph the percentages and build your alerts.

You mentioned using Grafana for alerts. I don’t have it set up, so I can’t write the alert query for you, but…

If you were to use Kapacitor for your alerts then this script should get you started. You’ll need to add a warning variable with your 80% value and set it in your alert node.

// Source data: the Telegraf 'mem' measurement, grouped by host
var db = 'database'
var rp = 'retention_policy'
var measurement = 'mem'
var groupBy = ['host']
var whereFilter = lambda: TRUE

// Alert identity and message template
var name = 'nodeMemory'
var idVar = name + ':{{.Group}}'
var message = '{{.TaskName}} {{.Level}} {{ index .Fields "value" }} {{.Time}}'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'

// Where alert events are written back
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'

// Critical threshold, in percent of memory used
var crit = 90

// Read used_percent from the stream and expose it as 'value'
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "used_percent")
        .as('value')

// Raise a critical alert when 'value' exceeds the threshold
var trigger = data
    |alert()
        .crit(lambda: "value" > crit)
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .log('/tmp/memory_log.txt')

// Store each alert event in InfluxDB
trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

// Expose the latest alert data over Kapacitor's HTTP API
trigger
    |httpOut('output')
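
For the 80% warning, here is a minimal sketch of the change. It assumes a new variable named `warn` (the name is my own choice, not something Kapacitor requires) and reuses `data`, `crit`, and the `value` field from the script above. This block would replace the existing `var trigger = data |alert()` section; everything else stays as it is.

```
// Assumed addition: warning threshold, in percent of memory used
var warn = 80

var trigger = data
    |alert()
        // WARNING above 80% used, CRITICAL above 90% used
        .warn(lambda: "value" > warn)
        .crit(lambda: "value" > crit)
        .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .log('/tmp/memory_log.txt')
```

With both levels defined, {{.Level}} in the message should report WARNING or CRITICAL depending on which threshold was crossed. When both conditions are true, the more severe level is the one reported. Because ‘used_percent’ is already a percentage, the thresholds do not depend on the nodes having 16 GB of RAM.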

Hope that helps

#4

@philb Thanks a lot for the reply. I am currently setting it up and will let you know if I run into any issues.

#5

Glad to help, hope it works for you.