Telegraf Starlark Processor - How to initialize a value from SNMP process table

xl3121 · March 10, 2021, 5:39pm

We use the Telegraf SNMP input plugin with the Starlark processor plugin to check whether a process is running.

In telegraf.conf, the [inputs.snmp] section has this table to query process names (HOST-RESOURCES-MIB::hrSWRunName):
[[inputs.snmp.table]]
  name = "hrSWRunTable_SIM"
  inherit_tags = [ "hostname" ]
  index_as_tag = true

  [[inputs.snmp.table.field]]
    name = "hrSWRunName"
    oid = ".1.3.6.1.2.1.25.4.2.1.2"  # HOST-RESOURCES-MIB::hrSWRunName

We use the Starlark processor to check a process's running state based on the SNMP process table, which has more than 100 processes under HOST-RESOURCES-MIB::hrSWRunName. In this case, we only check whether crond is running and ignore the others. The Starlark processor plugin is set up as below:
[[processors.starlark]]
  namepass = ["hrSWRunTable_SIM"]
  source = '''
def apply(metric):
  proc_name = metric.fields.get('hrSWRunName')
  if proc_name == "crond":
    metric.fields['crond'] = 2
    print(proc_name)
    return metric
  else:
    metric.fields['others'] = 0
    return metric
  return metric
'''
So if crond is running, the Starlark script sets metric.fields['crond'] = 2; we can see it in Chronograf's Explore view, and the TICKscript configured in Kapacitor can verify it. However, if crond is not running, the Starlark script never sets metric.fields['crond'] (it stays undefined). In that case Chronograf Explore reports no data for the crond state (which is correct), and the TICKscript cannot verify crond, because there is no data and no "crond" field. This is the TICKscript:
var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "crond")
        .as('value')

var trigger = data
    |alert()
        // .crit(lambda: "value" == 'null')
        .crit(lambda: "value" != 2)
        // .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .log('/var/log/SAM_SNMP_Proc_crond.log')

trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger
    |httpOut('output')
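A plain-Python model makes this behavior easier to see. It assumes, as the inputs.snmp table config implies, that each table row reaches apply() as its own metric; plain dicts stand in for Telegraf metrics, and `run_processor` is a hypothetical helper, not part of Telegraf.

```python
# Assumption: one dict per SNMP table row, standing in for a Telegraf metric.
def apply(metric):
    """Mirror of the Starlark apply() above, operating on a plain dict."""
    fields = metric["fields"]
    if fields.get("hrSWRunName") == "crond":
        fields["crond"] = 2
    else:
        fields["others"] = 0
    return metric


def run_processor(rows):
    """Hypothetical helper: call apply() once per table row."""
    return [apply({"fields": {"hrSWRunName": name}}) for name in rows]


running = run_processor(["agetty", "bash", "crond", "systemd"])
stopped = run_processor(["agetty", "bash", "systemd"])

# With crond running, exactly one metric carries the "crond" field.
print(sum("crond" in m["fields"] for m in running))  # 1
# With crond stopped, no metric carries it, so downstream sees nothing at all.
print(sum("crond" in m["fields"] for m in stopped))  # 0
```

Nothing in a per-row function can produce a value for a row that never arrives, which is why the missing-crond case shows up as "no data" rather than as a 0.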

If we add metric.fields['crond'] = 0 right after def apply(metric) in the Starlark processor, then crond = 0 is applied to every process in the process table, and the TICKscript catches it and sends an alert for each process. How can we resolve this? How do we define the crond state when crond is not running, or how can the TICKscript handle crond correctly when the field is not defined because crond is not running? Thanks!
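A plain-Python model shows why initializing the field at the top of apply() backfires: that line runs once per row, not once per poll. Dicts stand in for metrics, and `critical_points` is a hypothetical stand-in for `.crit(lambda: "value" != 2)` with "value" taken from the "crond" field.

```python
def apply_with_default(metric):
    fields = metric["fields"]
    fields["crond"] = 0  # runs for EVERY row, not once per poll
    if fields.get("hrSWRunName") == "crond":
        fields["crond"] = 2
    return metric


def critical_points(metrics):
    """Hypothetical model of .crit(lambda: "value" != 2), value = crond."""
    return [m for m in metrics if m["fields"]["crond"] != 2]


rows = ["agetty", "bash", "crond", "systemd"]
metrics = [apply_with_default({"fields": {"hrSWRunName": n}}) for n in rows]
flagged = [m["fields"]["hrSWRunName"] for m in critical_points(metrics)]
print(flagged)  # ['agetty', 'bash', 'systemd']
```

Even with crond running, every other row is flagged, which matches the one-alert-per-process behavior described above.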

Franky1 · March 10, 2021, 8:33pm

Yes, of course: in your else path, the field metric.fields['crond'] is never set:

else:
  metric.fields['others'] = 0

Btw, please post your Telegraf config snippets in Markdown format here in the future:

```toml
put the config code snippets here
```

xl3121 · March 10, 2021, 9:32pm

In my else path, I tried setting metric.fields['crond'] = 0, but then that field was applied to all the other processes (over 100 of them). The TICKscript checked the field "crond" on each of these processes, saw 0, and generated an alert for every one of them. That is not what I want: it should generate a single alert only when crond is not running (not found in the process table). So my question is how, and where, to set metric.fields['crond'] to 0 for crond when crond is not running, without setting it to 0 for the other processes.

Here is the Telegraf [[processors.starlark]] config. I hope the indentation survives posting:

[[processors.starlark]]
namepass = ["hrSWRunTable_SIM"]
source = '''
def apply(metric):
  #metric.fields['crond'] = 0
  proc_name = metric.fields.get('hrSWRunName')
  if proc_name == "crond":
    metric.fields['crond'] = 2
    print (proc_name)
    return metric
  else:
    metric.fields['others'] = 0
    return metric

  return metric
'''

Franky1 · March 10, 2021, 10:37pm

I am not sure if I have understood the problem correctly.
I am somewhat overwhelmed by the flood of information…
I’ll try to summarize the problem in one sentence:

You want to detect when crond is not in the list of processes

Is that the actual goal?

xl3121 · March 10, 2021, 10:55pm

Yes, the goal is to detect when crond is not in the list of processes (the process table). When crond is found, processors.starlark sets metric.fields['crond'] = 2. The question is what to do when crond is not found (crond is not running): I am not sure how to give metric.fields['crond'] a value that can be written to InfluxDB and used in the TICKscript/Kapacitor for crond alerting.
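Put differently, "crond is not running" is a property of a whole poll, not of any single row, so the value has to be computed across all rows of one poll. A minimal plain-Python sketch of that reduction; the function name is hypothetical and the 2/0 encoding is an assumption chosen to match the `crond = 2` convention used in this thread.

```python
def crond_state(process_names):
    """Return 2 if crond appears anywhere in one poll's process list, else 0.

    Membership does not depend on the order in which rows were returned.
    """
    return 2 if "crond" in process_names else 0


print(crond_state(["agetty", "bash", "crond", "systemd"]))  # 2
print(crond_state(["agetty", "bash", "systemd"]))  # 0
```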

Franky1 · March 10, 2021, 11:16pm

Just an idea of how it might work.
Generate a new metric in the starlark processor:

[[processors.starlark]]
  source = '''
def apply(metric):
  proc_name = metric.fields.get("hrSWRunName")
  if proc_name == "crond":
    new_metric = Metric("crond")  # Create a new metric
    new_metric.fields["crond"] = 2  # add a field value
    new_metric.time = metric.time  # get original timestamp
    return [ metric, new_metric ]
  return metric
'''

In the evaluation, you would then check whether this metric has not arrived for more than X seconds/minutes.
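That check is a heartbeat timeout: alert when the newest "crond" metric is older than some threshold. Kapacitor provides a `deadman` helper for this pattern; the plain-Python sketch below only illustrates the logic, using nanosecond timestamps like Telegraf's and a hypothetical 120-second threshold.

```python
NS_PER_S = 1_000_000_000


def heartbeat_missing(last_seen_ns, now_ns, threshold_s):
    """True if no crond metric has arrived within threshold_s seconds."""
    if last_seen_ns is None:  # never seen at all
        return True
    return now_ns - last_seen_ns > threshold_s * NS_PER_S


last = 1615482126000000000  # timestamp of the most recent crond metric
print(heartbeat_missing(last, last + 60 * NS_PER_S, 120))   # False
print(heartbeat_missing(last, last + 300 * NS_PER_S, 120))  # True
print(heartbeat_missing(None, last, 120))                   # True
```

The threshold should be comfortably larger than the SNMP polling interval, so that one slow poll does not trigger a false alarm.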

Franky1 · March 11, 2021, 1:11pm

I tried something else that might work for you.

Assuming that the snmp plugin always reads all processes in the same order and that there is one process (“lastProcess”) at the end of each cycle that always(!) exists, this might work.

[[processors.starlark]]
  namepass = ["hrSWRunTable_SIM"]
  source = '''
state = {
  "last": None
}

def apply(metric):
  proc_name = metric.fields.get("hrSWRunName")
  if proc_name == "crond":
    state["last"] = True
  elif proc_name == "lastProcess":
    new_metric = Metric("crond")  # Create a new metric
    crond_state = state.get("last")
    if crond_state != None:  # check if None
      new_metric.fields["alive"] = crond_state # add a field value
    else:
      new_metric.fields["alive"] = False # add a field value
    state["last"] = False  # reset state
    return [ metric, new_metric ]
  return metric
'''
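To see why this depends on row order, here is a plain-Python port of the script above. The `state` dict plays the role of the Starlark global, dicts stand in for metrics, "lastProcess" is the same placeholder name used above, and `alive_verdicts` is a hypothetical driver that feeds one poll's rows in order.

```python
state = {"last": None}  # stands in for the Starlark global `state`


def apply(metric):
    proc_name = metric["fields"].get("hrSWRunName")
    if proc_name == "crond":
        state["last"] = True
    elif proc_name == "lastProcess":
        crond_state = state.get("last")
        alive = crond_state if crond_state is not None else False
        state["last"] = False  # reset for the next poll
        return [metric, {"name": "crond", "fields": {"alive": alive}}]
    return metric


def alive_verdicts(rows):
    """Hypothetical driver: feed rows in order, collect emitted 'alive' values."""
    verdicts = []
    for name in rows:
        out = apply({"fields": {"hrSWRunName": name}})
        if isinstance(out, list):
            verdicts.append(out[1]["fields"]["alive"])
    return verdicts


in_order = alive_verdicts(["bash", "crond", "lastProcess"])
out_of_order = alive_verdicts(["bash", "lastProcess", "crond"])
print(in_order)      # [True]
print(out_of_order)  # [False], although crond was running
```

In the second poll, crond arrives after the marker, so it is reported as dead and its `True` leaks into the next poll's verdict.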

xl3121 · March 11, 2021, 3:51pm

Hi Franky,

Thanks for the quick response with something for me to try!

The processes in the process table are not always returned in the same order each time SNMP queries the table. Would that be a problem for the Starlark code you posted?

I tried your code, and I do not see any data for crond in Chronograf. I also do not see the new field "alive". What needs to be done to get data for the crond state?

My TICKscript for checking crond status is copied below. My Starlark code sets the field "crond" to 2 if crond is running. It does not set the field if crond is not running, which leaves "crond" undefined. I want to alert when crond is not running, but I do not know how to evaluate that in the TICKscript when the field "crond" is undefined. Could you take a look at my TICKscript and give me suggestions?

Thanks for the help!

Xiaofeng Lin

------ tick script -------

var db = 'sam_telegraf'
var rp = 'default'
var measurement = 'hrSWRunTable_SIM'
var groupBy =
var whereFilter = lambda: ("agent_host" == '32.68.15.138')
var name = 'SAM_SNMP_Proc_crond'
var idVar = name
var message = ' {{.ID}} {{.Name}} {{.TaskName}} {{.Group}} {{.Tags}} {{ index .Tags "value" }} {{.Level}} {{.Fields}} {{ index .Fields "value" }} {{.Time}}'
var idTag = 'alertID'
var levelTag = 'level'
var messageField = 'message'
var durationField = 'duration'
var outputDB = 'chronograf'
var outputRP = 'autogen'
var outputMeasurement = 'alerts'
var triggerType = 'threshold'
var crit = 0

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "crond")
        .as('value')

var trigger = data
    |alert()
        // .crit(lambda: "value" == 'null')
        .crit(lambda: "value" != 2)
        // .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .log('/var/log/SAM_SNMP_Proc_crond.log')

trigger
    |eval(lambda: float("value"))
        .as('value')
        .keep()
    |influxDBOut()
        .create()
        .database(outputDB)
        .retentionPolicy(outputRP)
        .measurement(outputMeasurement)
        .tag('alertName', name)
        .tag('triggerType', triggerType)

trigger

Franky1 · March 11, 2021, 3:57pm

Yes, then this idea may not work.

The Starlark script must of course be adapted to your data.
I don't know the data your SNMP plugin produces,
so I can only give ideas.

xl3121 · March 11, 2021, 7:31pm

Franky,

With my Starlark code below, running Telegraf with --test on the agent returns data for crond and the 100+ other processes, with crond=2i on the crond row:

[[processors.starlark]]
  namepass = ["hrSWRunTable_SIM"]
  source = '''
def apply(metric):
  proc_name = metric.fields.get('hrSWRunName')
  if proc_name == "crond":
    metric.fields['crond'] = 2
    print(proc_name)
    return metric
  else:
    metric.fields['others'] = 0
    return metric
'''

root@lionking93:/etc/telegraf# telegraf --config /etc/telegraf/telegraf.conf --test
2021-03-11T17:02:02Z I! Starting Telegraf 1.16.1
...
hrSWRunTable_SIM,agent_host=32.68.15.138,host=lionking93,index=965 hrSWRunName="agetty",others=0i 1615482126000000000
hrSWRunTable_SIM,agent_host=32.68.15.138,host=lionking93,index=13978 hrSWRunName="bash",others=0i 1615482126000000000
hrSWRunTable_SIM,agent_host=32.68.15.138,host=lionking93,index=5356 crond=2i,hrSWRunName="crond" 1615482126000000000
hrSWRunTable_SIM,agent_host=32.68.15.138,host=lionking93,index=273 hrSWRunName="xfs-buf/vda3",others=0i 1615482126000000000
hrSWRunTable_SIM,agent_host=32.68.15.138,host=lionking93,index=1 hrSWRunName="systemd",others=0i 1615482126000000000
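Every row in the output above carries the same timestamp (1615482126000000000). That suggests an order-independent alternative to a "last process" marker: treat each distinct timestamp as one poll, and emit the verdict for a poll once a row with a newer timestamp arrives. Below is a plain-Python sketch, assuming that all rows of one gather share a timestamp and that timestamps increase between polls; `observe` is a hypothetical name. In a real Starlark processor the dict would be the global `state` and the verdict would become a `Metric("crond")`, as in the earlier example. The trade-off is that each verdict lags one poll behind.

```python
state = {"ts": None, "seen": False}  # stands in for the Starlark global


def observe(name, ts):
    """Process one row; return (poll_ts, crond_value) when a poll closes."""
    verdict = None
    if state["ts"] is not None and ts != state["ts"]:
        # A newer poll started, so the previous one is complete in any order.
        verdict = (state["ts"], 2 if state["seen"] else 0)
        state["seen"] = False
    state["ts"] = ts
    if name == "crond":
        state["seen"] = True
    return verdict


rows = [
    ("bash", 100), ("crond", 100), ("systemd", 100),  # poll 1: crond running
    ("systemd", 200), ("bash", 200),                  # poll 2: crond missing
    ("crond", 300),                                   # poll 3 closes poll 2
]
verdicts = [v for v in (observe(n, t) for n, t in rows) if v is not None]
print(verdicts)  # [(100, 2), (200, 0)]
```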

The current TICKscript alert for crond not running does not work, because there is no data point for the field "crond" when crond is not running. I do not know how to evaluate the field "crond" when Starlark did not set it. The TICKscript below tries to read the value of "crond" and generate an alert. Do you have any suggestions for modifying it so that it evaluates "crond" and alerts correctly?

var data = stream
    |from()
        .database(db)
        .retentionPolicy(rp)
        .measurement(measurement)
        .groupBy(groupBy)
        .where(whereFilter)
    |eval(lambda: "crond")
        .as('value')

var trigger = data
    |alert()
        // .crit(lambda: "value" == 'null')
        .crit(lambda: "value" != 2)
        // .stateChangesOnly()
        .message(message)
        .id(idVar)
        .idTag(idTag)
        .levelTag(levelTag)
        .messageField(messageField)
        .durationField(durationField)
        .log('/var/log/SAM_SNMP_Proc_crond.log')

Thanks,

Xiaofeng Lin

xl3121 · March 15, 2021, 2:31pm

I added a DefaultNode to the TICKscript to set 'value' (in this case, the 'crond' field) to 0 when it does not already exist. This solution works!
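A DefaultNode fills in a field only on points where it is missing and leaves existing values alone. The plain-Python sketch below models just that fill step, followed by the `"value" != 2` threshold; `default_field` and `is_critical` are hypothetical names, and because the final TICKscript is not shown in the thread, which points reach this step in practice is not known here.

```python
def default_field(point, field, value):
    """Model of a DefaultNode field default: set `field` only if absent."""
    filled = dict(point)  # copy so the input point is left unchanged
    filled.setdefault(field, value)
    return filled


def is_critical(point):
    """Model of .crit(lambda: "value" != 2) with value taken from crond."""
    return point["crond"] != 2


running = default_field({"hrSWRunName": "crond", "crond": 2}, "crond", 0)
missing = default_field({"hrSWRunName": "bash", "others": 0}, "crond", 0)
print(running["crond"], is_critical(running))  # 2 False
print(missing["crond"], is_critical(missing))  # 0 True
```

Note that any point lacking the field receives the default, whichever process it came from.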
