Issue of running multiple Starlark plugins on a Telegraf agent

I use two Starlark plugins on a Telegraf agent. The 1st Starlark plugin monitors cron/crond on different Linux flavors (e.g., RHEL, CentOS, Ubuntu); the 2nd Starlark plugin simply shows/determines the Linux flavor ("sysDescr") to test running multiple Starlark functions. However, when both Starlark plugins are enabled, only the 2nd one produces data (it prints "sysDescr"); the 1st Starlark plugin (for cron/crond) produces no data at all, as if it did not exist, even though it should show (print) the processes cron, crond, sshd, and ntpd as coded. If I remove the 2nd Starlark plugin from the Telegraf agent config file, the 1st Starlark plugin (cron/crond) works and produces data as expected.

On the InfluxData website, the article "How to Use Starlark with Telegraf | InfluxData" indicates that multiple Starlark functions/plugins can run in a single Telegraf agent configuration file.

Any suggestions or comments on how to get these multiple Starlark plugins working as designed? The relevant lines from the Telegraf agent configuration file are copied below:

[[inputs.snmp]]
...... (other settings under [[inputs.snmp]] skipped here) .......

    name = "snmp"
    [[inputs.snmp.table.field]]
      name = "hrSWRunName"
      oid = ".1.3.6.1.2.1.25.4.2.1.2"

[[processors.starlark]]
namepass = ["snmp"]
source = '''
def apply(metric):
  linux_os = metric.fields.get('sysDescr')
  proc_name = metric.fields.get('hrSWRunName')
  if proc_name in [ "cron", "crond", "ntpd", "sshd" ]:
       print (proc_name)
       if proc_name == "cron":
          metric.fields["cron"] = 1
          print ("cron")
          return metric
       elif proc_name == "crond":
          metric.fields["cron"] = 1
          print ("cron")
          return metric
       else:
          metric.fields[proc_name] = 1
          print (proc_name)
          return metric
  elif proc_name == None:
    return metric
  return None
'''

[[processors.starlark]]
namepass = ["snmp"]
source = '''
def apply(metric):
  linuxos = metric.fields.get('sysDescr')
  if linuxos != None:
     if linuxos.find("el7") != -1:
        print ("linuxos el7:", linuxos)
        return metric
     elif linuxos.find("el8") != -1:
        print ("linuxos el8:", linuxos)
        return metric
     elif linuxos.find("Ubuntu") != -1:
        print ("linuxos Ubuntu:", linuxos)
        return metric
  else:
    return None
'''

It should be possible to run any number of instances of a processor. If order matters, you can add `order = N` at the beginning of each processor definition.
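For example, a minimal sketch of two ordered instances (the `order` values and trivial `apply` bodies here are just for illustration):

```toml
# Processors run in ascending 'order'; without it, their ordering is undefined.
[[processors.starlark]]
  order = 1
  namepass = ["snmp"]
  source = '''
def apply(metric):
    return metric
'''

[[processors.starlark]]
  order = 2
  namepass = ["snmp"]
  source = '''
def apply(metric):
    return metric
'''
```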

I think you may be dropping metrics… `return None` effectively drops the metrics that go through the plugin. The second Starlark processor does exactly that in its last lines (`else: return None`), meaning any metric without `linuxos` (i.e., no `sysDescr` field) gets dropped.
The first Starlark processor checks for this case and returns the whole metric unedited when the field is missing.
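A sketch of the safer "keep by default" pattern: only return `None` for metrics you positively want to drop, and fall through to `return metric` for everything else (field names assumed from the config above):

```toml
[[processors.starlark]]
  namepass = ["snmp"]
  source = '''
def apply(metric):
    linuxos = metric.fields.get("sysDescr")
    if linuxos != None and linuxos.find("el7") != -1:
        print("linuxos el7:", linuxos)
    # No trailing "return None": metrics without sysDescr pass through untouched
    return metric
'''
```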


Yes, it should be possible to run multiple instances of the [[processors.starlark]] plugin in a Telegraf configuration file. For some reason, I cannot get two instances of the [[processors.starlark]] plugin working.

Actually, the order does not matter, as long as all the instances of the [[processors.starlark]] plugin run. In this case, the 1st [[processors.starlark]] plugin does not run unless I remove the 2nd one.

For the 1st [[processors.starlark]] plugin, I only need the status of cron, crond, sshd, and ntpd, and I purposely drop the 100+ other processes returned by the SNMP query of the process table 'hrSWRunName'. For the 2nd [[processors.starlark]] plugin, I only need to know whether a system is "el7", "el8", or "Ubuntu", and drop any other Linux flavors/releases. So "return None" is there to drop the other metrics that I do not want.

Thanks for the response.

I hope you realize that the pipeline you've built actually applies AND logic: only points that match both of your Starlark rules are output. Is that what you actually wanted?
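In other words, each point flows through both `apply()` functions in sequence, and survives only if neither returns `None`. Schematically (hypothetical names, not a real Telegraf API):

```
output = apply_2(apply_1(metric))
# If apply_1 returns None, apply_2 never sees the point;
# if apply_2 returns None, the point is dropped before the outputs.
```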

Do you have some sample data points (for both the pass and drop cases)?

Yes, that’s what I actually wanted with an AND logic.

To verify the data points in InfluxDB for the 2nd Starlark instance as well, I changed its code to output the root filesystem usage to InfluxDB, so that we can verify the data points for both Starlark instances in InfluxDB. The test results are:

  1. When both Starlark instances are enabled in the Telegraf config file, only the data points for root filesystem usage (total size & total used) are found in InfluxDB; no data points are found in the fields "cron", "ntpd", and "sshd".
  2. When the Starlark instance for root filesystem usage is removed, leaving just the Starlark instance for cron, crond, ntpd, and sshd, data points are found in the fields "cron", "ntpd", and "sshd".
  3. In telegraf.log, I see that all the print statements in the Starlark instances print correctly.
  4. It looks like I'm still having an issue getting the two Starlark instances working correctly.

The relevant lines in the Telegraf agent config file are shown below for your reference:

[[inputs.snmp]]
.....
  [[inputs.snmp.field]]
    name = "sysDescr"
    oid = ".1.3.6.1.2.1.1.1.0"
  [[inputs.snmp.field]]
    name = "root_filesystem_size_rhel7"
    oid = ".1.3.6.1.2.1.25.2.3.1.5.52"
  [[inputs.snmp.field]]
    name = "root_filesystem_used_rhel7"
    oid = ".1.3.6.1.2.1.25.2.3.1.6.52"
  [[inputs.snmp.field]]
    name = "root_filesystem_size_rhel8"
    oid = ".1.3.6.1.2.1.25.2.3.1.5.55"
  [[inputs.snmp.field]]
    name = "root_filesystem_used_rhel8"
    oid = ".1.3.6.1.2.1.25.2.3.1.6.55"
  [[inputs.snmp.field]]
    name = "root_filesystem_size_ubuntu"
    oid = ".1.3.6.1.2.1.25.2.3.1.5.31"
  [[inputs.snmp.field]]
    name = "root_filesystem_used_ubuntu"
    oid = ".1.3.6.1.2.1.25.2.3.1.6.31"

    name = "snmp"
    [[inputs.snmp.table.field]]
      name = "hrSWRunName"
      oid = ".1.3.6.1.2.1.25.4.2.1.2"

[[processors.starlark]]
namepass = ["snmp"]
source = '''
def apply(metric):
  proc_name = metric.fields.get('hrSWRunName')
  if proc_name in [ "cron", "crond", "ntpd", "sshd" ]:
       if proc_name == "cron":
          metric.fields["cron"] = 1
          print (proc_name)
          return metric
       elif proc_name == "crond":
          metric.fields["cron"] = 1
          print (proc_name)
          return metric
       else:
          metric.fields[proc_name] = 1
          print (proc_name)
          return metric
  elif proc_name == None:
    return metric
  return None
'''

[[processors.starlark]]
namepass = ["snmp"]
source = '''
def apply(metric):
  linuxos = metric.fields.get('sysDescr')
  if linuxos != None:
     if linuxos.find("el7") != -1:
        metric.fields['root_filesystem_size'] = metric.fields['root_filesystem_size_rhel7']
        metric.fields['root_filesystem_used'] = metric.fields['root_filesystem_used_rhel7']
        print ("linuxos RHEL7:", linuxos)
        return metric
     elif linuxos.find("el8") != -1:
        metric.fields['root_filesystem_size'] = metric.fields['root_filesystem_size_rhel8']
        metric.fields['root_filesystem_used'] = metric.fields['root_filesystem_used_rhel8']
        print ("linuxos RHEL8:", linuxos)
        return metric
     elif linuxos.find("Ubuntu") != -1:
        metric.fields['root_filesystem_size'] = metric.fields['root_filesystem_size_ubuntu']
        metric.fields['root_filesystem_used'] = metric.fields['root_filesystem_used_ubuntu']
        print ("linuxos Ubuntu:", linuxos)
        return metric
  else:
     return None
'''

@xl3121, you probably have to use a different namepass for each Starlark plugin.
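A sketch of that idea, assuming the input metrics were first renamed or split into two hypothetical measurement names (`snmp_proc` and `snmp_os`) so each processor only sees its own stream:

```toml
[[processors.starlark]]
  namepass = ["snmp_proc"]   # hypothetical measurement carrying hrSWRunName
  source = '''
def apply(metric):
    return metric
'''

[[processors.starlark]]
  namepass = ["snmp_os"]     # hypothetical measurement carrying sysDescr
  source = '''
def apply(metric):
    return metric
'''
```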

Best regards

@vkhemani, I replaced "return None" with "return metric" at the end of the Starlark instance that outputs root filesystem usage to InfluxDB, and now both Starlark instances in the Telegraf agent config file work as designed. Thanks!
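For reference, the fix amounts to changing only the tail of the second processor's `apply()` so unmatched metrics pass through instead of being dropped:

```
  # ... el7/el8/Ubuntu branches unchanged ...
  else:
     return metric   # was: return None -- pass other metrics through
```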