Telegraf filter out bad data

I have some devices with, let’s say, a poor SNMP implementation, and sometimes they return junk data instead of the proper value. I am looking for a way to drop the junk before it goes into my InfluxDB instance. It’s really a matter of a simple regex: only pass data if it matches ^(\D.*), so values like “10025” are rejected and only values like “Austin” or “Chicago” get through. Since this is a tag in the measurement, I don’t want crap data that I constantly have to manage going into the series at all. It’s probably something simple I’m missing in the docs somewhere.
I am using Telegraf 1.x

Because it is a tag, meaning a string, you could add the regex processor to first check whether the value matches a regex for the junk values (e.g. a leading digit) and replace those with something like “NA” or an empty string.

You could then use tagexclude to drop the specific tag, if you only want to drop the tag, or use tagpass, which would drop the entire metric. FYI, tagpass takes glob patterns, not regexes.
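A minimal sketch of the tagpass route, which drops the entire metric rather than just the tag. It assumes the standard MIB-II sysLocation OID (.1.3.6.1.2.1.1.6.0) for the location field and leaves out the agent and credential settings; the glob `[!0-9]*` is meant to pass any location whose first character is not a digit.

```toml
[[inputs.snmp]]
  # agents, version, community, etc. omitted

  [[inputs.snmp.field]]
    name = "location"
    oid = ".1.3.6.1.2.1.1.6.0"  # assumption: standard sysLocation.0
    is_tag = true

  # Keep this table last in the plugin block; any keys after it would be
  # parsed as part of the tagpass table.
  [inputs.snmp.tagpass]
    # Glob, not regex: the first character must not be 0-9.
    location = ["[!0-9]*"]
```

Note that tagpass also drops metrics that have no location tag at all, and an empty location does not match this glob either.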

Edit: if you are on 1.27, which just came out, you can also take a look at the metricpass option, which takes a CEL expression; you could use it to pass only metrics whose location is not a number and drop the rest.
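For the 1.27+ route, a sketch of metricpass on the same input. The CEL expression is only an illustration and assumes metrics without a location tag should still pass; like tagpass, it drops the whole metric whenever the expression evaluates to false.

```toml
[[inputs.snmp]]
  # Telegraf 1.27+ only. Metrics for which this CEL expression evaluates to
  # false are dropped. It passes metrics without a location tag, and
  # otherwise requires the location to start with a non-digit (RE2 \D).
  metricpass = '!("location" in tags) || tags.location.matches("^\\D")'

  # agents, community and [[inputs.snmp.field]] entries omitted
```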

I never learned globs; they just don’t seem as flexible and powerful as regexes at first blush, but I may be wrong. I’m currently running Telegraf 1.22.4.

So here’s the processor I’m looking at; the measurement name is “edge”:

[[processors.regex]]
  namepass = ["edge"]
  [[processors.regex.tags]]
    key = "location"
    pattern = "^(\\d.*)"
    replacement = "0000"

Then for the input plugin itself, which is SNMP, this is what I’m thinking (and hoping I’m right):

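[[inputs.snmp]]
  tagexclude = ["0000"]

  [[inputs.snmp.field]]
    name = "sysname"
    oid = "."
    is_tag = true
  [[inputs.snmp.field]]
    name = "description"
    oid = "."
  [[inputs.snmp.field]]
    name = "uptime"
    oid = "."
  [[inputs.snmp.field]]
    name = "location"
    oid = "."
    is_tag = true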

Is this even close?

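Almost! Processors run after an input, so a filter set on the input never sees the “0000” value. Also, tagexclude matches tag keys, not values: tagexclude = ["0000"] would match nothing, while tagexclude = ["location"] would always drop that tag.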

Do you want to drop just the tag or the entire metric?

Just the tag. The processor looks like it’s working, but now I’m seeing “0000” inserted as the location tag. The other “location” entries are valid; it’s just that sometimes the device returns a random number for that SNMP OID for whatever reason.

To get that level of logic, we would want to use the Starlark processor:

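[[processors.starlark]]
  source = '''
def apply(metric):
    # Drop only the placeholder tag written by the regex processor
    if "location" in metric.tags and metric.tags["location"] == "0000":
        metric.tags.pop("location")
    return metric
'''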
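Alternatively, a sketch that does the whole job in a single Starlark processor, skipping the regex processor and the “0000” placeholder. It assumes the tag is called location and the measurement is edge, as in the question, and removes the tag only when its first character is a digit. If you keep both processors instead, setting `order` on each (for example 1 for regex, 2 for starlark) avoids relying on config-file ordering.

```toml
[[processors.starlark]]
  namepass = ["edge"]
  source = '''
def apply(metric):
    # Remove the location tag when it starts with a digit (the junk case);
    # values such as "Austin" or "Chicago" are left untouched.
    if "location" in metric.tags and metric.tags["location"][:1].isdigit():
        metric.tags.pop("location")
    return metric
'''
```

Slicing with `[:1]` keeps an empty location from raising an error, since `"".isdigit()` is simply false.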

What is a starlark?
I think the root issue has been solved: a Unicode character in a device’s name was causing havoc. That’s been fixed, so I’m betting on no more issues; at least, I don’t see any new 0000 entries.

Glad to hear

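As for what Starlark is: it’s a small Python-like language, and Telegraf’s starlark processor runs a script like the one above against each metric. See the Starlark processor README (plugins/processors/starlark in the influxdata/telegraf repository on GitHub) for more.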