Reading all the fields from XPath in one go without defining each field

I am reading the data from the MQTT consumer and data is coming in XML format after deserializing from proto.

In my data, I have more than 1K fields and every field name is different so defining each field in telegraf.conf will be too difficult and not feasible as a new field gets added then we have to change our telegraf.conf.
So I want to read all the fields in a single shot without defining every field and store them in InfluxDB.
Maybe based on the data type or some other XML tag we can process all the fields

I am attaching sample data from my consumer which I am getting after deserializing.

<version>3.0</version>
<SystemId>proxy1</SystemId>
<Time>1512476653</Time>
<Mac>00:0b:6b:ee</Mac>
   <Report>
      <Bins>
         <DurationSec>900</DurationSec>
         <NetworksViews>
            <SuccSum>103</SuccSum>
            <NbrSum>105</NbrSum>
            <Locked>120</Locked>
            <txK>18446744073709551615</txK>
        </NetworksViews>
      </Bins>
   </Report>

I want to read all the data as a field that is coming inside <Bins>................</Bins>

<Version>
<SystemId>
<Time>
<Report>

All of the above I want to make it as a tag and data inside in <Bins>................</Bins> I want to make fields.
The main challenge we are facing is that our all fields have different names and not much pattern they follow so how to read these in a single shot.

I am putting my telegraf.conf snippet as well.

[agent]
## Default data collection interval for all inputs
  interval = "1s"
  ## Log at debug level.
  debug = true
  ## Log only error level messages.
  quiet = false
[[inputs.mqtt_consumer]]
  servers = ["mqtt+ssl://aws-activeMQhost:8883"]
  qos = 0
  ## Topics that will be subscribed to.
  topics = [
    "topic-name",
  ]
  connection_timeout = "30s"
  ## If unset, a random client ID will be generated.
  client_id = "telegraf1"
  ## Username and password to connect MQTT server.
  username = "username"
  password = "password"
  data_format = "xpath_protobuf"
  xpath_protobuf_type = "package_name"
  xpath_protobuf_file = "filename.proto"
  xpath_protobuf_import_paths = ["/home/ubuntu/ravi/ec2/etc/telegraf/protofiles/"]
  xpath_print_document = true
  [[inputs.mqtt_consumer.xpath]]
    data_format = "xml"
    metric_selection = "//bin"
    metric_name = "string('device_data')"
    [inputs.mqtt_consumer.xpath.tags]
      SystemId = "/SystemId"
      Mac = "/Mac"
      version = "/version"
      [inputs.mqtt_consumer.xpath.fields]
        SuccSum= "/Report/Bins/NetworksViews/SuccSum"
        NbrSum= "/Report/Bins/NetworksViews/NbrSum"
        Locked= "/Report/Bins/NetworksViews/Locked"
        txK= "/Report/Bins/NetworksViews/txK"


[[outputs.file]]
##   ## mqtt_consumers to write to, "stdout" is a specially handled mqtt_consumer.
  files = ["stdout"]
  data_format = "influx"
[[outputs.influxdb_v2]]
  urls = ["http://influxdb:8086/"]
  token = "$INFLUXDB_TOKEN"
  organization = "org"
  bucket = "temp"

I have read this document but was not able to figure out much for my case.

Thanks

Hi @Ravikant_Gautam,
I think this blog we able to help you out here. Skip to this section: How to Parse Your XML Data with Telegraf | InfluxData

Hi, I have referred this blog but still, I am not able to get the data.

<?xml version="1.0" encoding="UTF-8"?>
<Mac>00:0b:6b:ee</Mac>
<Serial>40111</Serial>
<Venue>US</Venue>
<Name>tReport</Name>
<BuildNum>01.03.34.00</BuildNum>
<version>3.0</version>
<SystemId>proxy1</SystemId>
<BinStartTime>1512476653</BinStartTime>
   <Report>
      <Bins>
         <DurationSec>900</DurationSec>
         <NetworksViews>
            <SuccSum>103</SuccSum>
            <NbrSum>105</NbrSum>
            <GpsLocked>120</GpsLocked>
             .
             . all fields are coming here
             .
         </NetworksViews>
         <Number>12</Number>
         <StartTime>1514368800</StartTime>
         <StartSec>60</StartSec>
      </Bins>
   </Report>

below is the snippet of telegraf.conf what I am providing

[[inputs.mqtt_consumer.xpath]]
    data_format = "xml"
    metric_selection = "BinStartTime/Report/child::Bins"
    metric_name = "string('chronos_cbrs')"
    field_selection = "child::Bins"
    field_name = "name(@*[1])"
    field_value = "number(@*[1])"
    [inputs.tail.xml.tags]
      address = "@name"
      DurationSec= "@DurationSec"
    [inputs.tail.xml.fields]
      txKbps = "string(txKbps)"

still, I am not sure how to get the Mac, Serial, Venue, Name, BuildNum as a tag
I am pasting the logs as well

Mar 11 11:22:36 ip-192-168-133-74 telegraf[30838]: 2022-03-11T11:22:36Z D! [parsers.xpath_protobuf::mqtt_consumer] Number of configs: 1
Mar 11 11:22:36 ip-192-168-133-74 telegraf[30838]: 2022-03-11T11:22:36Z D! [parsers.xpath_protobuf::mqtt_consumer] got 0 nodes for query "BinStartTime/Report/child::reportBins" in metri
Mar 11 11:22:36 ip-192-168-133-74 telegraf[30838]: 2022-03-11T11:22:36Z D! [parsers.xpath_protobuf::mqtt_consumer] got 0 nodes for query "BinStartTime/Report/child::*" in metric selecti
Mar 11 11:22:36 ip-192-168-133-74 telegraf[30838]: 2022-03-11T11:22:36Z D! [parsers.xpath_protobuf::mqtt_consumer] got 0 nodes for query "BinStartTime/Report" in metric selection
Mar 11 11:22:36 ip-192-168-133-74 telegraf[30838]: 2022-03-11T11:22:36Z D! [parsers.xpath_protobuf::mqtt_consumer] got 1 nodes for query "BinStartTime" in metric selection
Mar 11 11:22:36 ip-192-168-133-74 telegraf[30838]: 2022-03-11T11:22:36Z E! [inputs.mqtt_consumer] Error in plugin: cannot parse with empty selection node

Below snippet, I also tried but it’s not working,

[[inputs.mqtt_consumer.xpath]]
    data_format = "xml"
    metric_selection = "Report/child::Bins"
    metric_name = "string('chronos_cbrs')"
    field_selection = "child::Bins"
    field_name = "name(@*[1])"
    field_value = "number(@*[1])"
    [inputs.tail.xml.tags]
      address = "@name"
      DurationSec= "@DurationSec"
    [inputs.tail.xml.fields]
      GpsLocked= "string(GpsLocked)"

same as above I am getting errors for this.

can I get an update on it?
It will be highly appreciated

Hi @Ravikant_Gautam,
I haven’t forgotten about this issue I am awaiting feedback from some of my colleagues :slight_smile:

Hi @Ravikant_Gautam,
So I spoke with my colleague on the Telegraf team and here is what the came up with:

  [[inputs.file.xpath]]
    metric_selection = "//Bins"
    metric_name = "string('bin')"
    field_selection = "descendant::*[not(*)]"

Essentially what this will do is iterate through all child nodes irrespective of level and turn them into fields. This solves the first issue of collecting the data. Now to work on the issue with the tags. Just some comments here:

  1. I cannot see the address tag in the data sample you gave me so its hard to create a reference example to this.
  2. If you are storing this data within InfluxDB I highly recommend not using Duration sec as a tag. This can lead to performance issues with cardinality. I would keep this value as a field and use pivot within flux instead.