Unable to get nested json objects as JSONB in postgres

Hello there,

I am currently trying to index the following type of json data into postgres from a kafka topic using telegraf kafka consumer input plugin with json_v2 as the data format.

{
  "application_name": "Shop",
  "db_name": "",
  "db_system": "",
  "destination": "",
  "duration_nano": 38283264,
  "end_time": "2011-11-07T20:09:46.333065472Z",
  "errorSpan": false,
  "event_timestamp": "2011-11-07T20:09:46.333065472Z",
  "exception": {
    "message": "",
    "stacktrace": "",
    "type": ""
  },
  "http_method": "GET",
  "http_route": "/logout",
  "http_status_code": 200,
  "id": "27a8606a95ec3423",
  "is_root_span": true,
  "kind": "SPAN_KIND_SERVER",
  "messaging_destination": "",
  "messaging_system": "",
  "name": "GET /logout",
  "origin": "front-end",
  "parent_span_id": "",
  "resource_attributes": {
    "application": "Shop",
    "service": {
      "name": "front-end"
    }
  },
  "rpc_service": "",
  "rpc_system": "",
  "scope": {
    "name": "",
    "version": ""
  },
  "service_name": "front-end",
  "span_attributes": {
    "http": {
      "flavor": "1.1",
      "host": "localhost:8081",
      "method": "GET",
      "route": "/logout",
      "status_code": 200,
      "status_text": "OK",
      "target": "/logout",
      "url": "http://localhost:8081/logout",
      "user_agent": "curl/7.68.0"
    },
    "net": {
      "host": {
        "ip": "::ffff:127.0.0.1",
        "name": "localhost",
        "port": 8079
      },
      "peer": {
        "ip": "::ffff:127.0.0.1",
        "port": 51498
      },
      "transport": "ip_tcp"
    }
  },
  "start_time": "2023-05-07T20:09:46.294782208Z",
  "status_code": 0,
  "trace_id": "84b8fc6d0a0c10369789c832f25da4df"
}

Telegraf json_v2 configuration:

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "json_v2"
    [[inputs.kafka_consumer.json_v2]]
      [[inputs.kafka_consumer.json_v2.object]]
          path = "@this"
          disable_prepend_keys = false

My goal is to be able to index the json objects resource_attributes and span_attributes as JSONB data in postgres as these two fields are dynamic and I don’t want my table to end up with many sparse columns. For this I have tried setting these json object fields as tags (along with tags_as_jsonb set to true in the postgres output plugin) but every time it resulted in an empty jsonb field in the postgres database i.e., just a column with {} as the value. What am I doing wrong here ? Is it a configuration problem or can’t we set whole json objects as tags in json_v2 ?

Another thing I tried was trying to send the above mentioned json fields as plain text instead of jsons and do a alter column type jsonb command in the create_templates part of the postgres output plugin.

  create_templates = ['''CREATE TABLE {{.table}} ({{.allColumns}})''','''ALTER TABLE {{.table}} ALTER COLUMN span_attributes type JSONB, ALTER COLUMN resource_attributes type JSONB''' ]

That resulted in the following error,
ERROR: unsupported jsonb version number 123

Any idea on whats causing this ? Is it a bug ? The thing is I am able to alter the columns inside postgres using the alter command above, but it doesn’t work when I use it in telegraf like above.

Any help or guidance would be appreciated.

Regards,
Siva

Hello @Siva_Chaitanya_M,
Hmm I’m not sure I haven’t encountered that error before.
@jpowers do you know?
Thank you

My goal is to be able to index the json objects resource_attributes and span_attributes as JSONB data in postgres as these two fields are dynamic and I don’t want my table to end up with many sparse columns.

Is it a configuration problem or can’t we set whole json objects as tags in json_v2 ?

The json_v2 parser’s primary task is around breaking out specific tags and fields from JSON data. I have not seen it used to grab an entire object as a string.

I believe you can use the xpath parser for this, something like:

[[inputs.file]]
  files = ["input.json"]
  data_format = "xpath_json"
  xpath_native_types = true

  [[inputs.file.xpath]]
    [inputs.file.xpath.fields]
      span_attributes = "string(/span_attributes)"
      resource_attributes = "string(/resource_attributes)"

Which would return something like:

file span_attributes="map[http:map[flavor:1.1 host:localhost:8081 method:GET route:/logout status_code:200 status_text:OK target:/logout url:http://localhost:8081/logout user_agent:curl/7.68.0] net:map[host:map[ip:::ffff:127.0.0.1 name:localhost port:8079] peer:map[ip:::ffff:127.0.0.1 port:51498] transport:ip_tcp]]",resource_attributes="map[application:Shop service:map[name:front-end]]" 1685626772000000000

But you probably want the raw json and not the internal structures. @srebhan what xpath setting am I missing to get the raw JSON as the string instead of the “maps” structure?

You probably want xpath_native_types = true removed or xpath_native_types = false. This way you can also skip the string() parts…