Nested JSON Array

Hi,

I´m trying to ingest some JSON data from an IoT scenario, but can’t get my head around how to ingest some JSON arrays.
I got the following data:

{
  "device_name": "_5_115_3_TK_Emmastr_14",
  "properties": [
    {
      "last_changed": "2021-04-13T12:22:20.823+02:00",
      "name": "N01 IL1",
      "type": "MeasureValue",
      "value": 0.3552161455154419
    },
    {
      "last_changed": "2021-04-13T12:22:20.823+02:00",
      "name": "N01 IL2",
      "type": "MeasureValue",
      "value": 0.9352161455154419
    }
  ]
}

What I need is device_name, last_changed(as the timestamp), name, value.
So the output would be:

_5_115_3_TK_Emmastr_14,unixtimestamp, N01 IL1, 0.3552161455154419
_5_115_3_TK_Emmastr_14,unixtimestamp, N01 IL2, 0.9352161455154419

My input/output config look like this:

[[inputs.file]]
  files = ["/home/goodvirus/Downloads/example.json"]
  data_format = "json"
  json_time_key = "properties_last_changed"
  json_time_format =  "2006-01-02T15:04:05.000+02:00"

 [[outputs.file]]
   files = ["stdout", "/home/goodvirus/Downloads/metrics.out"]
   data_format = "json"
   json_timestamp_units = "1s"

I have looked in many posts and it seems that there is no solution, which would be really sad because then I would have to write an additional program to flatten the JSON and then I would reconsider using influx as an database, because I will probably have some similar tasks in the future for data ingestion.

Any pointers?

Thanks in advance!

Paul

Hello @goodvirus,
Sorry you’re having trouble! Let’s try and fix that. Fist thing, I noticed your line protocol output is wrong. Correcting this would be the first step so I can try and help.
Line protocol is in the format:
measurement,tagkey=tagvalue fieldkey=fieldvalue timestamp unixtimestamp
So do you want your field key to be value?
And your tag key to be name?

Have you tried using the gjson path? telegraf/plugins/parsers/json at master · influxdata/telegraf · GitHub

For example to specify the timestamp, you would use:

properties.1.last_changed

I recommend using https://gjson.dev/ to test parsing.

1 Like

Hi Anaisdg,

thanks for your help! You are correct the Line protocol output is not correct.
I fixed the input to:

data_format = “json”
json_time_key = “properties_0_last_changed”
json_time_format = “2006-01-02T15:04:05.000+01:00”
json_string_fields = [“device_name”, “properties_#_name”]

And now I get:

{“fields”:{“device_name”:"_5_115_3_TK_Emmastr_14",“properties_0_value”:0.3552161455154419,“properties_1_value”:0.9552161455154419},“name”:“file”,“tags”:{“host”:“IoTHubIntNetzstationen”},“timestamp”:1613218941}

What I want ist that the array is split and output 2 different lines like so:

{“fields”:{“device_name”:"_5_115_3_TK_Emmastr_14",“N01 IL1”:0.3552161455154419},“name”:“file”,“tags”:{“host”:“IoTHubIntNetzstationen”},“timestamp”:1613218941}
{“fields”:{“device_name”:"_5_115_3_TK_Emmastr_14",“N01 IL2”:0.9352161455154419},“name”:“file”,“tags”:{“host”:“IoTHubIntNetzstationen”},“timestamp”:1613218941}

Because the timestamp “last_changed” can be equal but maybe not every time. I also experimented with tags but the are always applied to all fields correct? Also the amount of entries in the array also varies…

Sorry for my not really inteligent question, but I´m new to the whole influxdb and telegraf thing…

Thanks again,

Paul

Hello @goodvirus,
I’m sorry I’m having trouble understanding. Can you please share the line protocol output that you expect?

Hi Anaisdg,

no worries, probably I´m not expression myself very good because of my lake of knowledge with telegraf…

I got the following line protocol:

file,host=IoTHubIntNetzstationen device_name="_5_115_3_TK_Emmastr_14",properties_0_name=“N01 IL1”,properties_1_value=0.9552161455154419,properties_0_value=0.3552161455154419 1613218941000000000

But I need the following:

file,host=IoTHubIntNetzstationen device_name="_5_115_3_TK_Emmastr_14",properties_0_name=“N01 IL1”,properties_0_value=0.3552161455154419 1613218941000000000
file,host=IoTHubIntNetzstationen device_name="_5_115_3_TK_Emmastr_14",properties_0_name=“N01 IL2”,properties_0_value=0.9352161455154419 1613218941000000000

Or any other way to differentiate between N01 IL1/N01 IL2 Values (there are around 60 and the order is not always the same, and not always are all present and the timestamp is not always identical…)

Thank you for your time and wisdom!

I have a similar issue and may be able to expand on @goodvirus’s explanation.

Let’s use the following example data:

{
  "building": "B1",
  "occupants": [
    {"first": "Dale", "last": "Murphy", "cardScan": "16:05:01", "counter": 1},
    {"first": "Roger", "last": "Craig", "cardScan": "18:35:23", "counter": 5},
    {"first": "Jane", "last": "Murphy", "cardScan": "21:02:14", "counter": 2}
  ]
}

For argument’s sake, this is how data arrives from the device/system/etc., and the shape cannot be modified.

I would like the building data to be part of each entry.
GJSON does allow iterating through arrays. However, I cannot find a way to reference elements outside of the array.

For example, we can get a Telegraf-compatible output of occupants with json_query = in config, or testing on https://gjson.dev/, using:
occupants

We can also iterate on the array to make our own (renamed or filtered) shape:
occupants.#.{fname:first, lname:last, entryTime:cardScan}

But as far as I can tell, you can’t reach out of the iteration scope, eg:
occupants.#.{fname:first, lname:last, entryTime:cardScan, building:building}

I don’t know of any way to target a parent, or root element on each iteration.

We may be able to get tricky with something like @join or @flatten, if they’re supported, but what that solution may be escapes me.

After a bit of research, it appears the JSON Processor plugin (and GJSON) are simply too limited for what we’re trying to do. However, there are other ways to process json.

I suggest checking out this excellent Influx Blog article: JSON to InfluxDB with Telegraf and Starlark.

Essentially, instead of using the JSON Processor, you can use Starlark (modified Python) to more finely control your output.

Example for my previous building sample:

# buildingLocks.star Starlark file
load("json.star", "json")
load("time.star", "time")

def apply(metric):
    j = json.decode(metric.fields.get("value"))

    metrics = []
    for entry in j["occupants"]:  # Array of JSON elements to become Metrics
        # Set Metric name
        new_metric = Metric("peopleCount")
        # Set Field
        new_metric.fields["count"] = entry["counter"]
        # Set Tags
        new_metric.tags["fname"] = entry["first"]
        new_metric.tags["lname"] = entry["last"]
        new_metric.tags["building"] = j["building"]
        # Set Time
        new_metric.time = time.parseTime(entry["cardScan"], "12:01:03")
        metrics.append(new_metric)

    return metrics

In @goodvirus’s case, the Starlark file may look something like this:

# myDevice.star Starlark file
load("json.star", "json")
load("time.star", "time")

def apply(metric):
    j = json.decode(metric.fields.get("value"))

    metrics = []
    for property in j["properties"]:  # Array of JSON elements to become Metrics
        # Set Metric name
        new_metric = Metric("sensor")
        # Set Field
        new_metric.fields["value"] = property["value"]
        # Set Tags
        new_metric.tags["name"] = property["name"]
        new_metric.tags["device_name"] = j["device_name"]
        # Set Time
        new_metric.time = time.parseTime(property["last_changed"])
        metrics.append(new_metric)

    return metrics

I need to generate the following JSON telegraf output, where the fields section is an array. Presuming I generate/parse the same JSON telegraf input, is there an easier way to achieve this output?

It looks like telegraf converts all input to standard LineProtocol, then converts that to the desired output. I that the case? Does LineProtocol handle arrays? Or, are arrays handled via naming, something like this for fields section: name1_val1=2,name1_val1=90,name2_val1=32,name2_val2=88

DESIRED OUTPUT

{
“name”: “MyData”,
“tags”: {
“serialnumber”: 12345678
},
“fields”: [
{
“name”: “A”,
“avg”: 88,
“min”: 77,
“max”: 92
},
{
“name”: “B”,
“avg”: 91,
“min”: 75,
“max”: 99
}
],
“lastChange”: 0
}