Telegraf - Weird behaviour with JSON-transformation after first collection interval

Hi everyone,

I’m working on a data collection system that uses Telegraf to handle some of the data processing. So far my experience with Telegraf has been flawless, but it seems I have now hit a small bump, and I couldn’t find any information about my issue, so let’s see if anyone here has experienced similar problems.

Brief system description:

I’m running Telegraf 1.25.0, installed natively on Debian 11.6.

My setup uses the [inputs.influxdb_listener] plugin to listen for writes going to the InfluxDB v1.8 API. The incoming metrics are tagged with their target database using the optional database_tag setting. This database_tag is used to filter the metrics later.

Ingested metrics are then sent out through three different output plugins; a minimal configuration sketch follows the list.

  • Metrics are written to CSV files using the [outputs.file] plugin. Metrics are filtered by the previously defined database_tag, so metrics are written to separate files depending on their original target database.

  • Metrics are also routed to their original destination using the [outputs.influxdb] plugin, which forwards them to the correct databases, again filtered by the database_tag.

  • Finally, the metrics are posted to an external HTTP endpoint using the [outputs.http] plugin, filtered once more by the database_tag. There are multiple [outputs.http] sections, since the original target database determines the API key used to post the metrics to the HTTP endpoint. The body of the POST request needs to be formatted in a specific way, so json_transformation is used.
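
For illustration, the routing part of the configuration looks roughly like the sketch below. The service address, file paths, URLs, database names and the API key are all placeholders, only one database is shown, and the json_transformation is omitted here (the full transformation appears in the debugging config later in this post):

[[inputs.influxdb_listener]]
  service_address = ":8186"
  ## Tag incoming metrics with the database they were written to
  database_tag = "database"

[[outputs.file]]
  files = ["/tmp/db1.csv"]  # placeholder path
  data_format = "csv"
  [outputs.file.tagpass]
    database = ["db1"]      # placeholder database name

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]  # placeholder URL
  ## Route each metric to the database named in its database tag
  database_tag = "database"

[[outputs.http]]
  url = "https://example.com/ingest"  # placeholder endpoint
  method = "POST"
  data_format = "json"
  ## json_transformation = "..."  (see the debugging config below)
  [outputs.http.headers]
    Authorization = "Bearer db1-api-key"  # placeholder; one section per database/API key
  [outputs.http.tagpass]
    database = ["db1"]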

The problem:

Every part of this setup seems to work properly except the [outputs.http] plugin(s). When I start Telegraf, everything starts as it should: the influxdb_listener receives the first write, and every output runs successfully. Files are written, InfluxDB gets the metrics, and even the HTTP POST completes perfectly. I’m testing against a local Node-RED HTTP endpoint, so I can verify that the request body is formatted correctly.

But when the second write hits the influxdb_listener, the [outputs.http] plugin suddenly starts to fail with an error related to the json_transformation:

[outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 3

After each interval the $each function seems to receive more and more arguments:

2023-02-17T10:45:13Z E! [outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 3
2023-02-17T10:45:23Z E! [outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 4
2023-02-17T10:45:33Z E! [outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 5
2023-02-17T10:45:43Z E! [outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 6

The big question is: is this intended behaviour?

My debugging results so far:

I managed to recreate this problem using only one [inputs.file] plugin to read some metrics written in line protocol and one [outputs.file] plugin to write the resulting JSON to a file. The errors above were collected from this setup, which is why they appear to originate from the [outputs.file] plugin.

The configuration used for this setup can be seen here:

[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  hostname = ""
  omit_hostname = true

[[outputs.file]]
  alias = "file-debug"
  files = ["/tmp/debugging-file.json"]
  use_batch_format = true

  data_format = "json"
  json_timestamp_units = "1ms"

  json_transformation = '''
  {
     "names": $distinct(metrics.tags.key),
     "valueRows": metrics{$string(timestamp): fields.value[]} ~> $each(function($v, $k) {
        {
            "timestamp": $number($k),
            "values": $v
        }
     })
  }
  '''

[[inputs.file]]
  files = ["/tmp/inputdata.txt"]
  data_format = "influx"

And the contents of inputdata.txt are the following:

data,key=FII-ACT value=3751.3 1676285135457000000
data,key=FII-SET value=2383.5 1676285135457000000
data,key=CMD value=-14.478 1676285135457000000
data,key=AQP-FII_S value=2620.9 1676285135457000000
data,key=AQP_RPM value=305.07 1676285135457000000
data,key=N2-ACTUAL value=1339.8 1676285135457000000
data,key=P-CORR value=234.77 1676285135457000000
data,key=MIN-RPM value=1850.6 1676285135457000000
data,key=N1-RPM value=-2867.8 1676285135457000000
data,key=LOCAL value=1 1676285135457000000
data,key=CL-TEMP-H value=0 1676285135457000000
data,key=FAILURE value=1 1676285135457000000
data,key=WARNING value=1 1676285135457000000
data,key=SC-MODE value=1 1676285135457000000
data,key=CL-EN-P value=0 1676285135457000000
data,key=Modbus_register_20009_6 value=1 1676285135457000000
data,key=Modbus_register_20009_7 value=1 1676285135457000000
data,key=Modbus_register_20009_8 value=1 1676285135457000000
data,key=Modbus_register_20009_9 value=1 1676285135457000000
data,key=Modbus_register_20009_10 value=1 1676285135457000000
data,key=Modbus_register_20009_11 value=1 1676285135457000000
data,key=Modbus_register_20009_12 value=1 1676285135457000000
data,key=Modbus_register_20009_13 value=1 1676285135457000000
data,key=Modbus_register_20009_14 value=0 1676285135457000000
data,key=Modbus_register_20009_15 value=1 1676285135457000000
data,key=FII-ACT value=2064.8 1676285140456000000
data,key=FII-SET value=441.1 1676285140456000000
data,key=CMD value=-30.786 1676285140456000000
data,key=AQP-FII_S value=5086.6 1676285140456000000
data,key=AQP_RPM value=275.56 1676285140456000000
data,key=N2-ACTUAL value=2831.5 1676285140456000000
data,key=P-CORR value=204.31 1676285140456000000
data,key=MIN-RPM value=-3214 1676285140456000000
data,key=N1-RPM value=2238.4 1676285140456000000
data,key=LOCAL value=0 1676285140456000000
data,key=CL-TEMP-H value=1 1676285140456000000
data,key=FAILURE value=1 1676285140456000000
data,key=WARNING value=1 1676285140456000000
data,key=SC-MODE value=1 1676285140456000000
data,key=CL-EN-P value=0 1676285140456000000
data,key=Modbus_register_20009_6 value=0 1676285140456000000
data,key=Modbus_register_20009_7 value=1 1676285140456000000
data,key=Modbus_register_20009_8 value=0 1676285140456000000
data,key=Modbus_register_20009_9 value=0 1676285140456000000
data,key=Modbus_register_20009_10 value=1 1676285140456000000
data,key=Modbus_register_20009_11 value=0 1676285140456000000
data,key=Modbus_register_20009_12 value=0 1676285140456000000
data,key=Modbus_register_20009_13 value=0 1676285140456000000
data,key=Modbus_register_20009_14 value=0 1676285140456000000
data,key=Modbus_register_20009_15 value=0 1676285140456000000

Recreating the problem this way verifies that the other plugins in my original setup are not the cause.

I spent a lot of time with the JSONata query to make sure that the transformation is valid, and it works like it should. I used the JSONata online exerciser to form the query. Sample data and the query with the correct result can be seen here: https://try.jsonata.org/sZEtYbpsu
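
For reference, the document that Telegraf’s JSON serializer hands to the json_transformation is its standard batch format, which for the sample data above is shaped roughly like this (abbreviated to the first two metrics; timestamps are in milliseconds because of json_timestamp_units = "1ms"):

{
  "metrics": [
    {
      "fields": { "value": 3751.3 },
      "name": "data",
      "tags": { "key": "FII-ACT" },
      "timestamp": 1676285135457
    },
    {
      "fields": { "value": 2383.5 },
      "name": "data",
      "tags": { "key": "FII-SET" },
      "timestamp": 1676285135457
    }
  ]
}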

Currently I’m not sure how the $each function in the json_transformation can even receive an increasing number of arguments on every iteration. In these tests I would expect the resulting JSON to be written to the file in exactly the same way on every interval. Instead, the JSON is written correctly only on the first iteration; on every interval after that it just fails and doesn’t append anything to the file.

Is Telegraf caching the previous metrics in some weird way, or is JSONata purely to blame? There is also the possibility of user error, so that might be the reason too :slight_smile:

@srebhan thoughts on this one? Feels like something is not getting reset?

Hmmm, hard to say. If you choose version 1.5.4 (that’s the one we support) on the JSONata playground, it also fails. So I guess the problem is in the JSONata implementation and/or the version supported.

Hi @srebhan, @jpowers

You are correct about my previous JSONata playground example failing if you switch to version 1.5.4. I believe this is because the $distinct function was introduced in JSONata version 1.7.0. However, there is more weirdness to this case…

If you use the following transformation in your telegraf.conf, it seems to run fine. If I understand the versioning correctly, this shouldn’t work in Telegraf, but it runs flawlessly:

 json_transformation = '''
  {
     "names": $distinct(metrics.tags.key)
  }
  '''
So apparently Telegraf does support at least some of these newer JSONata functions? I understand that some newer features might happen to work without being officially supported or tested; might that be the case here?

More debugging:

You can completely remove the row that generates the names property from the json_transformation, and Telegraf still spits out the same errors as shown in my first post. After removing that row, you are left with a simple query that should be supported since the release of JSONata version 1.2.0. That query looks like this:

{
    "valueRows": metrics{$string(timestamp): fields.value[]} ~> $each(function($v, $k) {
        {
            "timestamp": $number($k),
            "values": $v
        }
    })
}

Here is the JSONata playground example from my first post, with only the $distinct function removed, exactly as shown in the previous code block. This one runs successfully even when you switch to JSONata version 1.5.4: https://try.jsonata.org/SgXkPh4LO

JSONata seems to be fine with this transformation, but Telegraf just cannot handle this query. I still think there is some problem with Telegraf, since this is clearly not the intended behaviour. Or at least this behaviour does not make any sense to me.
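
As a side note, the ~> chain operator just pipes its left-hand side in as the first argument of the function on its right, so the query above should be exactly equivalent to the unchained form below. I haven’t verified whether spelling it this way avoids the error, but it could help isolate whether the chaining or $each itself is at fault:

{
    "valueRows": $each(metrics{$string(timestamp): fields.value[]}, function($v, $k) {
        {
            "timestamp": $number($k),
            "values": $v
        }
    })
}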

Does my point make any sense, or does this information give you any ideas about what could be the culprit here?

The thing is, we are using this library and it might or might not include additional functionality from higher versions…

For me to check Telegraf, is the data in your first post representative?

I used the examples in your first post with the following config

[agent]
  omit_hostname = true

###############################################################################
[[inputs.file]]
  files = ["./test_configs/jsontest_transform_forum_trex.influx"]
  data_format = "influx"

###############################################################################

[[outputs.file]]
  files = ["stdout"]
  data_format = "json"
  use_batch_format = true

  json_transformation = '''
  {
      "valueRows": metrics{$string(timestamp): fields.value[]} ~> $each(function($v, $k) {
          {
              "timestamp": $number($k),
              "values": $v
          }
      })
  }
  '''

without error and got the following result:

{"valueRows":[{"timestamp":1676285135,"values":[3751.3,2383.5,-14.478,2620.9,305.07,1339.8,234.77,1850.6,-2867.8,1,0,1,1,1,0,1,1,1,1,1,1,1,1,0,1]},{"timestamp":1676285140,"values":[2064.8,441.1,-30.786,5086.6,275.56,2831.5,204.31,-3214,2238.4,0,1,1,1,1,0,0,1,0,0,1,0,0,0,0,0]}]}

I copied the configuration from your post and ran it on my machine.

It does run successfully, but only once. If you let it run for a bit longer than the first interval, you should start getting the errors I have been talking about, the exact same error I showed in my first post.

So if you let it run once (or use the --once argument), it works. Let it run for two or more intervals and it doesn’t work. Can you replicate this?
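
In other words, with repro.conf standing in as a placeholder name for the config above:

# works: collects and flushes a single time, then exits
telegraf --config repro.conf --once

# fails from the second flush interval onward
telegraf --config repro.conf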

I can reproduce the issue. Please open an issue, add the data in your first post and the latest transformation config!

Feel free to mention me in the issue or assign me to it!

Cool that you also got it to “not work”.

I opened the issue for this case and it can be seen here:

Thanks for the help so far.

Please test the fix outlined in your issue.

The fix seems to work nicely, and there are no more errors, at least for this use case.

Thanks @srebhan!
