Hi everyone,
I’m working on a data collection system that uses Telegraf to handle some of the data processing. So far my experience with Telegraf has been flawless, but it seems I have now hit a small bump, and I couldn’t find any information about my issue, so let’s see if anyone here has experienced similar problems.
Brief system description:
I’m running Telegraf 1.25.0, installed natively on Debian 11.6.
My setup uses the `[inputs.influxdb_listener]` plugin to listen for writes going to the InfluxDB API (v1.8). The incoming metrics are tagged with their original target database using the optional `database_tag` setting. This `database_tag` is used to filter the metrics later.
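For example (a hypothetical write, assuming `database_tag = "database"`): a point posted to `/write?db=telemetry` would arrive in the pipeline tagged like this:

```
data,key=FII-ACT,database=telemetry value=3751.3 1676285135457000000
```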
Ingested metrics are then output using three different plugins (a rough sketch of the routing follows the list):

- Metrics are written to CSV files using the `[outputs.file]` plugin. Metrics are filtered by the previously defined `database_tag`; this way metrics are written to separate files depending on their original target database.
- Metrics are also routed to their original destination using the `[outputs.influxdb]` plugin. This plugin routes the metrics to the correct databases, again by filtering on the `database_tag`.
- Finally, the metrics are sent to an external HTTP endpoint using the `[outputs.http]` plugin. Filtering is done again using the `database_tag`. There are multiple `[outputs.http]` sections, since the original target database determines the API key used to post that metric to the HTTP endpoint. The body of the POST request needs to be formatted in a specific way, so `json_transformation` is used.
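A minimal sketch of that routing, with made-up database names, URLs and keys (not my real config):

```toml
[[inputs.influxdb_listener]]
  service_address = ":8186"
  # store the target database of each incoming write in the "database" tag
  database_tag = "database"

# one CSV file per original target database
[[outputs.file]]
  files = ["/tmp/db_a.csv"]
  data_format = "csv"
  [outputs.file.tagpass]
    database = ["db_a"]

# forward the metrics to their original database
[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database_tag = "database"
  [outputs.influxdb.tagpass]
    database = ["db_a"]

# one HTTP output per database, since the API key depends on it
[[outputs.http]]
  url = "https://example.com/ingest"
  data_format = "json"
  json_transformation = "..."  # elided; same idea as the transformation shown below
  [outputs.http.headers]
    X-API-Key = "key-for-db_a"
  [outputs.http.tagpass]
    database = ["db_a"]
```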
The problem:
All the bits of this setup seem to work properly, except the `[outputs.http]` plugin(s). When I start Telegraf, everything starts as it should. The `influxdb_listener` receives the first write, and every output runs successfully. Files are written, InfluxDB gets the metrics, and even the HTTP POST completes perfectly. I’m testing against a local Node-RED HTTP endpoint, so I can verify that the request body is formatted correctly.
But when the second write hits the `influxdb_listener`, the `[outputs.http]` plugin suddenly starts to fail and throws an error related to the `json_transformation`:

```
[outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 3
```
After each interval, this `$each` function seems to receive more and more arguments:
```
2023-02-17T10:45:13Z E! [outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 3
2023-02-17T10:45:23Z E! [outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 4
2023-02-17T10:45:33Z E! [outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 5
2023-02-17T10:45:43Z E! [outputs.file::file-debug] Could not serialize metric: function "each" takes 2 argument(s), got 6
```
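For reference, in standard JSONata `$each` takes an object and a function of up to two parameters (value and key), and returns the array produced by applying the function to each key/value pair. A minimal example:

```
$each({"a": 1, "b": 2}, function($v, $k) { $k & "=" & $v })
/* result: ["a=1", "b=2"] */
```

So an error claiming it received 3, 4, 5, … arguments is surprising.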
The big question is: is this intended behaviour?
My debugging results so far:
I managed to recreate this problem using only one `[inputs.file]` plugin to read some metrics written in line protocol and one `[outputs.file]` plugin to write the resulting JSON to a file. The errors above are collected from this setup, which is why they appear to originate from the `[outputs.file]` plugin.
The configuration used for this setup can be seen here:
```toml
[agent]
  interval = "10s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = "0s"
  hostname = ""
  omit_hostname = true

[[outputs.file]]
  alias = "file-debug"
  files = ["/tmp/debugging-file.json"]
  use_batch_format = true
  data_format = "json"
  json_timestamp_units = "1ms"
  json_transformation = '''
  {
    "names": $distinct(metrics.tags.key),
    "valueRows": metrics{$string(timestamp): fields.value[]} ~> $each(function($v, $k) {
      {
        "timestamp": $number($k),
        "values": $v
      }
    })
  }
  '''

[[inputs.file]]
  files = ["/tmp/inputdata.txt"]
  data_format = "influx"
```
And the contents of `inputdata.txt` are the following:
```
data,key=FII-ACT value=3751.3 1676285135457000000
data,key=FII-SET value=2383.5 1676285135457000000
data,key=CMD value=-14.478 1676285135457000000
data,key=AQP-FII_S value=2620.9 1676285135457000000
data,key=AQP_RPM value=305.07 1676285135457000000
data,key=N2-ACTUAL value=1339.8 1676285135457000000
data,key=P-CORR value=234.77 1676285135457000000
data,key=MIN-RPM value=1850.6 1676285135457000000
data,key=N1-RPM value=-2867.8 1676285135457000000
data,key=LOCAL value=1 1676285135457000000
data,key=CL-TEMP-H value=0 1676285135457000000
data,key=FAILURE value=1 1676285135457000000
data,key=WARNING value=1 1676285135457000000
data,key=SC-MODE value=1 1676285135457000000
data,key=CL-EN-P value=0 1676285135457000000
data,key=Modbus_register_20009_6 value=1 1676285135457000000
data,key=Modbus_register_20009_7 value=1 1676285135457000000
data,key=Modbus_register_20009_8 value=1 1676285135457000000
data,key=Modbus_register_20009_9 value=1 1676285135457000000
data,key=Modbus_register_20009_10 value=1 1676285135457000000
data,key=Modbus_register_20009_11 value=1 1676285135457000000
data,key=Modbus_register_20009_12 value=1 1676285135457000000
data,key=Modbus_register_20009_13 value=1 1676285135457000000
data,key=Modbus_register_20009_14 value=0 1676285135457000000
data,key=Modbus_register_20009_15 value=1 1676285135457000000
data,key=FII-ACT value=2064.8 1676285140456000000
data,key=FII-SET value=441.1 1676285140456000000
data,key=CMD value=-30.786 1676285140456000000
data,key=AQP-FII_S value=5086.6 1676285140456000000
data,key=AQP_RPM value=275.56 1676285140456000000
data,key=N2-ACTUAL value=2831.5 1676285140456000000
data,key=P-CORR value=204.31 1676285140456000000
data,key=MIN-RPM value=-3214 1676285140456000000
data,key=N1-RPM value=2238.4 1676285140456000000
data,key=LOCAL value=0 1676285140456000000
data,key=CL-TEMP-H value=1 1676285140456000000
data,key=FAILURE value=1 1676285140456000000
data,key=WARNING value=1 1676285140456000000
data,key=SC-MODE value=1 1676285140456000000
data,key=CL-EN-P value=0 1676285140456000000
data,key=Modbus_register_20009_6 value=0 1676285140456000000
data,key=Modbus_register_20009_7 value=1 1676285140456000000
data,key=Modbus_register_20009_8 value=0 1676285140456000000
data,key=Modbus_register_20009_9 value=0 1676285140456000000
data,key=Modbus_register_20009_10 value=1 1676285140456000000
data,key=Modbus_register_20009_11 value=0 1676285140456000000
data,key=Modbus_register_20009_12 value=0 1676285140456000000
data,key=Modbus_register_20009_13 value=0 1676285140456000000
data,key=Modbus_register_20009_14 value=0 1676285140456000000
data,key=Modbus_register_20009_15 value=0 1676285140456000000
```
Recreating the problem this way verifies that the other plugins in my original setup are not the cause.
I worked a lot with the JSONata query to make sure the transformation is valid, and it works as it should. I used the JSONata online exerciser to build the query. Sample data and the query with the correct result can be seen here: https://try.jsonata.org/sZEtYbpsu
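For reference, on every interval I would expect the transformation to turn the batch above into JSON shaped like this (abbreviated here; the values are from the first rows of the input data):

```json
{
  "names": ["FII-ACT", "FII-SET", "CMD", "..."],
  "valueRows": [
    { "timestamp": 1676285135457, "values": [3751.3, 2383.5, -14.478, "..."] },
    { "timestamp": 1676285140456, "values": [2064.8, 441.1, -30.786, "..."] }
  ]
}
```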
Currently I’m not sure how the `$each` function in the `json_transformation` can even receive an increasing number of arguments on every iteration. In these tests I would expect the resulting JSON to be written to the file in exactly the same way on every interval. Instead, the JSON is written correctly only on the first iteration; on every interval after that, the output just fails and nothing is appended to the file.
Is Telegraf caching the previous metrics in some weird way, or is the JSONata purely to blame? There is also the possibility of user error, so that might be the reason too.