Parsing JSON formatted log files using Telegraf

josephbleroy · September 5, 2017, 11:27am

I’m currently working on a project that has about a dozen or so log files formatted using JSON.

I’ve also taken a look at the documentation on GitHub, which gives the following example:

{
    "a": 5,
    "b": {
        "c": 6
    },
    "ignored": "I'm a string"
}

Which gets translated into the following fields of a measurement: myjsonmetric a=5,b_c=6
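As a rough illustration of that flattening rule, here is a minimal Python sketch. It is not Telegraf's actual implementation; it only mirrors the documented example above, under the assumption that nested keys are joined with "_" and non-numeric values are dropped. The function name flatten_numeric is made up for this example, and the real parser's handling of arrays and booleans may differ.

```python
import json


def flatten_numeric(obj, prefix=""):
    """Flatten nested dicts, joining keys with "_" and keeping numeric values only."""
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the parent key as a prefix.
            fields.update(flatten_numeric(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            fields[name] = value
        # Strings, booleans and lists are skipped in this sketch (an assumption).
    return fields


doc = json.loads('{"a": 5, "b": {"c": 6}, "ignored": "I\'m a string"}')
fields = flatten_numeric(doc)
line = "myjsonmetric " + ",".join(f"{k}={v}" for k, v in fields.items())
print(line)  # myjsonmetric a=5,b_c=6
```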

Each log file has quite a few objects (60 minutes worth), and each object has a bunch of strings and numbers, as you can see from the example below:

{
  "ts":1498571047.747265,
  "uid":"CESM5I2V6z8LNDRPpb",
  "id.orig_h":"192.168.1.96",
  "id.orig_p":49190,
  "id.resp_h":"145.222.222.222",
  "id.resp_p":80,
  "fuid":"FfmePA13Dx2LcgCLd",
  "file_mime_type":"application/x-dosexec",
  "file_desc":"http://redacted/file.exe",
  "proto":"tcp",
  "note":"TeamCymruMalwareHashRegistry::Match",
  "msg":"Malware Hash Registry Detection rate: 11%  Last seen: 2017-06-28 22:10:06",
  "sub":"https://www.virustotal.com/en/search/?query=555",
  "src":"192.168.1.96",
  "dst":"145.222.222.222",
  "p":80,
  "peer_descr":"blip",
  "actions":["Notice::ACTION_LOG"],
  "suppress_for":3600.0,
  "dropped":false
}

Ideally I’d like to be able to parse the JSON log file (eventually all log files) and then store their contents in InfluxDB and visualize them using Grafana.

I have everything set up properly, minus the actual parsing of the log files. I wanted to get some thoughts and considerations from the community on the best approach to take for this task.

It’s my first time parsing log files using this software stack (Telegraf, InfluxDB, Grafana, etc.). I have some experience using the Elastic Stack, but would rather use Influx’s offering since it doesn’t run on the JVM.

If I’m missing any details, please let me know. Thanks!

daniel · September 5, 2017, 11:31pm

Sounds like you are on the right path to me. JSON format is not quite as nice as line protocol for parsing, but it should be able to get the job done so long as you are not planning to insert string fields.

josephbleroy · September 6, 2017, 12:36am

Hi @daniel, thanks for your reply!

There are quite a few strings that I’ll be parsing from each JSON log, and subject, respectively. I’m still a little confused on whether using JSON is going to fit my requirements or not, though I’m leaning more towards the latter.

Would you recommend that I just use grok to extract the log file contents, or do you think it’s possible using the JSON parser?

I also have the option to output the logs as tab-separated ASCII files, though I’d prefer using JSON as it’s much easier to read and work with in the long run. I’ve included an example below:

#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path conn
#open 2013-01-01-00-00-01
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents
#types time string addr port addr port enum string interval count count string bool count string count count count count table[string]

As you can see, we have the following fields (and their types, listed above):

ts
uid
id.orig_h
id.orig_p
id.resp_h
id.resp_p
proto
service
duration
orig_bytes
resp_bytes
conn_state
local_orig
missed_bytes
history
orig_pkts
orig_ip_bytes
resp_pkts
resp_ip_bytes
tunnel_parents

To optimize performance, I’m thinking that I should probably define this in the parser statically.
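As an alternative to hard-coding the list, the field names and types could be read from the header itself. The Python sketch below assumes the real file separates the "#fields" and "#types" entries with tabs (the spaces shown above are likely a rendering artifact), that "T"/"F" encode booleans, and that "-" marks an unset value as the "#unset_field" line suggests. The function names, the shortened header, and the sample row are invented for illustration.

```python
# Map log type names to Python converters; unlisted types stay as strings.
CONVERTERS = {
    "time": float,
    "interval": float,
    "port": int,
    "count": int,
    "bool": lambda v: v == "T",  # assumption: booleans are written as T/F
}


def parse_header(lines):
    """Read '#key value' header lines and return (separator, fields, types)."""
    separator, fields, types = "\t", [], []
    for line in lines:
        if not line.startswith("#"):
            break
        line = line.rstrip("\n")
        if line.startswith("#separator"):
            # This line is space-delimited and hex-escaped, e.g. "\x09".
            raw = line.split(" ", 1)[1].strip()
            separator = raw.encode("ascii").decode("unicode_escape")
        elif line.startswith("#fields"):
            fields = line.split(separator)[1:]
        elif line.startswith("#types"):
            types = line.split(separator)[1:]
    return separator, fields, types


def parse_row(line, separator, fields, types, unset="-"):
    """Turn one data line into a dict, converting values by declared type."""
    values = line.rstrip("\n").split(separator)
    row = {}
    for name, kind, value in zip(fields, types, values):
        row[name] = None if value == unset else CONVERTERS.get(kind, str)(value)
    return row


# Shortened, hypothetical header and row, built with explicit tabs.
header = [
    "#separator \\x09",
    "#fields\tts\tuid\tid.orig_p\tproto\tduration\tlocal_orig",
    "#types\ttime\tstring\tport\tenum\tinterval\tbool",
]
sep, fields, types = parse_header(header)
row = parse_row("1356998401.0\tCabc123\t49190\ttcp\t-\tT", sep, fields, types)
print(row)
```

Reading the header once per file keeps the per-line work to a split and a few conversions, which is close to what a static definition would cost.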

daniel · September 6, 2017, 1:20am

I would use the JSON parser if you are happy with how it parses the logs, as it’s much easier to set up. Again, the big downside is it has a fixed parsing method and you cannot create string fields (tags work though). If you wanted to, for instance, store the msg field from your earlier data, then I wouldn’t try to use the JSON parser.
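To make the tag versus string-field distinction concrete, here is a hedged Python sketch that writes one record by hand as InfluxDB line protocol, keeping msg as a quoted string field. It is not part of Telegraf; the measurement name "notice", the choice of tag keys, and the helper names are assumptions for illustration, and the record is a shortened copy of the earlier example.

```python
def escape(text, chars):
    """Backslash-escape the given special characters."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value: booleans bare, ints with 'i', strings quoted."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_line(measurement, record, tag_keys, time_key="ts"):
    """Build one line: measurement,tags fields timestamp (nanoseconds)."""
    tags = {k: record[k] for k in tag_keys if k in record}
    # Lists and nested objects are skipped in this sketch.
    fields = {
        k: v for k, v in record.items()
        if k not in tags and k != time_key and not isinstance(v, (list, dict))
    }
    tag_part = "".join(
        f",{escape(k, ',= ')}={escape(str(v), ',= ')}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{escape(k, ',= ')}={format_field(v)}" for k, v in fields.items()
    )
    # Assumes epoch seconds with microsecond precision, as in the "ts" value.
    timestamp_ns = int(round(record[time_key] * 1_000_000)) * 1000
    return f"{escape(measurement, ', ')}{tag_part} {field_part} {timestamp_ns}"


record = {
    "ts": 1498571047.747265,
    "proto": "tcp",
    "note": "TeamCymruMalwareHashRegistry::Match",
    "msg": "Malware Hash Registry Detection rate: 11%",
    "p": 80,
    "dropped": False,
}
line = to_line("notice", record, tag_keys=["proto", "note"])
print(line)
```

Low-cardinality values such as proto and note are reasonable tag candidates, while free text such as msg fits a string field better, since every distinct tag value creates a new series.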

If you need something more flexible, and your only other option is tab-separated, then you should be able to use logparser with grok patterns to get the data exactly how you like. Logparser does take a bit to figure out and can be somewhat frustrating to debug.

josephbleroy · September 6, 2017, 1:28am

Yeah, I’ve spent a fair amount of time figuring out the correct grok patterns. Reusing Logstash config patterns that match the output has been less than successful.

I’ll post an update once I figure something out for others to use.
