Parsing JSON-formatted log files using Telegraf

I’m currently working on a project that has about a dozen or so log files formatted using JSON.

I’ve also taken a look at the documentation on Github which explains the following example:

    "a": 5,
    "b": {
        "c": 6
    "ignored": "I'm a string"

Which gets translated into the following fields of a measurement: myjsonmetric a=5,b_c=6
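To make sure I understood the flattening rule, here's a short Python sketch of what the example above describes: nested keys are joined with underscores and string values are dropped as fields. This is just my reading of the documented behavior, not Telegraf's actual code (and note Telegraf itself parses all JSON numbers as floats):

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested JSON into Telegraf-style field names.

    Numeric values are kept; strings are ignored, mirroring how the
    JSON parser drops string values unless they are listed as tags.
    """
    fields = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            fields.update(flatten(value, prefix=name + "_"))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            fields[name] = value
        # strings (like "ignored" above) produce no field
    return fields

doc = json.loads('{"a": 5, "b": {"c": 6}, "ignored": "I\'m a string"}')
print(flatten(doc))  # {'a': 5, 'b_c': 6}
```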

Each log file has quite a few objects (60 minutes worth), and each object has a bunch of strings and numbers, as you can see from the example below:

  "msg":"Malware Hash Registry Detection rate: 11%  Last seen: 2017-06-28 22:10:06","sub":"",

Ideally I’d like to parse this JSON log file (and eventually all of the log files), store the contents in InfluxDB, and visualize them using Grafana.

I have everything set up properly, minus the actual parsing of the log files. I wanted to get the community’s thoughts on the best approach to take for this task.

It’s my first time parsing log files using this software stack (Telegraf, InfluxDB, Grafana, etc). I have some experience using the Elastic Stack, but would rather use Influx’s offering since it’s not using JVM. :sweat_smile:

If I’m missing any details, please let me know. Thanks!

Sounds like you are on the right path to me. JSON format is not quite as nice as line protocol for parsing, but it should be able to get the job done so long as you are not planning to insert string fields.

Hi @daniel, thanks for your reply!

There are quite a few strings that I’ll be parsing from each JSON log — the msg and sub fields above hold the message and subject, respectively. I’m still a little confused about whether the JSON parser is going to fit my requirements, though I’m leaning towards it not being a good fit.

Would you recommend that I just use grok to extract the log file contents, or do you think it’s possible using the JSON parser?

I also have the option to output the logs as tab-separated ASCII files, though I’d prefer using JSON as it’s much easier to read and work with in the long run. I’ve included an example below:

#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path conn
#open 2013-01-01-00-00-01
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents
#types time string addr port addr port enum string interval count count string bool count string count count count count table[string]

As you can see, we have the following fields (and their types, listed above):

  1. ts
  2. uid
  3. id.orig_h
  4. id.orig_p
  5. id.resp_h
  6. id.resp_p
  7. proto
  8. service
  9. duration
  10. orig_bytes
  11. resp_bytes
  12. conn_state
  13. local_orig
  14. missed_bytes
  15. history
  16. orig_pkts
  17. orig_ip_bytes
  18. resp_pkts
  19. resp_ip_bytes
  20. tunnel_parents

To optimize performance, I’m thinking that I should probably define this in the parser statically.
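For reference, here's a rough sketch of how the tab-separated alternative could be parsed by hand, driven by the #fields and #separator headers shown above. The sample record and its values are purely illustrative, not taken from a real log:

```python
def parse_tsv_log(lines, separator="\t", unset="-"):
    """Yield one dict per record, keyed by the names in the #fields header."""
    fields = []
    for line in lines:
        if line.startswith("#fields"):
            # first token is the literal "#fields"; the rest are column names
            fields = line.rstrip("\n").split(separator)[1:]
        elif line.startswith("#") or not line.strip():
            continue  # skip the other header/footer lines
        else:
            values = line.rstrip("\n").split(separator)
            # map unset markers ("-") to None, everything else stays a string
            yield {k: (None if v == unset else v) for k, v in zip(fields, values)}

sample = [
    "#separator \\x09",
    "#fields\tts\tuid\tproto",
    "1357016401.0\tC1234\ttcp",  # made-up record for illustration
]
print(list(parse_tsv_log(sample)))
# [{'ts': '1357016401.0', 'uid': 'C1234', 'proto': 'tcp'}]
```

Type conversion (using the #types header) is left out here, but the same idea would apply.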

I would use the JSON parser if you are happy with how it parses the logs, as it’s much easier to set up. Again, the big downside is that it has a fixed parsing method and you cannot create string fields (tags work, though). If you wanted to, for instance, store the msg field from your earlier data, then I wouldn’t try to use the JSON parser.
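To illustrate, a JSON setup could look something along these lines — this is a sketch, not a tested config, and the file path, measurement name, and tag keys are placeholders:

```toml
[[inputs.tail]]
  files = ["/var/log/myapp/*.log"]   # hypothetical path
  from_beginning = true
  data_format = "json"
  name_override = "myjsonmetric"
  # String values are dropped as fields, but keys listed here
  # are kept as tags instead:
  tag_keys = ["proto", "conn_state"]
```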

If you need something more flexible, and your only other option is tab-separated, then you should be able to use logparser with grok patterns to get the data exactly how you like. Logparser does take a bit to figure out and can be somewhat frustrating to debug.
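A starting point for the tab-separated logs might look like the following. This is illustrative only — the custom pattern below is a guess at the first few conn.log columns, not a complete or verified match, and the file path is a placeholder:

```toml
[[inputs.logparser]]
  files = ["/path/to/conn.log"]   # hypothetical path
  from_beginning = true
  [inputs.logparser.grok]
    measurement = "conn"
    patterns = ["%{CONN_LOG}"]
    # :tag marks a value as a tag; :float/:int set the field type
    custom_patterns = '''
      CONN_LOG %{NUMBER:ts:float}\t%{NOTSPACE:uid}\t%{IP:id_orig_h:tag}\t%{NUMBER:id_orig_p:int}
    '''
```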

Yeah, I’ve spent a fair amount of time trying to figure out the correct grok patterns. Reusing some Logstash config patterns that match this output has been less than successful.

I’ll post an update once I figure something out for others to use.