Telegraf http plugin with json file trouble

I am fairly new to Telegraf and am trying to consume a JSON file from this website:
https://www.swpc.noaa.gov/products/solar-cycle-progression

The file is https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json
I have configured the Telegraf HTTP plugin, and running Telegraf with the config file returns no errors. However, validating the data in InfluxDB shows totally wrong values when compared with the content of the source JSON file.

For example, I am looking at the value of “predicted_f10.7”, which is “106.2” in the source JSON, but InfluxDB shows a different value.

I am sure I am missing something here, any help is much appreciated.
Unfortunately, as a new user, I am not allowed to attach the Telegraf config file here.

Thanks
Roland

Hi @HB9VQQ (Roland) - Welcome to the Community Forum!

The received JSON is usually parsed into InfluxDB line protocol. So you should have a [[inputs.http.json_v2]] section within the [[inputs.http]] plugin to do that. It’s also recommended to set debug = true in the [agent] section and send the Telegraf output to stdout using the [[outputs.file]] plugin.

Observe what gets generated and we can go from there. Good luck!
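A minimal sketch of that debugging setup (the URL is the one from this thread; everything else is standard Telegraf config and may need adjusting for your install):

```toml
[agent]
  debug = true

[[inputs.http]]
  ## Endpoint from the thread above
  urls = ["https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json"]
  data_format = "json_v2"
  [[inputs.http.json_v2]]
    measurement_name = "SWPC"

## Write parsed metrics to stdout so you can inspect the line protocol
[[outputs.file]]
  files = ["stdout"]
  data_format = "influx"
```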

Phill


Thanks Phill,

I have modified the Telegraf conf, link to the file https://www.udrop.com/6R2B/predicted_solar_cycle_data_(json)_.conf

Then ran telegraf again and captured stdout to a file

link to output.txt https://www.udrop.com/6R2D/output.txt

Thanks again
Roland

@HB9VQQ

Roland - output.txt looks good to me, but you’re the judge. Does InfluxDB show the correct data now?

Yes, Phill, InfluxDB gets populated, but looking closely at the data you’ll see something is not right.

For comparison, I am loading the same source JSON file in an online viewer.

When looking at the content of InfluxDB I am expecting to see the same, but I don’t.

I am scratching my head: where the heck is the value of “67.73” coming from?

Ok, got it: the value comes from the last object in the JSON array.

Oh dear… sorry for that…
Is there a way to load ALL the data from the JSON, not only the last entry?
I wonder if that has something to do with “json_time_key” and “json_time_format”?

Thanks
Roland

Hard to fight against a new user’s posting restrictions 🙂

@HB9VQQ
Roland - No worries, what you’ve provided is sufficient. It’s got to be in the parsing logic of [[inputs.http]]. Your original InfluxDB screenshot shows it’s always picking off that last value.

So it looks like you get a JSON array of objects and want to pick off the name-value pair corresponding to "predicted_f10.7" for every object in that array, yes? Here are a couple of references specifically about doing that with JSON: How to Parse JSON, and The JSON Playground, where you can test and refine your parsing logic to make sure it’s working as desired.

Parsing is tricky, especially the syntax. Watch out for single [ vs. double [[ square brackets and for indentation. I think you’re closing in on the solution, but we’re here if you need help.

Phill

Ok Phil, I am on it,

So it looks like you get a JSON array of objects and want to pick off the name-value pair corresponding to "predicted_f10.7" for every object in that array, yes?

Exactly what I am trying to do. If it works, I also need some other key-value pairs from the file. But step by step for now.

Not much success so far; this is what I appended to the configuration:

  ## Data format to consume.
  ## Each data format has its own unique set of configuration options, read
  ## more about them here:
  ## https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
  data_format = "json_v2"

  [[inputs.file.json_v2]]
       measurement_name = "SWPC"
       timestamp_path = "time-tag" 
       timestamp_format = ""
       [[inputs.file.json_v2.object]]
           path = "@this" 
           timestamp_key = "time-tag" 
           timestamp_format = "" 
           disable_prepend_keys = true

Having difficulties defining the “timestamp_format”


I keep pushing
Roland

@HB9VQQ
The syntax/indentation looks good. You might try something simple like:

  [[inputs.file.json_v2]]
       measurement_name = "SWPC"
       [[inputs.file.json_v2.object]]
           path = "@this" 
           included_keys = ["time-tag", "predicted_f10.7"]

which should pull out those two name-value pairs. Get something really simple working and build up from there.

Ok, just did that and ran Telegraf.

Not much happening on the command line just yet with the current .conf:
predicted_solar_cycle_data_(json)__v2.txt (8.2 KB)

@HB9VQQ I yanked the first two objects from the input file and tested in the playground

@this.#.predicted_f10\.7 gives [103.5,104.4] and
@this.#.time-tag gives ["2022-01","2022-02"]

Indeed, I tried the same for the entire JSON and it works in the Playground; all returned values are ok. But running Telegraf still returns nothing on the command line.

@HB9VQQ

It’s working on my machine. I ran it using a file as input and then switched to the URL. Both cases ran.

Halfway through this thread we switched from [[inputs.http]] to [[inputs.file]] which I didn’t catch until testing on my machine. The conf file is:

  data_format = "json_v2"
  [[inputs.http.json_v2]]
       measurement_name = "SWPC"
       [[inputs.http.json_v2.object]]
           path = "@this" 

and the tail end of the output measurements produced is:

SWPC,host=dd5890180e41,url=https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json low_ssn=0,predicted_f10.7=67.8,high_f10.7=76.8,low_f10.7=67.7,time-tag="2040-06",predicted_ssn=0.1,high_ssn=10.1 1658075170000000000
SWPC,host=dd5890180e41,url=https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json time-tag="2040-07",predicted_ssn=0,high_ssn=9,low_ssn=0,predicted_f10.7=67.73,high_f10.7=8,low_f10.7=67.7 1658075170000000000
SWPC,host=dd5890180e41,url=https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json predicted_f10.7=67.73,high_f10.7=8,low_f10.7=67.7,time-tag="2040-08",predicted_ssn=0,high_ssn=9,low_ssn=0 1658075170000000000
SWPC,host=dd5890180e41,url=https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json low_ssn=0,predicted_f10.7=67.73,high_f10.7=8,low_f10.7=67.7,time-tag="2040-09",predicted_ssn=0,high_ssn=9 1658075170000000000
SWPC,host=dd5890180e41,url=https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json predicted_ssn=0,high_ssn=9,low_ssn=0,predicted_f10.7=67.73,high_f10.7=8,low_f10.7=67.7,time-tag="2040-10" 1658075170000000000
SWPC,host=dd5890180e41,url=https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json low_f10.7=67.7,time-tag="2040-11",predicted_ssn=0,high_ssn=9,low_ssn=0,predicted_f10.7=67.73,high_f10.7=8 1658075170000000000
SWPC,host=dd5890180e41,url=https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json high_f10.7=8,low_f10.7=67.7,time-tag="2040-12",predicted_ssn=0,high_ssn=9,low_ssn=0,predicted_f10.7=67.73 1658075170000000000
2022-07-17T16:26:15Z D! [agent] Stopping service inputs
2022-07-17T16:26:15Z D! [agent] Input channel closed
2022-07-17T16:26:15Z I! [agent] Hang on, flushing any cached metrics before shutdown
2022-07-17T16:26:15Z D! [outputs.file] Buffer fullness: 0 / 10000 metrics
2022-07-17T16:26:15Z I! [agent] Stopping running outputs

I literally looked at it a thousand times and didn’t see it. Thanks for pointing it out. Now it starts working…

Thank you so much!

Now one step further. If you look closely at each line of the output, you’ll see that sometimes the order is not “time-tag”, “predicted_f10.7” but rather “predicted_f10.7”, “time-tag”. Also, each output line has the same timestamp, which results in just a single entry in InfluxDB.

I guess I need something like

[[inputs.http]]
  ## One or more URLs from which to read formatted metrics
  urls = [
    "https://services.swpc.noaa.gov/json/solar-cycle/predicted-solar-cycle.json"
  ]
  data_format = "json_v2"
  [[inputs.http.json_v2]]
       measurement_name = "SWPC_predicted"
       [[inputs.http.json_v2.object]]
           path = "@this" 
           included_keys = ["predicted_f10.7"]
           timestamp_key = "time-tag"
           timestamp_format = ""

The format of “time-tag” is “YYYY-MM”; any clue how to construct the “timestamp_format” for this?

The field order is solved with

[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout"]
  data_format = "influx"
  influx_sort_fields = true

@HB9VQQ Yeah, in my experience the order sometimes switches around, but InfluxDB unscrambles it.

Regarding the timestamp, I found this example, though it’s for unix time. It can be done with the Starlark processor plugin, but there’s probably an easier way. You might also look at the enum processor. And depending on your plans once the data is in InfluxDB, you might be able to query directly off the time-tag field.

When you figure it out, I’d be interested in hearing about it.

Phill

If there was an easy way to append “-01” to the end of every “time-tag”, then I could probably use

timestamp_key = "time-tag"
timestamp_format = "2006-01-02"

After a couple of hours of research, no joy just yet. It seems rather complex to get “time-tag” converted into a Unix timestamp and sent to InfluxDB for my use case, or I am just overlooking something.

Thanks for all your help, Phill!

@HB9VQQ Starlark is a dialect of Python that supports string concatenation. Essentially, each metric leaves [[inputs.http]] and passes through [[processors.starlark]], where the string in YYYY-MM format would be appended with “-01” (or given any other manipulation you’d like). The modified metric then heads for InfluxDB.

Point is: there would be a small learning curve, and you’d be writing a couple of lines of Starlark code embedded in that plugin. I’ve done this and written it up in a forthcoming blog post, so I can help if you decide to go that route.
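For the record, a sketch of what those couple of lines could look like (untested here, and it assumes the time-tag field arrives as a YYYY-MM string):

```toml
[[processors.starlark]]
  source = '''
def apply(metric):
    # Append "-01" so the YYYY-MM string becomes a full YYYY-MM-DD date.
    if "time-tag" in metric.fields:
        metric.fields["time-tag"] = metric.fields["time-tag"] + "-01"
    return metric
'''
```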

In the meantime, I have changed the Telegraf config from http to file:

[[outputs.file]]
  ## Files to write to, "stdout" is a specially handled file.
  files = ["stdout"]
  data_format = "influx"
  influx_sort_fields = true
# Read formatted metrics from one or more local files
[[inputs.file]]
  ## Files to parse each interval.  Accept standard unix glob matching rules,
  ## as well as ** to match recursive files and directories.
  files = ["./predicted-solar-cycle.json"]
  data_format = "json_v2"
  [[inputs.file.json_v2]]
       measurement_name = "SWPC_predicted"
       [[inputs.file.json_v2.object]]
           path = "@this"
           included_keys = ["predicted_f10.7", "predicted_ssn"]
           timestamp_key = "time-tag"
           timestamp_format = "2006-01-02"

I downloaded and edited the source JSON, basically changing every “time-tag” to “YYYY-MM-DD”, e.g.

}, {
  "time-tag": "2040-10-01",
  "predicted_ssn": 0.0,
  "high_ssn": 9.0,
  "low_ssn": 0.0,
  "predicted_f10.7": 67,
  "high_f10.7": 8.0,
  "low_f10.7": 67.7
}, {
  "time-tag": "2040-11-01",
  "predicted_ssn": 0.0,
  "high_ssn": 9.0,
  "low_ssn": 0.0,
  "predicted_f10.7": 68,
  "high_f10.7": 8.0,
  "low_f10.7": 67.7
}, {
  "time-tag": "2040-12-01",
  "predicted_ssn": 0.0,
  "high_ssn": 9.0,
  "low_ssn": 0.0,
  "predicted_f10.7": 69,
  "high_f10.7": 8.0,
  "low_f10.7": 67.7
}
]

To see if that makes any difference in the resulting influx unix output timestamp.

All three lines still carry the same timestamp, “1658125916000000000”. Am I missing something here?

@HB9VQQ That timestamp is the time when Telegraf ran - Monday, July 18, 2022 6:31:56 AM GMT
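For completeness: the json_v2 timestamp_format takes a Go reference-time layout (the reference date being Mon Jan 2 15:04:05 MST 2006), so it may be worth testing whether the original YYYY-MM strings parse directly, without editing the JSON at all. An untested sketch:

```toml
  [[inputs.http.json_v2]]
       measurement_name = "SWPC_predicted"
       [[inputs.http.json_v2.object]]
           path = "@this"
           timestamp_key = "time-tag"
           ## Go layout for YYYY-MM; untested assumption
           timestamp_format = "2006-01"
```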