Job Scheduling for periodic web API scraping into InfluxDB

Hi there,

I have just started my InfluxData journey, so I am still very new. I have some time series that I am scraping from a few web APIs. The goal of my workflow is the following:

  1. Every hour, trigger GET calls to several web APIs and receive JSON responses.
  2. Convert the JSON into a tabular format.
  3. Write the data into a staging time-series bucket for each API.
  4. Use Flux to aggregate these staging buckets and write the results into a final, refined bucket.
  5. Run various analytics or ML models on that final bucket to gain insight.
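
Roughly, I picture steps 1-3 looking something like the Python sketch below (the endpoint, token, org, bucket, and field names are all placeholders I made up, and I'm assuming the official influxdb-client library):

```python
import requests
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

API_URL = "https://example.com/api/metrics"  # placeholder endpoint
INFLUX_URL = "http://localhost:8086"
TOKEN = "my-token"        # placeholder
ORG = "my-org"            # placeholder
BUCKET = "staging-api"    # staging bucket for this API

def scrape_once():
    # Step 1: GET call, JSON response
    resp = requests.get(API_URL, timeout=30)
    resp.raise_for_status()
    records = resp.json()  # assume a list like [{"ts": ..., "sensor": ..., "value": ...}]

    # Steps 2-3: map each JSON record onto a point and write it to the staging bucket
    with InfluxDBClient(url=INFLUX_URL, token=TOKEN, org=ORG) as client:
        write_api = client.write_api(write_options=SYNCHRONOUS)
        points = [
            Point("api_reading")
            .tag("sensor", r["sensor"])
            .field("value", float(r["value"]))
            .time(r["ts"])  # RFC3339 string or epoch nanoseconds
            for r in records
        ]
        write_api.write(bucket=BUCKET, record=points)

if __name__ == "__main__":
    scrape_once()  # question 2 below is about how to run this hourly
```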

My question is two-fold:

  1. Do InfluxDB buckets accept JSON records as-is, or do I need to preprocess each JSON record before it can be accepted (for example, by turning it into a tabular row with the timestamp as the primary key)? The docs aren't very clear about whether the bucket has any schema requirements:
    Create a bucket in InfluxDB | InfluxDB OSS 2.0 Documentation
  2. Can I use an Influx tool, such as the Telegraf agent or Kapacitor, to deploy my Python (or Flux) script and kick off these hourly API calls into InfluxDB? Or do I have to use my own job scheduler (e.g., SSIS) to kick off my script and call the Influx API to write to the DB?

Thanks,
~Kevin

Hi Kevin

You can write to InfluxDB using line protocol. Buckets don't ingest raw JSON as-is, so each JSON record does need to be mapped onto a measurement, tags, fields, and a timestamp first. You can find more about the format in the docs.
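
For example, a single point (a made-up measurement with one tag, one field, and a nanosecond timestamp) looks like this:

```
api_reading,sensor=alpha value=1.23 1640995200000000000
```

Each of your JSON records would be mapped onto one such line (or onto an equivalent Point object if you use a client library).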

You can also bulk-load data files directly from block storage (local or in the cloud) using the influx CLI.
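
For instance, a file of line protocol can be loaded in one command (bucket, org, and file name are placeholders):

```
influx write --bucket staging-api --org my-org --file data.lp
```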

For the second question: Kapacitor has all the functionality you need, and it's well integrated with the TICK stack.
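
That said, since you already mentioned Telegraf: for plain hourly polling of a JSON endpoint, Telegraf's http input plugin may be all you need, with no separate scheduler at all. A minimal sketch of the config (URLs, token, org, and bucket are placeholders):

```toml
[agent]
  interval = "1h"  # poll once per hour

[[inputs.http]]
  urls = ["https://example.com/api/metrics"]  # placeholder endpoint
  data_format = "json"  # parse the JSON response into fields

[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  token = "$INFLUX_TOKEN"
  organization = "my-org"
  bucket = "staging-api"
```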