Downsampling data in-situ

influxdb
#1

My client’s data arrives from IoT devices every ten seconds. We would like to set up a downsampling and retention policy that downsamples the data to X-second intervals once it is Y minutes old and deletes it after a year. (I’m checking on X and Y, but that’s probably not relevant.)

I’ve read the Downsampling and Data Retention guide and I understand how to retain the data for a year. But downsampling leaves me with two questions:

  • For downsampling, we need to thin the existing measurement rather than create a new one, because we don’t control the client that subscribes to the database. What’s the best way to do that?
  • The Downsampling and Data Retention guide explains why it’s necessary to set up a Continuous Query before creating the database. In our case, we need to apply the thinning post-facto once before a CQ can take over the job. How is that done?
#2

I’ve looked into this quite a bit and finally decided to do it with a Kapacitor script that I replay against historical data from the CLI, for example:

kapacitor replay-live batch -task data_rollup_1m -start 2018-01-01T00:00:00Z -stop 2018-05-28T03:00:00Z -rec-time

The data_rollup_1m TICK script queries and aggregates data in batches of 1m and writes it out to another retention policy and measurement.
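
For reference, a task along these lines might look roughly like the sketch below. This is not my exact script: the database (iot), source measurement (sensor), field (value), target retention policy (one_year) and target measurement (sensor_1m) are all made-up names, and the dbrp line assumes a Kapacitor version that accepts it inside the script.

```
// data_rollup_1m.tick (illustrative sketch; all names below are placeholders)

// Database and retention policy the task reads from.
// On versions without this statement, pass -dbrp to kapacitor define instead.
dbrp "iot"."autogen"

batch
    // Average the raw field over each window.
    |query('SELECT mean("value") AS "value" FROM "iot"."autogen"."sensor"')
        // Each batch covers one minute of data...
        .period(1m)
        // ...and a new batch is queried every minute.
        .every(1m)
        // Keep every tag so each series is rolled up separately.
        .groupBy(*)
    // Write the one-minute means to a different retention policy and measurement.
    |influxDBOut()
        .database('iot')
        .retentionPolicy('one_year')
        .measurement('sensor_1m')
        .precision('s')
```

The task has to be registered with kapacitor define before it can be replayed. Note that this writes the rollup to a new measurement; it does not thin the original measurement in place.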

CQs, as far as I understand, are meant to handle data as it streams into InfluxDB, more or less in real time; that is, as you said, they work on future data coming in.