Downsampling past data using InfluxDB v2.0 tasks

Hello,

My company plans to import TBs of past stock prices (2000-2020) into InfluxDB, and I’m digging into how to “automatically” aggregate the data (minute/hour/day/month open-high-low-close candles) into new buckets while importing those tick prices. Here are two straightforward yes/no questions:

  • Can tasks be applied to past time series when importing? Despite the amazing docs/community/templates I found, I haven’t managed to write the right tasks yet.
  • If not, can Flux queries handle such aggregation + insert on large buckets?

I’d appreciate your help.

Best,
Thomas

Hi Thomas -

Seems like a (good) challenge! Be sure to design your schema of tags and fields carefully, too (tag values should have low cardinality).
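
For example, with the Python client library a raw tick point might look like this (the measurement name “tick” and the field name “price” are just placeholder assumptions):

```python
from influxdb_client import Point, WritePrecision

# Hypothetical schema for a raw tick: the symbol is a tag (indexed,
# so keep it low cardinality), the price is a field (not indexed,
# so high cardinality is fine).
tick = (
    Point("tick")
    .tag("symbol", "AAPL")
    .field("price", 142.35)
    .time("2010-06-15T14:30:00Z", WritePrecision.S)
)
```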

For the historical data, I think your best bet is going to be a little bit of Python that aggregates at the time of writing. I suspect you’re going to use a client library like Python’s to write the historical data anyway? Tasks are not really the right tool here: they are designed for processing “real-time” data and will likely be frustrating for historical data where the timestamps are old.

Flux queries can be used to aggregate the raw data once it is ingested, and Flux is good at this: you can write queries that process a chunk of data, aggregate it, and write the result into a separate bucket. You will need to chunk the Flux queries yourself (at least I haven’t seen anything that does this for you). If you try to aggregate the whole dataset in one query, I believe the performance will be disappointing, as the engine tries to do it all at once instead of in chunks. Rough sketches of both approaches follow.
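
Here’s a rough sketch of the aggregate-at-write-time idea, using pandas and the Python client library. The file name, bucket name, URL, and token are all placeholder assumptions, and the same resample pattern with “1min”, “1D”, etc. covers the other candle resolutions:

```python
import pandas as pd
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

# Hypothetical input: a CSV of ticks with a timestamp index and a "price" column.
ticks = pd.read_csv("aapl_ticks_2010.csv", index_col="time", parse_dates=True)

# Resample the ticks into hourly OHLC candles before they ever hit the database.
candles = ticks["price"].resample("1h").ohlc().dropna()

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

points = [
    Point("candle_1h")
    .tag("symbol", "AAPL")
    .field("open", row.open)
    .field("high", row.high)
    .field("low", row.low)
    .field("close", row.close)
    .time(ts, WritePrecision.S)
    for ts, row in candles.iterrows()
]
write_api.write(bucket="candles_1h", record=points)
```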

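And a sketch of the chunked-Flux approach, driving one bounded query per month from Python. The bucket and measurement names are again assumptions, and note that a single aggregateWindow() pass only yields one of the four OHLC values; a full candle needs one pass each with first, max, min, and last:

```python
import pandas as pd
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
query_api = client.query_api()

# One bounded Flux query per month, so the engine never has to hold
# twenty years of ticks at once. fn: last yields the close; rerun
# with first/max/min (and a matching set()) for open/high/low.
flux = """
from(bucket: "ticks")
  |> range(start: {start}, stop: {stop})
  |> filter(fn: (r) => r._measurement == "tick" and r._field == "price")
  |> aggregateWindow(every: 1h, fn: last, createEmpty: false)
  |> set(key: "_field", value: "close")
  |> to(bucket: "candles_1h", org: "my-org")
"""

months = pd.date_range("2000-01-01", "2020-01-01", freq="MS", tz="UTC")
for start, stop in zip(months[:-1], months[1:]):
    query_api.query(flux.format(
        start=start.strftime("%Y-%m-%dT%H:%M:%SZ"),
        stop=stop.strftime("%Y-%m-%dT%H:%M:%SZ"),
    ))
```

Running one month at a time keeps memory use bounded no matter how large the source bucket grows.
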
Let us know how you managed it in the end.

Phil

Hi Phil,

You’re right: aggregating the whole dataset at once, for ~3×10⁶ entries, crashes the 16 GB RAM test machine. I definitely need to aggregate at the time of writing!

Thank you very much for your help.

Tom
