Stock Data Shard Duration Best Practice

feedback
time-series
influxdb
#1

Hey everyone, I’ve been back and forth through various forums and articles on the appropriate shard duration for storing stock data. I’ve seen recommendations anywhere from keeping the default of 1 week all the way up to setting it to 50 years.

Anyone with experience at the extremes, or anywhere in the middle? Will weekly shard groups create tons of shards and excessive memory usage? I’d like to get all of this data reimported into a new retention policy before it grows too large.

A. I plan on saving all historical data infinitely.
B. I’m using continuous queries with resampling to downsample data for my graphs at 1w, 3d, 1d, 12h, 6h, 4h, 2h, 1h, 30m, 15m, 5m, 3m, and 1m resolutions.
C. I may also run realtime queries at intervals <= 15m, keeping roughly 720 data points of history per resolution. For example, 1m resolution will have 12 hours of data (720 data points), and 15m resolution will have 180h, or 7.5 days (beyond the default 1-week shard duration).
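The downsampling in (B) can be written as a continuous query with `RESAMPLE`; here is a minimal sketch for just the 1h resolution, assuming a database named `stocks` and a long-retention policy named `forever` (both names hypothetical):

```sql
-- Hypothetical CQ: every 30m, recompute the most recent 2h of 1h candles
-- from the raw "stock" measurement and write them into a "stock_1h"
-- measurement under the "forever" retention policy.
CREATE CONTINUOUS QUERY "cq_stock_1h" ON "stocks"
RESAMPLE EVERY 30m FOR 2h
BEGIN
  SELECT MEAN("price")  AS "price",
         MIN("price")   AS "min",
         MAX("price")   AS "max",
         FIRST("price") AS "open",
         LAST("price")  AS "close"
  INTO "stocks"."forever"."stock_1h"
  FROM "stock"
  GROUP BY time(1h), "symbol"
END
```

You would repeat this pattern per resolution (1m, 5m, 15m, …), widening the `RESAMPLE` window as the `GROUP BY time()` interval grows.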

#2

Can you explain more how you’re using the data? Here are some things to consider that would give context.

  1. An example query
  2. Write throughput volume
  3. Number of series
  4. Version of Influx
#3
  1. select count("stockid") as VOLUME, (DIFFERENCE(COUNT("stockid")) / ((COUNT("stockid")) - (DIFFERENCE(COUNT("stockid")))) * 100) as VOLUME_CHANGE, MEAN("price") AS PRICE, (DIFFERENCE(MEAN("price")) / ((MEAN("price")) - (DIFFERENCE(MEAN("price")))) * 100) as PRICE_CHANGE, MIN("price") as MIN, MAX("price") as MAX, FIRST("price") AS OPEN, LAST("price") AS CLOSE from stock where symbol='XXX' AND time >= now() - 12h group by time(1h);

  2. Writes are batched at 1-100 points per insert, if that’s what you’re asking.

  3. 400,000-600,000 points / day and expecting growth. Series cardinality is roughly 1500.

  4. 1.4.2

#4

I would recommend a few things:

  1. If possible, increase the batch size to around 10k points per write for higher efficiency.
  2. Since you’re keeping data indefinitely, you could end up with an ever-growing number of shards that are never dropped. I would set the shard group duration to something large (e.g. years).

Use cases could change some of these things, but I hope that helps.
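To put numbers on point 2: with the default 1-week shard group duration and infinite retention, ten years of data means roughly 520 shards that can never be dropped, whereas a 52-week shard group duration leaves only about 10. A minimal sketch of such a policy, again assuming hypothetical names `stocks` and `forever`:

```sql
-- Hypothetical RP: keep data forever, with one shard group per year
-- instead of the default one per week.
CREATE RETENTION POLICY "forever" ON "stocks"
  DURATION INF REPLICATION 1 SHARD DURATION 52w
```

An existing policy can be changed the same way with `ALTER RETENTION POLICY`, though the new shard group duration only applies to shards created after the change.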