Storing stock market data for complex queries

Hi, I am looking at influxdb as storage for stock market data. I only want to stor daily timeframe so one entry per stock per day. And about 252 entries per stock per year. There are about 7000 stocks in the US stock market so 7000*252 entries for a year.

I am now storing it with ohlcv data, so each entry has a value for the open price, highest price, lowest price, and close price for that day. And also a value for the volume. Each entrie has the stock ticker as a tag. So I am storing all of them in the same bucket.

What I want to do with the data is to be able to do complex queries, such as:

  • What stock has had the highest volume during a 10-day period?
  • which stocks have made the biggest move for a certain timeframe: a week, a month, 3 months, 6 months, a year, etc.
  • Which stocks are trading over the 10/20/50/100/150/200-day moving average for close price
  • But also even more complex queries than that at times and aggregations with AND/OR of the above.

Questions

  1. Is this possible with influx db? I am looking at custom aggregate functions.
  2. Is the idea to have all of them as tags in one bucket a good idea, or should each stock be in its own bucket? (Note that I will have the same retention for all stocks.)

I am thinking about 3 different ways that I could achieve what I want:

  1. This approach with all in the same bucket and custom aggregate functions
  2. Storing a lot more of pre-calculated values such as the Moving averages and stats per entry to make it easier to query
  3. Just storing the data in influx but reading it all into memory and doing the queries on the data in memory instead.

For 2 and 3 I think I could just use a Postgres database instead, no need for influxdb in that case.

a. your results won’t be accurate without info on splits, dividends, delisted/relisted stocks, one place you could read re:backtest pitfalls is old.reddit.com/r/algotrading

b. the query lang will go so far on e.g. expon moving average, I would prefer backtest lib in python, julia, R

I am aware of these problems with splits, dividends, delisted/relisted stocks etc.