Storing stock market data for complex queries

Discordia · October 1, 2023, 12:37pm

Hi, I am looking at influxdb as storage for stock market data. I only want to stor daily timeframe so one entry per stock per day. And about 252 entries per stock per year. There are about 7000 stocks in the US stock market so 7000*252 entries for a year.

I am now storing it with ohlcv data, so each entry has a value for the open price, highest price, lowest price, and close price for that day. And also a value for the volume. Each entrie has the stock ticker as a tag. So I am storing all of them in the same bucket.

What I want to do with the data is to be able to do complex queries, such as:

What stock has had the highest volume during a 10-day period?
which stocks have made the biggest move for a certain timeframe: a week, a month, 3 months, 6 months, a year, etc.
Which stocks are trading over the 10/20/50/100/150/200-day moving average for close price
But also even more complex queries than that at times and aggregations with AND/OR of the above.

Questions

Is this possible with influx db? I am looking at custom aggregate functions.
Is the idea to have all of them as tags in one bucket a good idea, or should each stock be in its own bucket? (Note that I will have the same retention for all stocks.)

I am thinking about 3 different ways that I could achieve what I want:

This approach with all in the same bucket and custom aggregate functions
Storing a lot more of pre-calculated values such as the Moving averages and stats per entry to make it easier to query
Just storing the data in influx but reading it all into memory and doing the queries on the data in memory instead.

For 2 and 3 I think I could just use a Postgres database instead, no need for influxdb in that case.

Gene_Pnw · October 1, 2023, 4:30pm

a. your results won’t be accurate without info on splits, dividends, delisted/relisted stocks, one place you could read re:backtest pitfalls is old.reddit.com/r/algotrading

b. the query lang will go so far on e.g. expon moving average, I would prefer backtest lib in python, julia, R

Discordia · October 1, 2023, 8:47pm

I am aware of these problems with splits, dividends, delisted/relisted stocks etc.

Topic		Replies	Views
Correct architecture for storing market data Store influxdb	2	1365	November 10, 2021
Downsampling past data using InfluxDB v2.0 tasks InfluxDB 2 influxdb	2	1007	October 13, 2020
OHLC Data Schema and Queries	0	619	August 27, 2023
Using aggregate functions extensively	0	20	October 31, 2024
Advice on aggregating to ohlc data InfluxDB 2	3	3760	March 8, 2021

Storing stock market data for complex queries

Related topics