Stock Chart Data

zackperdue · March 9, 2017, 7:07pm

Hey guys, looking into influxdb to store bar chart data for stocks.

Specifically, which schema would be best to use?

Right now I’ve got my OHLC and volume in fields, and I’ve got symbol and timeframe as tags. The start time of the bar is the time stored with the point. My measurement is named “bars”. Is this a good schema?

Very new to influxdb so I’m looking for some insights.

Please respond with tips and suggestions. Thanks!

jackzampolin · March 9, 2017, 7:33pm

@zackperdue I would say a schema like this:

measurement

stock_data

tags

market
ticker

fields

high
low
open
close
volume

Just write the timestamp with each point. Then you can GROUP BY time(1d) or whatever interval you need to get your bars. Hope this helps!

Jack

zackperdue · March 9, 2017, 7:35pm

I’m new to Influx, so bare with me. What happens if I try to insert a duplicate point?

Is there something I can look at to learn more about that?

thanks @jackzampolin

jackzampolin · March 9, 2017, 7:36pm

@zackperdue We have some docs on duplicate points. Points are only duplicate if all of the metadata is the same.

zackperdue · March 9, 2017, 7:50pm

Thanks for that @jackzampolin

A point is uniquely identified by the measurement name, tag set, and timestamp. If you submit a new point with the same measurement, tag set, and timestamp as an existing point, the field set becomes the union of the old field set and the new field set, where any ties go to the new field set. This is the intended behavior.

What does it mean the “union”?

And if I request a series, where duplicates exist, does the collection returned include both or does it include the latest inserted, and discard the “overwritten” one, even though they are identical.

My schema actually has a couple more “tags” (i think) correct me if i’m wrong in thinking this.

I’d like to store all of my bar data in the same measurement, but I’d like to support different timeframes/intervals since I would be storing bars that are already summarized to a specific interval (5min, 15min, 30min, etc…)

measurement

stock_data

tags

exchange
ticker
15_minute

fields

high
low
open
close
volume

I added 15_minute there. Is that right?

jackzampolin · March 9, 2017, 8:21pm

@zackperdue Last point wins. So whatever value has come in most reciently will be persisted. I wouldn’t advise storing any time related tags. That is def an anti-pattern. I would just try to use different GROUP BY times.

If your queries become too slow and you are looking to optimize, the right way to do this is to use Continuous Queries to downsample into different Retention policies. There is a guide here

Jack

zackperdue · March 9, 2017, 10:52pm

Thanks @jackzampolin. Looks like i have some reading to do.

zackperdue · March 9, 2017, 11:28pm

One last question, @jackzampolin

I get my data from an API, and i have them already summarized in bars of HOLC with the time and volume.
When I call this api, i specify how i want the bars. 15 minute bars, 30 minute bars, etc…

is it not wise to do a new measurement for each type of bar Im storing?

measurements would be: 15_minutes, 30_minutes, hourly, daily, weekly, monthly

It seems like what is going to work best in my case is either putting all data points under 1 measurement and using the intervals as a tag, or using the intervals as the measurement.

Not sure which is best.

Marijus · March 10, 2017, 7:19am

I think the best way to do it, would be put the data to the db with the smallest time stamp possible, in your case it would be 1 minute? And then to select them with wanted/required interval with “GROUP BY”.

jackzampolin · March 10, 2017, 4:05pm

@zackperdue Well if the data is coming in that way then making the measurement names the different granularities might make sense.

RobHouston · May 4, 2017, 3:22am

What would be the best solution to store tick data and then be able to pull it out in groups. What I mean by groups. Say you have data like this…

xyz-symbol, mid, spread

So now you want to group the data by every 100 ticks. Or say you want to group the data by range of mid 10-20, 20-30, 30-40, 40-50, etc for x time period. Or you just wanted to get OHLC so you could generate 1 min OHLC or 4hr OHLC??

When I try and select data with the following:

select first(mid) as open, max(mid) as high, min(mid) as low, last(mid) as close, sum(mid) as mid, MOVING_AVERAGE(/mid/,5) as ma5, MOVING_AVERAGE(/mid/,99) as ma99 from tick where (time >= now() - 1h) and symbol=‘EURUSD’ group by symbol, time(1m)
I get the following error:
ERR: error parsing query: aggregate function required inside the call to moving_average

If I remove the moving average it works.

How would you create a structure to hold the data in influx where the data could be pulled out?? In PostgreSQL this if very straight forward with tables and group by in selects.

Since I am very new to influxdb like less than a week. I am having real hard time trying to understand how to do the simple things like create a table!!! inserting data into table…

Has anyone on here used influxdb for tick data in financial markets? If you have would you post the layout on how to create the storage in influxdb!

Topic		Replies	Views
Influx DB Data Insertion InfluxDB 2	1	323	January 9, 2024
InfluxDB understanding basics and IoT Schema design InfluxDB 1 influxdb , time-series , influxdata , schema , query	3	2116	November 3, 2021
InfluxDB schema help! InfluxDB 2	1	12	May 21, 2025
Best Schema for Tree-like Datastructure Telegraf	1	510	November 8, 2019
Data Storage Schema Store schema	4	56	August 26, 2024

Stock Chart Data

Related topics