Stock Chart Data

Hey guys, looking into influxdb to store bar chart data for stocks.

Specifically, which schema would be best to use?

Right now I’ve got my OHLC and volume in fields, and I’ve got symbol and timeframe as tags. The start time of the bar is the time stored with the point. My measurement is named “bars”. Is this a good schema?

Very new to influxdb so I’m looking for some insights.

Please respond with tips and suggestions. Thanks!

@zackperdue I would say a schema like this:

  1. measurement
  • stock_data
  1. tags
  • market
  • ticker
  1. fields
  • high
  • low
  • open
  • close
  • volume

Just write the timestamp with each point. Then you can GROUP BY time(1d) or whatever interval you need to get your bars. Hope this helps!

Jack

I’m new to Influx, so bare with me. What happens if I try to insert a duplicate point?

Is there something I can look at to learn more about that?

thanks @jackzampolin

@zackperdue We have some docs on duplicate points. Points are only duplicate if all of the metadata is the same.

Thanks for that @jackzampolin

A point is uniquely identified by the measurement name, tag set, and timestamp. If you submit a new point with the same measurement, tag set, and timestamp as an existing point, the field set becomes the union of the old field set and the new field set, where any ties go to the new field set. This is the intended behavior.

What does it mean the “union”?

And if I request a series, where duplicates exist, does the collection returned include both or does it include the latest inserted, and discard the “overwritten” one, even though they are identical.

My schema actually has a couple more “tags” (i think) correct me if i’m wrong in thinking this.

I’d like to store all of my bar data in the same measurement, but I’d like to support different timeframes/intervals since I would be storing bars that are already summarized to a specific interval (5min, 15min, 30min, etc…)

  1. measurement
  • stock_data
  1. tags
  • exchange
  • ticker
  • 15_minute
  1. fields
  • high
  • low
  • open
  • close
  • volume

I added 15_minute there. Is that right?

@zackperdue Last point wins. So whatever value has come in most reciently will be persisted. I wouldn’t advise storing any time related tags. That is def an anti-pattern. I would just try to use different GROUP BY times.

If your queries become too slow and you are looking to optimize, the right way to do this is to use Continuous Queries to downsample into different Retention policies. There is a guide here

Jack

1 Like

Thanks @jackzampolin. Looks like i have some reading to do.

One last question, @jackzampolin

I get my data from an API, and i have them already summarized in bars of HOLC with the time and volume.
When I call this api, i specify how i want the bars. 15 minute bars, 30 minute bars, etc…

is it not wise to do a new measurement for each type of bar Im storing?

measurements would be: 15_minutes, 30_minutes, hourly, daily, weekly, monthly

It seems like what is going to work best in my case is either putting all data points under 1 measurement and using the intervals as a tag, or using the intervals as the measurement.

Not sure which is best.

I think the best way to do it, would be put the data to the db with the smallest time stamp possible, in your case it would be 1 minute? And then to select them with wanted/required interval with “GROUP BY”.

@zackperdue Well if the data is coming in that way then making the measurement names the different granularities might make sense.

What would be the best solution to store tick data and then be able to pull it out in groups. What I mean by groups. Say you have data like this…

xyz-symbol, mid, spread

So now you want to group the data by every 100 ticks. Or say you want to group the data by range of mid 10-20, 20-30, 30-40, 40-50, etc for x time period. Or you just wanted to get OHLC so you could generate 1 min OHLC or 4hr OHLC??

When I try and select data with the following:

select first(mid) as open, max(mid) as high, min(mid) as low, last(mid) as close, sum(mid) as mid, MOVING_AVERAGE(/mid/,5) as ma5, MOVING_AVERAGE(/mid/,99) as ma99 from tick where (time >= now() - 1h) and symbol=‘EURUSD’ group by symbol, time(1m)
I get the following error:
ERR: error parsing query: aggregate function required inside the call to moving_average

If I remove the moving average it works.

How would you create a structure to hold the data in influx where the data could be pulled out?? In PostgreSQL this if very straight forward with tables and group by in selects.

Since I am very new to influxdb like less than a week. I am having real hard time trying to understand how to do the simple things like create a table!!! inserting data into table…

Has anyone on here used influxdb for tick data in financial markets? If you have would you post the layout on how to create the storage in influxdb!