Is InfluxDB suitable for storing trade data?

Hi there. I’m new to InfluxDB, and I’m trying to determine if it would be a good fit for my use case: storing trade data from cryptocurrency exchanges.

Here are two examples of the kind of data I’d like to store:
https://poloniex.com/public?command=returnTradeHistory&currencyPair=ETH_GNT
https://api.liqui.io/api/3/trades/gnt_eth

And here’s an example of how I’m currently storing this in InfluxDB:

trade,exchange=polo,base=eth,quote=gnt,side=buy price=0.00197600,volume=1.46874329 1496698706000
trade,exchange=liqu,base=eth,quote=gnt,side=sell price=0.00209707,volume=91.00652896 1496701451000

Once I have the trade data I want to reduce it to candlestick data (open, high, low, close).
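
For reference, the reduction I have in mind is something like this InfluxQL query (just a sketch; the 5m interval, time range, and pair are only examples):

SELECT first(price) AS open, max(price) AS high,
       min(price) AS low, last(price) AS close,
       sum(volume) AS volume
FROM trade
WHERE "exchange" = 'polo' AND "base" = 'eth' AND "quote" = 'gnt'
  AND time > now() - 1d
GROUP BY time(5m)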

I have two concerns:

  1. How do I prevent adding duplicate trades? This data will be regularly scraped and thrown at the DB, so I think I will need a way to “upsert” trades based on some kind of unique ID (maybe an MD5 hash). I’ve found this issue on GitHub, but being new to InfluxDB, the solution there isn’t clear to me.
  2. With how I’m currently storing this data, I think trades with the same tags and timestamp will be overwritten. So if there were two trades on the same exchange with the same base/quote that happened at the exact same time, only one of them would be stored. Is that right? And if so, how could I get around that?

Thanks for reading.

@lcurry This is something I’ve been working on lately as well! :slight_smile:

  1. Points written to the same series (measurement plus tag set) with the same timestamp follow last-write-wins semantics: the newest point silently replaces the old one (see the example below). This is an easy way to ensure there are no duplicates.
  2. The only way to get around that is to add more metadata, such as a trade ID, to the data.
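
For example, if you write these two points (same measurement, same tag set, same timestamp; the second line is a hypothetical duplicate), only the second one survives:

trade,exchange=polo,base=eth,quote=gnt,side=buy price=0.00197600,volume=1.46874329 1496698706000
trade,exchange=polo,base=eth,quote=gnt,side=buy price=0.00197700,volume=2.00000000 1496698706000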

Does that help?

Thanks @jackzampolin, that does help. It sounds like I’ll get #1 for free then? That’s pretty nice! For #2, should that be stored as a tag or a field?

@lcurry As a tag. This will create additional cardinality, which will affect performance in the long run. However, we have a feature coming in the next release that should help deal with the load.
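
For example, here’s a sketch of your earlier line with a hypothetical trade ID added as a tag:

trade,exchange=liqu,base=eth,quote=gnt,side=sell,tid=5565459 price=0.00209707,volume=91.00652896 1496701451000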

Another potential way to deal with this is to use nanosecond precision when writing points. This should almost eliminate the possibility of timestamp collisions. I’ve been pulling my Coinbase trades into an Influx instance and graphing them for my own use to test out schemas. I like the way you have it set up!

Jack,

I am working on a similar case, in which approximately 800K to 900K new lines of financial time series data are stored to InfluxDB daily. Each point has 7 tags, 10 fields, and the same timestamp, representing the date this end-of-day data pertains to. That is, the timestamp changes each day and represents the closing date of US markets, e.g. 2017-06-05 16:00:00.

The retention policy is set to 30 days, so eventually we will have close to 27 million points (roughly 900K × 30) in the database.

Based on these high-level requirements, do you foresee a potential compatibility or performance issue?

Concerning cardinality’s long-term effect on performance, would it make more sense to let InfluxDB set the timestamp and keep the closing date in a separate tag?

Using InfluxDB v1.2.

Thanks,
Lionel

I’d rather not create additional cardinality, so using nanoseconds would be better. Unfortunately, most of these exchanges use second precision, so there would still be decent potential to overwrite trades … however, I think I could get around that by using part of the trade ID, price, and/or volume as part of the timestamp.

Just thinking through this trade from Liqui.io:

{
    "type": "ask",
    "price": 0.00196411,
    "amount": 9.10520885,
    "tid": 5565459,
    "timestamp": 1496757093
}

The timestamp is 1496757093 seconds, which would be 1496757093000000000 nanoseconds. Then I could add the trade ID 5565459 as nanoseconds to that, giving 1496757093005565459. I think that should remove any possibility of overwriting trades or adding duplicates. Do you see any issues with that approach?
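
So the stored point would look something like this (a sketch, reusing my earlier tag layout and assuming "ask" maps to side=sell):

trade,exchange=liqu,base=eth,quote=gnt,side=sell price=0.00196411,volume=9.10520885 1496757093005565459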

Something else I’ve been thinking about is order books. So the data at this URL for example:
https://poloniex.com/public?command=returnOrderBook&currencyPair=BTC_NXT&depth=10

I would love to store these in InfluxDB too, but since orders are always being added/removed I think that probably wouldn’t work (going against point #2 in the docs here: Design Insights and Tradeoffs in InfluxDB | InfluxData Documentation Archive). Would Redis be a better choice for that data?

Thanks again.

@lcurry You could add a tag for the status of the order. Then you could construct queries that look at only WHERE "status" = 'executed' orders. In that case you might also want to store the trade ID as a tag, so that you could look at an individual trade and see how long it took to change status (see the sketch below). Lots of cool stuff you could do there.
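
For example (a sketch, assuming "status" and "tid" are both stored as tags; the values are hypothetical):

-- recent executed trades only
SELECT price, volume FROM trade WHERE "status" = 'executed' AND time > now() - 1h
-- follow a single trade through its status changes
SELECT * FROM trade WHERE "tid" = '5565459' ORDER BY time ASC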

The only issue I can see with the above is if you run into an exchange that doesn’t have a numeric trade ID then you would need to find a slightly different way to add that randomness.

Hey guys, sorry to bump such an old topic but this was high in the search results for using Influx with market data and I wanted to point out a “gotcha” that crops up when looking at exchange trade data, in case others are using Influx for this purpose as well.

I’ve seen several exchanges (Coinbase, Bitfinex, Bitso, and some others that I don’t remember) that report trade timestamps out of order, and they’ve each advised me to use sequential time (trade ids) instead of reported timestamps to order the trades.

So be careful with the timestamps you’re attaching to these trades — if you’re listening to a websocket you might want to consider just using the time that you observed the trade as its timestamp.
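
One simple way to do that is to omit the timestamp from the line protocol entirely; InfluxDB then assigns its server’s local receive time (in nanoseconds) to the point. A sketch, reusing the schema from earlier in the thread:

trade,exchange=polo,base=eth,quote=gnt,side=buy price=0.00197600,volume=1.46874329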

Hello! For trade data or stock market data, all trades happen during market hours only (for India, 9:15 to 15:30) every day. How do I handle this in InfluxDB?

I ask because I recently ran into a problem using aggregateWindow with {createEmpty: true} to aggregate data into 5m, 15m, etc. windows. The aggregate function aggregates the values properly, but it creates empty records for the whole day. I want empty records created only within market hours (when no trade happens).

Is there any way to limit such functions so they apply only to a specific time range?
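
A possible workaround is to run the aggregation with createEmpty: true and then filter the aggregated windows down to market hours, roughly like this Flux sketch (the bucket name, range, field, and aggregate function are placeholders; the hour/minute checks assume timestamps are already in IST, and window-boundary handling may need tweaking):

import "date"

from(bucket: "trades")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "trade" and r._field == "price")
    |> aggregateWindow(every: 5m, fn: mean, createEmpty: true)
    // keep only aggregated windows whose timestamp falls within 9:15-15:30
    |> filter(fn: (r) => {
        h = date.hour(t: r._time)
        m = date.minute(t: r._time)
        return (h > 9 or (h == 9 and m >= 15)) and (h < 15 or (h == 15 and m <= 30))
    })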