Is InfluxDB suitable for storing trade data?

#1

Hi there. I’m new to InfluxDB, and I’m trying to determine if it would be a good fit for my use-case, storing trade data from cryptocurrency exchanges.

Here are two examples of the kind of data I’d like to store:
https://poloniex.com/public?command=returnTradeHistory&currencyPair=ETH_GNT
https://api.liqui.io/api/3/trades/gnt_eth

And here’s an example of how I’m currently storing this in InfluxDB:

trade,exchange=polo,base=eth,quote=gnt,side=buy price=0.00197600,volume=1.46874329 1496698706000
trade,exchange=liqu,base=eth,quote=gnt,side=sell price=0.00209707,volume=91.00652896 1496701451000

Once I have the trade data I want to reduce it to candlestick data (open, high, low, close).

I have two concerns:

  1. How do I prevent adding duplicate trades? This data will be regularly scraped and thrown at the DB, so I think I will need a way to “upsert” trades based on some kind of unique ID (maybe an MD5 hash). I’ve found this issue on Github, but being new to InfluxDB the solution there isn’t clear to me.
  2. With how I’m currently storing this data, I think trades with the same tags and timestamp will be overwritten. So if there were two trades on the same exchange with the same base/quote that happened at the exact same time, only one of them would be stored. Is that right? And if so, how could I get around that?

Thanks for reading.

#2

@lcurry This is something I’ve been working on lately as well! :slight_smile:

  1. Points written with the same timestamp will be written as last write wins. This is an easy way to ensure there are no duplicates.
  2. The only way to get around that is by adding more metadata like a trade ID to the data.

Does that help?

#3

Thanks @jackzampolin, that does help. It sounds like I’ll get #1 for free then? That’s pretty nice! For #2, should that be stored as a tag or a field?

#4

@lcurry As a tag. This will create additional cardinality which will affect performance in the long run. However we have a feature that is coming with the next release that should help deal with the load.

Another potential way to deal with this information is to use nanosecond precision when writing points. This should almost eliminate the possibility for timestamp collisions. I’ve been pulling in my Coinbase trades into an influx instance and graphing them for my own use to test out schemas. I like the way you have it setup!

#5

Jack,

I am working on a similar case, for which there are approximately 800K to 900K new lines of financial time series daily stored to influxdb. Each point has 7 tags, 10 fields, and the same timestamp representing the date that this end of day data pertains to. I.e. the timestamp changes each day and represents the closing date of US markets, e.g. 2017-06-05 16:00:00.

The retention policy is set to 30 days, so eventually we will count close to 27 million points in the the database.

Based off these high level requirements, do you foresee a potential compatibility or performance issue?

Concerning the cardinality long term effect on performance, would it make more sense to let influxdb sets the timestamp and keep the closing date in a separate tag?

Using influxdb v1.2

Thanks,
Lionel

#6

I’d rather not create additional cardinality, so using nanoseconds would be better. Unfortunately most of these exchanges use second precision so there would still be decent potential to overwrite trades … however I think I could get around that by using part of the trade ID, price, and/or volume as part of the timestamp.

Just thinking through this trade from Liqui.io:

{
    "type": "ask",
    "price": 0.00196411,
    "amount": 9.10520885,
    "tid": 5565459,
    "timestamp": 1496757093
}

The timestamp is 1496757093 seconds, which would be 1496757093000000000 nanoseconds. Then I could add the trade ID 5565459 as nanoseconds to that to be 1496757093005565459. I think that should remove any possibility for overwriting trades or adding duplicates. Do you see any issues with that approach?

Something else I’ve been thinking about is order books. So the data at this URL for example:
https://poloniex.com/public?command=returnOrderBook&currencyPair=BTC_NXT&depth=10

I would love to store these in InfluxDB too, but since orders are always being added/removed I think that probably wouldn’t work (going against point #2 in the docs here: https://docs.influxdata.com/influxdb/v1.2/concepts/insights_tradeoffs/). Would Redis be a better choice for that data?

Thanks again.

#7

@lcurry you could add a tag for the stage of the order. Then you could construct queries to just look at WHERE "status" = 'executed' orders. In that case you might want to store the trade ID as a tag so that you could look at an individual trade and see how long it took to change status. Lots of cool stuff you could do there.

The only issue I can see with the above is if you run into an exchange that doesn’t have a numeric trade ID then you would need to find a slightly different way to add that randomness.