Querying large datasets

We are trying to query a large dataset of financial market data. The dataset contains roughly 700 million rows.

We have one measurement:

  • tick

Each row has two tags:

  • feedType (all rows have the same value ‘SIM’)
  • securityId (there are about 1500 different security ids)

Each row has the following fields (all of which floating point numbers):

  • bid
  • ask
  • volBid
  • volAsk
  • last
  • vol

With InfluxDB 1.6 and InfluxQL we have been able to query the entire dataset in about 2 hours, by issuing the following InfluxQB query

SELECT bid,ask FROM tick where feedType='SIM'

Result looks like this:

time                bid  ask
----                ---  ---
1483434000000000000 0    4.39
1483434000000000000 0    14.74
1483434000000000000 0    0
1483434000000000000 0    0
1483434000000000000 0    0
1483434000000000000 0    0
1483434000000000000 0    0
1483434000000000000 0    38
1483434000000000000 47.9 0
1483434000000000000 0    5.67

Since upgrading to InfluxDB 2.3 things do not work anymore. When issuing the same InfluxQL query the system just get’s stuck an no data is returned for hours.

We also tried Flux queries with this query here

from(bucket: "short")
  |> range(start:0)
  |> filter(fn: (r) => r._field == "bid" or r._field == "ask") 
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")

But again, no data is returned for hours.

One more problem with Flux queries is that all results are grouped by securityId, so first all timestamps for securityId=1 then all securityId=2. We would however require for results to be returned sorted by timestamp.

Are we doing something completely wrong? What is the right way to querying large amounts of data sorted by timestamp?

Hello @Andy_Flury,
To get rid of the default grouping (data is grouped by how series are defined/stored on disk where each table represents one series).
You’d add the following:

from(bucket: "short")
  |> range(start:0)
  |> filter(fn: (r) => r._field == "bid" or r._field == "ask") 
  |> group()

Why are you pivoting? I wouldn’t pivot data unless you want to perform math across the fields at the same timestamp.

What HW are you running Influx on? Why do you need to query the entire dataset?

Also learning about what community members are doing with Influx makes my day, do you mind sharing a little more info with me?

Thank you!

Thank yo so much @Anaisdg!

We are running Influx on a Windows Server with 8 Cores and 32 GB of Memory.

We are using the recorded historical tick data to perform financial back tests of automated trading strategies. So we basically feed the entire historical data set through the trading strategies to find out how they would have behaved in the past.

Note: For this particular test we are requiring data of all instruments (i.e. all securityId’s) but for other tests we might only use a subset (i.e. some securityId’s).

Regarding your suggestion:
When I try this the output looks as follows.

                   _start:time                      _stop:time                      _time:time                  _value:float           _field:string     _measurement:string         feedType:string       securityId:string
------------------------------  ------------------------------  ------------------------------  ----------------------------  ----------------------  ----------------------  ----------------------  ----------------------
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T12:00:00.000000000Z                             0                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T13:06:00.000000000Z                             0                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T13:07:00.000000000Z                             0                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T13:46:00.000000000Z                             0                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:18:00.000000000Z                             0                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:22:00.000000000Z                             0                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:28:00.000000000Z                             0                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:29:00.000000000Z                             0                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:30:00.000000000Z                         32.91                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:31:00.000000000Z                         32.17                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:32:00.000000000Z                         32.14                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:33:00.000000000Z                         32.11                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:34:00.000000000Z                         32.11                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:35:00.000000000Z                         32.11                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:36:00.000000000Z                         32.35                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:37:00.000000000Z                         32.24                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:38:00.000000000Z                         32.17                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:39:00.000000000Z                         32.14                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:40:00.000000000Z                         32.12                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:41:00.000000000Z                         32.07                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:42:00.000000000Z                         32.14                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:43:00.000000000Z                         32.17                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:44:00.000000000Z                         32.17                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:45:00.000000000Z                         32.17                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:46:00.000000000Z                         32.11                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:47:00.000000000Z                         32.04                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:48:00.000000000Z                         32.08                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:49:00.000000000Z                         32.08                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:50:00.000000000Z                         32.08                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:51:00.000000000Z                         32.07                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:52:00.000000000Z                         32.03                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:53:00.000000000Z                         32.03                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:54:00.000000000Z                         32.03                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:55:00.000000000Z                         32.05                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:56:00.000000000Z                         32.04                     ask                    tick                     SIM               100306421
1970-01-01T00:00:00.000000000Z  2022-08-19T18:09:16.604419800Z  2017-02-01T14:57:00.000000000Z                         32.03                     ask                    tick                     SIM               100306421

So basically it prints all ask field values for the first securityId, then it prints all ask values for the second securityId.

For our case we need all field values (for all tags) for a particular timestamp, feed it through the trading strategy, then move on to the next timestamp and do the same thing over again.

And regarding your other question, why are we pivoting. We basically need to get all fields for a particular timestamp (and tag) to populate an object within our Java code.

Thanks for your help!
Andy

There are multiple field values associated with the same timestamp and the intention is to receive these values and the timestamp as one row of a table.

Without pivot each field value is delivered as a row with its own copy of the same timestamp, as well as its field name. Which seems most wasteful and awkward to ingest.

Why whould one ever not want to pivot for this use case?

Versions above 2.0.7 seem to unusable because of queries getting stuck forever.

There is a bug report for that 🔥 influxd "context canceled" error, queries never complete 🔥 · Issue #23907 · influxdata/influxdb · GitHub