Why InfluxDB v2.0 return query one field by one with all information? Why use such inefficient way？

justin_zhang · September 13, 2021, 5:14pm

InfluxDB v2.0 returns query results just one field by one field with all query inforamtion and measurement, tags, why not packed them together like writing? so it cost 300 seconds for just 10,000 records with about 25 fields, the results is 250,000 records! so can do it a little smarter?..

I just want a csv like table or mysql like table returns.

query code:
client = InfluxDBClient(url=“http://localhost:18086”, token=token, org=org)
query_api = client.query_api()

#result = query_api.query(org=org, query=query)
#result = query_api.query_stream(org=org, query=query)
result = query_api.query_csv(query)

return:
the 16827 record: FluxRecord() table: 78, {‘result’: ‘_result’, ‘table’: 78, ‘_start’: datetime.datetime(2021, 9, 11, 17, 1, 38, 152518, tzinfo=tzutc()), ‘_stop’: datetime.datetime(2021, 9, 13, 17, 1, 38, 152518, tzinfo=tzutc()), ‘_time’: datetime.datetime(2021, 9, 13, 5, 42, 44, 780000, tzinfo=tzutc()), ‘_value’: 7000, ‘_field’: ‘ask1_vol’, ‘_measurement’: ‘510050’, ‘mkt’: ‘SH’, ‘status’: ‘T’, ‘type’: ‘S’}
the 16828 record: FluxRecord() table: 78, {‘result’: ‘_result’, ‘table’: 78, ‘_start’: datetime.datetime(2021, 9, 11, 17, 1, 38, 152518, tzinfo=tzutc()), ‘_stop’: datetime.datetime(2021, 9, 13, 17, 1, 38, 152518, tzinfo=tzutc()), ‘_time’: datetime.datetime(2021, 9, 13, 5, 42, 47, 780000, tzinfo=tzutc()), ‘_value’: 83800, ‘_field’: ‘ask1_vol’, ‘_measurement’: ‘510050’, ‘mkt’: ‘SH’, ‘status’: ‘T’, ‘type’: ‘S’}
the 16829 record: FluxRecord() table: 78, {‘result’: ‘_result’, ‘table’: 78, ‘_start’: datetime.datetime(2021, 9, 11, 17, 1, 38, 152518, tzinfo=tzutc()), ‘_stop’: datetime.datetime(2021, 9, 13, 17, 1, 38, 152518, tzinfo=tzutc()), ‘_time’: datetime.datetime(2021, 9, 13, 5, 42, 50, 790000, tzinfo=tzutc()), ‘_value’: 225500, ‘_field’: ‘ask1_vol’, ‘_measurement’: ‘510050’, ‘mkt’: ‘SH’, ‘status’: ‘T’, ‘type’: ‘S’}

InfluxDB main: 2021-09-14 01:21:25.959709
cost 5.21 seconds setup streams and 216.81 seconds to download 2510664 results

Anaisdg · September 13, 2021, 7:26pm

Hello @justin_zhang,
The python client returns flux tables by default. These tables contain the records. You would have to follow the example in the docs and iterate through the table objects to get the values like so:

csv_result = query_api.query_csv('from(bucket:"my-bucket") |> range(start: -10m)')
val_count = 0
for row in csv_result:
    for cell in row:
        val_count += 1

Does that help?

justin_zhang · September 14, 2021, 1:38am

Dear Anaisdg,
I tried such query already, it’s also same to other query options, the key issue of influxdb 2.0 opensource is that the query result transfered one field by one field, each field with full information of measurements, so it caused very low efficient ! e.g. my table has over 25 fields in each point, so such query cost over 20 times transmission data amount comparied with v1.7 or earlier version.

The result:
InfluxDB main: 2021-09-14 02:45:44.083289
cost 5.20 seconds setup streams and 232.56 seconds to download 2506632 results
I have tried almost all the ways to query:

*#client = InfluxDBClient(url="http://101.6.96.160:18086", token=token, org=org)*

client = InfluxDBClient(url=“http://localhost:18086”, token=token, org=org)*
query_api = client.query_api()*
#result = query_api.query(org=org, query=query)*
#result = query_api.query_stream(org=org, query=query)*
result = query_api.query_csv(query,*

                            dialect=Dialect(header=False, delimiter=","*

                                           , comment_prefix="#", annotations=[],date_time_format="RFC3339"))*

#print(len(results)*
t1 = time.time()*
i = 0*
‘’'*
for record in result:*

   print('the %d record: %s' % (i, str(record) ) )*

```
   i += 1*
```
‘’'*
for row in result:*
```
   for cell in row:*
```
```
       i += 1*
```
‘’'*
for table in result:*
```
   print(table)*
```
```
   for record in table.records:*
```
```
       print(record.values)*
```
```
       i += 1*
```
‘’'*
t2 = time.time()*
print("cost %.2f seconds setup streams and %.2f seconds to download %d results " % (t1 - t0, t2-t1, i) )*

justin_zhang · September 14, 2021, 1:51am

Can I backup bucket from influxdb v 2.0 open source and recover them in v1.7 ?

justin_zhang · September 14, 2021, 7:26am

Finally, my developer found using query_raw can be much more quick

Anaisdg · September 14, 2021, 2:29pm

Hello @justin_zhang,
I’m glad that worked for you!
You might also consider using the fieldsAsCol() function in your query or the pivot() function in your query. I found this very useful when using the pandas query method.

Topic		Replies	Views
Querying data that has more than 1 field InfluxDB 2 query , flux	4	263	February 19, 2024
Querying from InfluxDB 2.1 InfluxDB 2 time , query , flux , c-sharp	0	588	January 6, 2022
Multiple fields returned in single table from pandas InfluxDB 2 influxdb , flux , pandas	0	727	August 12, 2021
Any Enhancement for InfluxDB Query Performance now? InfluxDB 2 query , performance	1	318	July 24, 2023
Query times for "load all the data" queries Store influxdb , schema , influxql	10	4809	May 4, 2017

Why InfluxDB v2.0 return query one field by one with all information? Why use such inefficient way？

Related topics