Why does InfluxDB v2.0 return query results one field at a time with all the metadata? Why such an inefficient approach?

InfluxDB v2.0 returns query results one field at a time, with the full query information, measurement, and tags attached to every record. Why not pack them together, the way they are packed when writing? It takes 300 seconds for just 10,000 points with about 25 fields each, because the result is 250,000 records! Can this be done a little smarter?

I just want the results returned as a CSV-like or MySQL-like table.

query code:
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:18086", token=token, org=org)
query_api = client.query_api()

#result = query_api.query(org=org, query=query)
#result = query_api.query_stream(org=org, query=query)
result = query_api.query_csv(query)

return:
the 16827 record: FluxRecord() table: 78, {'result': '_result', 'table': 78, '_start': datetime.datetime(2021, 9, 11, 17, 1, 38, 152518, tzinfo=tzutc()), '_stop': datetime.datetime(2021, 9, 13, 17, 1, 38, 152518, tzinfo=tzutc()), '_time': datetime.datetime(2021, 9, 13, 5, 42, 44, 780000, tzinfo=tzutc()), '_value': 7000, '_field': 'ask1_vol', '_measurement': '510050', 'mkt': 'SH', 'status': 'T', 'type': 'S'}
the 16828 record: FluxRecord() table: 78, {'result': '_result', 'table': 78, '_start': datetime.datetime(2021, 9, 11, 17, 1, 38, 152518, tzinfo=tzutc()), '_stop': datetime.datetime(2021, 9, 13, 17, 1, 38, 152518, tzinfo=tzutc()), '_time': datetime.datetime(2021, 9, 13, 5, 42, 47, 780000, tzinfo=tzutc()), '_value': 83800, '_field': 'ask1_vol', '_measurement': '510050', 'mkt': 'SH', 'status': 'T', 'type': 'S'}
the 16829 record: FluxRecord() table: 78, {'result': '_result', 'table': 78, '_start': datetime.datetime(2021, 9, 11, 17, 1, 38, 152518, tzinfo=tzutc()), '_stop': datetime.datetime(2021, 9, 13, 17, 1, 38, 152518, tzinfo=tzutc()), '_time': datetime.datetime(2021, 9, 13, 5, 42, 50, 790000, tzinfo=tzutc()), '_value': 225500, '_field': 'ask1_vol', '_measurement': '510050', 'mkt': 'SH', 'status': 'T', 'type': 'S'}

InfluxDB main: 2021-09-14 01:21:25.959709
cost 5.21 seconds setup streams and 216.81 seconds to download 2510664 results

Hello @justin_zhang,
The Python client returns Flux tables by default. These tables contain the records. You would have to follow the example in the docs and iterate through the table objects to get the values, like so:

csv_result = query_api.query_csv('from(bucket:"my-bucket") |> range(start: -10m)')
val_count = 0
for row in csv_result:
    for cell in row:
        val_count += 1
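
If you use query() instead of query_csv(), you get a list of FluxTable objects back, and each record's field, value, and time can be read directly. A minimal sketch, assuming a bucket named "my-bucket" and the same query_api as above:

# query() returns a list of FluxTable objects; each table holds FluxRecord objects.
tables = query_api.query('from(bucket:"my-bucket") |> range(start: -10m)')
for table in tables:
    for record in table.records:
        # Each FluxRecord exposes its field name, value, and timestamp.
        print(record.get_field(), record.get_value(), record.get_time())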

Does that help?

Dear Anaisdg,
I have already tried that query, and it behaves the same as the other query options. The key issue with InfluxDB 2.0 open source is that the query result is transferred one field at a time, with each field carrying the full measurement information, which is very inefficient! For example, my table has over 25 fields per point, so such a query transfers over 20 times as much data as v1.7 or earlier versions.

The result:
InfluxDB main: 2021-09-14 02:45:44.083289
cost 5.20 seconds setup streams and 232.56 seconds to download 2506632 results
I have tried almost all the ways to query:

from influxdb_client import InfluxDBClient, Dialect
import time

t0 = time.time()  # start of setup timing (defined before this snippet in the original)
#client = InfluxDBClient(url="http://101.6.96.160:18086", token=token, org=org)
client = InfluxDBClient(url="http://localhost:18086", token=token, org=org)
query_api = client.query_api()

#result = query_api.query(org=org, query=query)
#result = query_api.query_stream(org=org, query=query)
result = query_api.query_csv(query,
                             dialect=Dialect(header=False, delimiter=",",
                                             comment_prefix="#", annotations=[], date_time_format="RFC3339"))
#print(len(result))
t1 = time.time()
i = 0
'''
for record in result:
    print('the %d record: %s' % (i, str(record)))
    i += 1
'''
for row in result:
    for cell in row:
        i += 1
'''
for table in result:
    print(table)
    for record in table.records:
        print(record.values)
        i += 1
'''
t2 = time.time()
print("cost %.2f seconds setup streams and %.2f seconds to download %d results " % (t1 - t0, t2 - t1, i))

Can I back up a bucket from InfluxDB v2.0 open source and restore it in v1.7?

Finally, my developer found that using query_raw() is much quicker.
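
For reference, a minimal sketch of what that looks like, assuming the same client, query_api, and query variables as above (depending on the client version, query_raw() returns either the annotated-CSV body as a string or the raw HTTP response):

import csv
import io

# Fetch the raw annotated-CSV result without building a FluxRecord per row.
raw = query_api.query_raw(query)

# Normalize to a string before parsing; older client versions return the
# raw HTTP response object instead of a str.
body = raw if isinstance(raw, str) else raw.data.decode("utf-8")

# Parse the CSV yourself, e.g. with the standard csv module.
for row in csv.reader(io.StringIO(body)):
    pass  # handle each row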


Hello @justin_zhang,
I’m glad that worked for you!
You might also consider using the schema.fieldsAsCols() function or the pivot() function in your query. I found this very useful when using the pandas query method.
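
For example, pivoting in Flux and pulling the result straight into pandas might look like this; a minimal sketch, assuming a bucket named "my-bucket" and the same query_api as above:

# pivot() turns the one-row-per-field output into one row per timestamp with
# the fields as columns, so far fewer records travel over the wire.
flux = '''
from(bucket: "my-bucket")
  |> range(start: -10m)
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
'''
df = query_api.query_data_frame(flux)  # a pandas DataFrame (or a list of them for multiple tables)
print(df.head())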