Query performance issue on 3000 records took 40 sec

Hi Team,

I have a flux query which takes too much time to execute.
The query is:

from(bucket: "DATA_DB") 
|>range( start:0, stop: 2023-07-19T00:00:00Z ) 
|>filter(fn: (r) => r["_measurement"] == "1057_Cust") 
|>last(column: "_value") 
|>group(columns: ["_field","setId"]) 
|>drop(columns: ["_measurement","_start","_stop","fkey"]) 

Execution Info :

  • Total Data Count - 3000
  • Fields - 6 and each fields has 1 tag
  • Common Tag - 1 (“setId”)
  • I takes 40 sec. to execute

Note : I tried one filter condition which is -

|>filter(fn: (r) => r["_field"] == "mail_type")

Final Query :

from(bucket: "DATA_DB") 
|>range( start:0, stop: 2023-07-19T00:00:00Z ) 
|>filter(fn: (r) => r["_measurement"] == "1057_Cust") 
|>filter(fn: (r) => r["_field"] == "mail_type")
|>last(column: "_value") 
|>group(columns: ["_field","setId"]) 
|>drop(columns: ["_measurement","_start","_stop","fkey"]) 

It took 2.3 Sec. to execute

I am using - Influxdb2.2 OSS

Please suggest where is the problem.

Thank you.

Hi @debnath -

It seems that without

the last |> group |> drop statements are taking much longer (let’s say at least 10X longer - see analysis note below). So the add’l mail_type filter is reducing the data set by about 10X.

Analysis Note:
The from |> range |> filter ("1057_Cust") take processing time, of course, but these are not driving the poor performance (40 sec) of the first query. If these statements took zero time to execute, then a straight ratio (40/2.3)=17.4 would suggest about 17X more data without filtering on “mail_type”

If the question is “why is either query taking seconds vs say, ms” then knowing the hardware config and processor loading while executing these scripts would be helpful. I suppose there might be an InFluxDB setting that needs tuning, but OSS usually works fine right “out-of-the-box”.

So to summarize, if the two queries were run under the same processor loading conditions then the 40 sec case can be explained by having to process more data by last |> group |> drop. But if execution time of seconds for either query is the core question, then more knowledge of the hardware, the processor loading, and the Data_DB data set schema would help to diagnose that issue.

Hope this addresses your question.

Hi @phill ,

I tried -

from(bucket: "DATA_DB") 
|>range( start:0, stop: 2023-07-19T00:00:00Z ) 
|>filter(fn: (r) => r["_measurement"] == "1057_Cust")

But still it took 40 sec.. Its only 3000 records

If you need any log or info we can provide.

Thank you.

@debnath To be clear, you are saying that executing the 3 line script above takes 40 secs, but adding the additional lines

|>filter(fn: (r) => r["_field"] == "mail_type")
|>last(column: "_value") 
|>group(columns: ["_field","setId"]) 
|>drop(columns: ["_measurement","_start","_stop","fkey"]) 

to that script reduces the execution time to 2.3 sec?

Hi @phill ,

Adding only the filter condition making it to 2.3 sec.

|>filter(fn: (r) => r["_field"] == "mail_type")

@debnath Would you run each case and include the logs for each. Are the results repeatable - same execution time no matter how many times it’s run? Thanks.

Hi @debnath , where you run your queries from? maybe the data transfer from Influx server to your machine is slow…

Hi @phill ,

What kind of log you are talking about. Is there anything to enable for performance log or debug log or something?
where can we get it?
The execution result time is almost same.

Hi @Domenico_Briganti ,

The query is running in my server machine influx UI query dashboard.
Version - Influx2.2 OSS
Here -

@debnath

The log that shows 40 sec and 2.3 sec.

Hi @phill ,

You mean data as log?

Because i am querying from influxdb ui querybuilder.