Can this query with paging be optimised?

Query

Hi, I need a query that can be paged. For that purpose I added a sort on time and a total count.
This way I can get exactly a specific number of entries for page X out of a total of Y.

data = from(bucket: "bucket")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "my_events")
|> drop(columns: ["tag1", "tag2", "tag3"])
|> sort(columns: ["_time"],desc: true)

total =
data
|> count()
|> toString()
|> findColumn(fn: (key) => key._field == "myField",column: "_value")

data
|> set(key: "total", value: total[0])
|> limit(n:50,offset:0)
|> yield()

This query does exactly what i want.

Problem

I am concerned about the performance of this query. The time it takes grows considerably as more entries exist in the given range.

At 15,000 total entries => 0.4 seconds
At 70,000 total entries => 0.85 seconds

At over 100,000 entries this query takes so long that the user experience in my application is negatively impacted.

You often hear about millions of time series entries, so there must be something wrong with my query if it slows down this much this quickly.

Questions

  1. Can this query be optimised?
  2. Is there another way to do paging?

Hello @Sakerdotes,
I’m a little confused: getting the total doesn’t affect how many values you return with the limit() function. It doesn’t seem like you need to perform the count or include it in the output of your final yield. Can you help me understand why?

This might optimize it slightly:

data = from(bucket: "bucket")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "my_events")
|> drop(columns: ["tag1", "tag2", "tag3"])
|> sort(columns: ["_time"],desc: true)

total =
data
|> count()
|> findRecord(fn: (key) => true, idx: 0)


data
|> set(key: "total", value: string(v:total._value)
|> limit(n:50,offset:0)
|> yield()

Since I don’t understand the effect the count has on the limit, I’d yield each result separately. Also, I don’t think you need the sort here; for this query I believe time order is guaranteed. And dropping those tags won’t group your data together, so you’re most likely getting the count from just one tag, which is also confusing to me. If you just want the count as well, I’d yield both results like:

data = from(bucket: "bucket")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "my_events")

total =
data
|> count()
|> yield(name: "count for a tag maybe the count is consistent across tags") 


data
|> limit(n:50,offset:0)
|> yield(name: "limit results")

Hello @Anaisdg ,
Thanks for the response. It seems I need to explain my query a bit more.

General Idea

I want to display the queried data in my own application frontend as a list of at most 100 or maybe 200 entries, drawn from a possibly huge pool of entries. This list should have a pagination toolbar that shows the current page and the maximum number of available pages. The total entry count is needed to calculate this page information. The only data I need to display is the data chunk in the “_field” from my data.
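To make the paging concrete, each page roughly maps to a limit() call like the sketch below. Here page and pageSize are placeholder values that my frontend fills in (they are not part of my real query), and the frontend computes the number of pages as ceil(total / pageSize):

// Sketch only: page and pageSize are placeholders supplied by my frontend.
page = 3       // 1-based page index selected by the user
pageSize = 50  // entries shown per page

from(bucket: "bucket")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "my_events")
|> drop(columns: ["tag1", "tag2", "tag3"])
|> sort(columns: ["_time"], desc: true)
|> limit(n: pageSize, offset: (page - 1) * pageSize)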

Specific Points

  1. I drop the tags because they affect the number of entries limit() returns. Otherwise the query gives me limit * tag_count entries back and splits the total count across each tag variation.
  2. I need to further process the data from the “_field”. Fetching all entries at once is not an option for me.
  3. The returned entries are not correctly sorted by time unless I explicitly use sort(). I also need the sort direction “desc”.

Success

I just need the total count once, so I gave the count() its own yield like you said, and that dropped the query execution time by ~50%.
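For reference, the query now looks roughly like this (the yield names are just my own labels, and the sort is still the part I would like to get rid of):

data = from(bucket: "bucket")
|> range(start: -7d)
|> filter(fn: (r) => r["_measurement"] == "my_events")
|> drop(columns: ["tag1", "tag2", "tag3"])

// Total count gets its own yield instead of being merged into every row.
data
|> count()
|> yield(name: "total")

// The page itself: sorted newest first, then limited.
data
|> sort(columns: ["_time"], desc: true)
|> limit(n: 50, offset: 0)
|> yield(name: "page")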

Question

So now my question is whether there is a better way of sorting.

Removing the sort would make the query another ~50% faster, so there is potential for better performance.

Hello @Sakerdotes,
You might want to try using group() instead of drop().
But unfortunately I think you’ll then have to use sort() to guarantee order.
However, you could try adding a filter for _time < the end of your selected range to narrow the data down before the limit of 50, or add another range() call, or narrow your range() further.
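As a rough sketch of that idea, you could page by time instead of by offset: remember the _time of the last entry on the previous page and use it as the stop of the next range, so each query only scans a small window. The lastSeen value below is just a placeholder:

// Sketch: lastSeen stands in for the _time of the last entry on the previous page.
lastSeen = 2022-06-01T00:00:00Z

from(bucket: "bucket")
|> range(start: -7d, stop: lastSeen)
|> filter(fn: (r) => r["_measurement"] == "my_events")
|> group()
|> sort(columns: ["_time"], desc: true)
|> limit(n: 50)

For the first page you would leave stop out (or use now()), and group() replaces the drop so that limit() applies across all series in one table.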

Hello @Anaisdg,

Thanks again for the response. The group() does not seem to improve the query speed.
But the query speed now seems good enough for my use case, so this post is solved.


@Sakerdotes Awesome! Happy to hear it.