limit() / top() ignored in Flux queries

Hello,

I’m using InfluxDB [InfluxDB v2.7.4 (git: 19e5c0e1b7)] as storage for my data crawlers. Yesterday I noticed a large performance impact on some jobs, and dug deeper.

I reduced the query to the following proof of concept:

from(bucket:"statistics")
 |> range(start: -1y)
 |> filter(fn: (r) => r["_measurement"] == "app_PlaytimeForever")
 |> top(n: 250, columns: ["_value"])
 |> limit(n: 250)

In other words: get me the top 250 values of app_PlaytimeForever, and only those 250. By my understanding, limit() is not even required, because:

top() sorts each input table by specified columns and keeps the top n records in each table
https://docs.influxdata.com/flux/v0/stdlib/universe/top/
limit() returns the first n rows after the specified offset from each input table.
https://docs.influxdata.com/flux/v0/stdlib/universe/limit/

Nevertheless, no matter whether I use limit(), top(), both, or even remove the filter, I get all ~400k results, in the Data Explorer as well as when executing the query manually.

Searching for this topic did not get me any further, so I’m looking for your suggestions as to which part of the Flux language I have not understood properly.

Kind regards,
Fabian

@Fabian_Schneider By default, from() |> range() |> filter() returns data grouped by _measurement, _field, and each tag. So each unique combination of measurement, field, and tag values is represented by its own group/table in your results. top() and limit() operate on each input table/group, so n: 250 caps each table at 250 rows rather than the result as a whole.

What you can do is ungroup all your tables into a single table before you apply top():

from(bucket:"statistics")
    |> range(start: -1y)
    |> filter(fn: (r) => r["_measurement"] == "app_PlaytimeForever")
    |> group()
    |> top(n: 250, columns: ["_value"])
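
To make the per-table behaviour concrete, here is a small simulation in plain Python, not Flux. It is only a sketch: the series keys, the values, and the helper top_per_table are made up for illustration, and it models just the "keep the n largest rows in each table" rule quoted from the docs above.

```python
from collections import defaultdict

# Hypothetical rows as (series key, value); the real data would have
# one series per unique combination of measurement, field, and tags.
rows = [
    ("app=1", 50), ("app=1", 40), ("app=1", 30),
    ("app=2", 90), ("app=2", 10), ("app=2", 5),
    ("app=3", 70), ("app=3", 60), ("app=3", 20),
]


def top_per_table(tables, n):
    """Keep the n largest values in each table (models top() per input table)."""
    return {key: sorted(values, reverse=True)[:n] for key, values in tables.items()}


# Default shape after from() |> range() |> filter(): one table per series.
grouped = defaultdict(list)
for key, value in rows:
    grouped[key].append(value)

# Shape after group() with no arguments: every row in a single table.
ungrouped = {"": [value for _, value in rows]}

per_series = top_per_table(grouped, 2)
overall = top_per_table(ungrouped, 2)

print(sum(len(values) for values in per_series.values()))  # 6 (2 rows from each of 3 tables)
print(overall[""])  # [90, 70]
```

With n larger than the number of rows in every series, the per-table variant returns every row, which would match the ~400k results seen in the original query if no single series holds more than 250 points.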

Thank you, adding group() indeed solved the issue.