Hello,
I’m using InfluxDB (v2.7.4, git: 19e5c0e1b7) as storage for my data crawlers. Yesterday I noticed a large performance impact in some jobs and dug deeper.
I reduced the query to the following proof of concept:
from(bucket:"statistics")
|> range(start: -1y)
|> filter(fn: (r) => r["_measurement"] == "app_PlaytimeForever")
|> top(n: 250, columns: ["_value"])
|> limit(n: 250)
This should get me the top 250 values of app_PlaytimeForever, and just the top 250. By my understanding, limit() is not even required, as:
top()
sorts each input table by specified columns and keeps the top n records in each table.
https://docs.influxdata.com/flux/v0/stdlib/universe/top/
limit()
returns the first n rows after the specified offset from each input table.
https://docs.influxdata.com/flux/v0/stdlib/universe/limit/
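Re-reading the "in each table" wording, my current guess is that every series ends up in its own table, so top() keeps up to 250 records per series rather than 250 overall. If that is right, then something like the following variant, which merges everything into a single table with group() before taking the top 250, might be what I actually need (untested sketch):

```flux
// Untested guess: each series forms its own input table, so top() keeps
// up to 250 records *per table*. Calling group() with no columns first
// should merge all series into one table, making top() apply globally.
from(bucket: "statistics")
  |> range(start: -1y)
  |> filter(fn: (r) => r["_measurement"] == "app_PlaytimeForever")
  |> group()
  |> top(n: 250, columns: ["_value"])
```

I have not verified this interpretation, which is part of why I am asking here.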
Nevertheless, no matter whether I use limit(), top(), both, or even remove the filter, I get all ~400k results, both in the data explorer and when executing the query manually.
As searching for this topic did not get me any further, I’m looking for your suggestions as to which part of the Flux language I have not understood properly.
Kind regards,
Fabian