[URGENT] Flux vs InfluxQL Groupby Speed

Hi Team,

I’m confused as to why, in the example below, Flux is significantly slower than InfluxQL.

Flux Query:

from(bucket: "quantity_management/autosupplier")
    |> range(start: -7d)
    |> filter(fn: (r) => r._measurement == "aggregation-quantities")
    |> filter(fn: (r) => r.product_id == "QWMTV")
    |> filter(fn: (r) => r["quantity_key.quantity_type"] == "amount")
    |> filter(fn: (r) => r["product_sub_id"] == "KLX")
    |> filter(fn: (r) => r._field == "quantity")
    |> group(
        columns: [
            "quantity_key.quantity_type",
            "quantity_key.shop_id",
            "quantity_key.denom_id",
            "quantity_key.product_sub_id",
            "quantity_key.product_account_id",
            "quantity_key.product_auxiliary_account_id",
        ],
        mode: "by",
    )
    |> aggregateWindow(every: 1m, fn: max, createEmpty: false)

InfluxQL:

SELECT max("quantity")
FROM "quantity_management".."autosupplier"
WHERE ("product_id" = 'QWMTV' AND "product_sub_id" = 'KLX' AND "quantity_key.quantity_type" = 'amount')
  AND time > now() - 7d
GROUP BY
  "quantity_key.quantity_type",
  "quantity_key.shop_id",
  "quantity_key.denom_id",
  "quantity_key.product_sub_id",
  "quantity_key.product_account_id",
  "quantity_key.product_auxiliary_account_id",
  time(1m)
fill(null)

@Anaisdg, @scott, @grant1 Any ideas, please? My suspicion is that Flux cannot handle such a layered group by. Thoughts?

For Flux, performance really depends on your schema.

Filter first by whichever predicate narrows the data the most.

I don’t know your database, but chances are that filtering by _field first will leave fewer values to process. Try it and let us know.
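As a rough sketch of that suggestion, here is the query from the original post with only the order of the filter() calls changed, so the _field predicate runs first. That _field is the most selective predicate is an assumption; it depends on the schema and the data, so it is worth trying other orders as well.

```flux
// Sketch only: same predicates as the original query, reordered.
// Assumption: filtering on _field first discards the most data.
from(bucket: "quantity_management/autosupplier")
    |> range(start: -7d)
    |> filter(fn: (r) => r._field == "quantity")
    |> filter(fn: (r) => r._measurement == "aggregation-quantities")
    |> filter(fn: (r) => r.product_id == "QWMTV")
    |> filter(fn: (r) => r["quantity_key.quantity_type"] == "amount")
    |> filter(fn: (r) => r["product_sub_id"] == "KLX")
    |> group(
        columns: [
            "quantity_key.quantity_type",
            "quantity_key.shop_id",
            "quantity_key.denom_id",
            "quantity_key.product_sub_id",
            "quantity_key.product_account_id",
            "quantity_key.product_auxiliary_account_id",
        ],
        mode: "by",
    )
    |> aggregateWindow(every: 1m, fn: max, createEmpty: false)
```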

Thank you, @fercasjr. Filtering on the field first definitely helped. Nevertheless, InfluxQL still returns results noticeably faster. Are there any other levers to pull?

@pauldix, any suggestions here, please?

Hey Team, @Jay_Clifford, @Anaisdg, @scott, @grant1 Any ideas here, please?

I removed the group() call from the Flux query and that made it faster, but I find it very strange that InfluxQL is still noticeably faster. Why would this be the case?

Note: I am testing primarily in Grafana. I’m running both the Flux and the InfluxQL queries in Grafana against InfluxDB 2, and the response time for the InfluxQL queries is lower.

@ajetsharwin I would say that, in general, aggregate/selector queries perform better with InfluxQL than they do with Flux, but that’s not always true. The only real levers to pull are utilizing as much push-down functionality as you can, which it does appear you are doing (once you remove group()).
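For illustration, a minimal sketch of the query from the original post with group() removed, so that the pipeline is only range(), filter() and aggregateWindow(). The assumption here, which should be checked against the Flux query optimization documentation for your InfluxDB version, is that this chain can be pushed down to the storage layer, while a group() placed in front of aggregateWindow() stops that.

```flux
// Sketch only: no group() before the windowed aggregate.
// Without group(), tables keep the default grouping (one table per series),
// so max is computed per series rather than per custom group key.
from(bucket: "quantity_management/autosupplier")
    |> range(start: -7d)
    |> filter(fn: (r) => r._field == "quantity")
    |> filter(fn: (r) => r._measurement == "aggregation-quantities")
    |> filter(fn: (r) => r.product_id == "QWMTV")
    |> filter(fn: (r) => r["quantity_key.quantity_type"] == "amount")
    |> filter(fn: (r) => r["product_sub_id"] == "KLX")
    |> aggregateWindow(every: 1m, fn: max, createEmpty: false)
```

Note that this is not guaranteed to return the same result as the grouped query: if the series carry tags beyond the six listed in the original group key, each of those series gets its own per-minute max. If the grouped shape is needed, group() can be applied after aggregateWindow() to the already reduced result, at which point a second aggregation may be required.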

Thanks for your message, @scott. A couple of follow-ups, please.

For context: I’ve been leading my team’s switch from InfluxQL to Flux, and we currently have Grafana dashboards that work well with InfluxQL. Our dev team has migrated from Influx 1.8 to Influx 2.7, and we’ve started moving all queries from InfluxQL to Flux, along with the corresponding changes in Grafana.

Follow-ups:

  1. My expectation was that Flux had been designed to outperform InfluxQL across the board. Is there a particular reason why Flux is not faster than InfluxQL for aggregate/selector queries? These queries are a large part of our team’s work, and the performance gap makes me uncertain and discouraged about moving from InfluxQL to Flux.

  2. What exactly is it about removing the group() function that improved query performance, especially given that most of the columns in the specified group key are tag keys?

Are the columns you are filtering, sorting, etc. on stored as tags? From what I have read, tags are indexed and therefore speed up queries.
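One way to check which columns are tags is to list the tag keys of the measurement. The sketch below uses the bucket and measurement names from the original post; any column used in filter() or group() that does not appear in the output is a field, not a tag.

```flux
import "influxdata/influxdb/schema"

// Sketch only: list the tag keys of the measurement used in the query.
// Bucket and measurement names are taken from the original post.
schema.measurementTagKeys(
    bucket: "quantity_management/autosupplier",
    measurement: "aggregation-quantities",
)
```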

G’day @scott, any ideas about the above, please?

cc: @Anaisdg, @pauldix