Very poor performance when executing a query with pivot()

We have 2.8G records (each record contains 7 fields). When we execute a query that uses pivot() to compose records, we see very poor performance. The query is as below:

from(bucket: "zillonare")
  |> range(start: 2022-02-15T01:31:00Z, stop: 2022-02-15T07:00:01Z)
  |> filter(fn: (r) => r["_measurement"] == "stock_min1")
  |> pivot(columnKey: ["_field"], rowKey: ["_time"], valueColumn: "_value")
  |> keep(columns: ["frame","open","high","low","close","volume","amount","factor","code","_time"])
  |> sort(columns: ["_time"], desc: false)

This takes almost 10 minutes; without pivot(), it returns in about 10 seconds.

Is this normal, or can the query be improved?

@aaron_yang I think there are a few things you can do to optimize this query.

  1. What’s the resolution of the data? How many points per minute, for example? If it’s at a 1s resolution, this could be querying a lot of data, depending on the fields in the stock_min1 measurement. If you don’t need the high-resolution data, you can downsample to per-minute or per-hour summaries and reduce the amount of data being returned (see the downsampling task sketch after the example query below).

  2. Right now, you’re pivoting EVERYTHING in the stock_min1 measurement, then using keep() to do two things: 1) keep the time column, and 2) drop all the field columns you don’t want. It would be more efficient to filter on your fields rather than querying everything, pivoting, and then dropping the fields you don’t want.

  3. If you don’t use keep(), you also shouldn’t need to re-sort the data. This will also save some time.

So something like this:

from(bucket: "zillonare")
  |> range(start: 2022-02-15T01:31:00Z, stop: 2022-02-15T07:00:01Z)
  |> filter(fn: (r) => r["_measurement"] == "stock_min1")
  |> filter(fn: (r) => r["_field"] == "frame" or r["_field"] == "open" or r["_field"] == "high" or r["_field"] == "low" or r["_field"] == "close" or r["_field"] == "volume" or r["_field"] == "amount")
  |> aggregateWindow(every: 1h, fn: last)
  |> pivot(columnKey: ["_field"], rowKey: ["_time"], valueColumn: "_value")
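
For the downsampling in point 1, a task along these lines can write per-hour summaries into a separate bucket. This is only a minimal sketch; the task name, schedule, and destination bucket "zillonare_downsampled" are placeholders you'd adapt to your setup:

option task = {name: "downsample_stock_min1", every: 1h}

from(bucket: "zillonare")
  |> range(start: -task.every)
  |> filter(fn: (r) => r["_measurement"] == "stock_min1")
  // take the last value of each field in every 1h window
  |> aggregateWindow(every: 1h, fn: last)
  // write the summarized points to a separate (hypothetical) bucket
  |> to(bucket: "zillonare_downsampled")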

The resolution is 1m for this measurement; however, we set the precision to ‘1s’ when saving data points. Is there any way to change the resolution now?

About sort(): I can’t find documentation that states how query results are ordered. Will they always be returned in ascending order by “_time”?

keep() is used because we don’t want “_start” and “_stop” to be returned. No matter which query we use, if we don’t use keep(), those two columns are always returned.

The timestamp precision doesn’t really matter; Flux treats everything as nanosecond precision. What matters here is how often you’re writing and exactly how many points you’re querying.

This could definitely be clearer in the documentation; right now it’s a little scattered. When querying data from InfluxDB, results are returned sorted by time in ascending order by default. I don’t know if this is always true, but functions that change the group key of input data no longer guarantee sort order, and output data may need to be re-sorted.
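
So if row order matters downstream, re-sorting explicitly after pivot() is the safe option. A minimal sketch, reusing your original query without keep():

from(bucket: "zillonare")
  |> range(start: 2022-02-15T01:31:00Z, stop: 2022-02-15T07:00:01Z)
  |> filter(fn: (r) => r["_measurement"] == "stock_min1")
  |> pivot(columnKey: ["_field"], rowKey: ["_time"], valueColumn: "_value")
  // pivot() changes the group key, so sort order is no longer guaranteed;
  // only re-sort when downstream code depends on ascending _time
  |> sort(columns: ["_time"], desc: false)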

I see. In this case, drop() would be less verbose than keep():

from(bucket: "zillonare")
  |> range(start: 2022-02-15T01:31:00Z, stop: 2022-02-15T07:00:01Z)
  |> filter(fn: (r) => r["_measurement"] == "stock_min1")
  |> aggregateWindow(every: 1h, fn: last)
  |> pivot(columnKey: ["_field"], rowKey: ["_time"], valueColumn: "_value")
  |> drop(columns: ["_start", "_stop"])
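
Putting the two suggestions together (filter only the fields you need, then drop the range columns, no re-sort), the query could look like this; adjust the field list to whatever you actually use, and add aggregateWindow() back in if you also want to downsample:

from(bucket: "zillonare")
  |> range(start: 2022-02-15T01:31:00Z, stop: 2022-02-15T07:00:01Z)
  |> filter(fn: (r) => r["_measurement"] == "stock_min1")
  // only pull the fields you actually need before pivoting
  |> filter(fn: (r) => r["_field"] == "frame" or r["_field"] == "open" or r["_field"] == "high" or r["_field"] == "low" or r["_field"] == "close" or r["_field"] == "volume" or r["_field"] == "amount")
  |> pivot(columnKey: ["_field"], rowKey: ["_time"], valueColumn: "_value")
  // drop the range bounds instead of listing every column to keep
  |> drop(columns: ["_start", "_stop"])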

Very helpful, many thanks!

By the way, when we write data to InfluxDB using the v2.x API, the only supported precisions are “s”, “us”, “ms”, and “ns”. So no need to bother with “h” and “m”; those only exist in the 1.x API, I guess?