I’ve tried to find a solution to the join() performance issue (see the topic linked above) and tried:
import "experimental"
offmst = from(bucket: "dca")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "TfcMonitor" and
r["host"] == "cbmin00y" and
r["_field"] == "offabs")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
|> keep(columns: ["_time","_value"])
offend = from(bucket: "dca")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "TfcMonitor" and
r["host"] != "cbmin00y" and
r["_field"] == "offabs")
|> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
|> keep(columns: ["_time","_value", "host", "oid"])
|> group()
|> sort(columns: ["_time"])
experimental.join(left:offmst, right:offend,
fn: (left, right) => ({
left with
host: right.host,
oid: right.oid,
_value: right._value - left._value
}))
|> group(columns: ["host", "oid"], mode:"by")
|> sort(columns: ["_time"])
|> yield()
which should be equivalent to the conventional join() version:
offmst = from(bucket: "dca")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "TfcMonitor" and
                         r["host"] == "cbmin00y" and
                         r["_field"] == "offabs")
    |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
    |> keep(columns: ["_time", "_value"])
    |> rename(columns: {_value: "off_mst"})

offend = from(bucket: "dca")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "TfcMonitor" and
                         r["host"] != "cbmin00y" and
                         r["_field"] == "offabs")
    |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
    |> keep(columns: ["_time", "_value", "host", "oid"])
    |> group()
    |> sort(columns: ["_time"])
    |> rename(columns: {_value: "off_end"})

join(tables: {mst: offmst, end: offend}, on: ["_time"])
    |> map(fn: (r) => ({ r with _value: r.off_end - r.off_mst }))
    |> drop(columns: ["off_mst", "off_end"])
    |> group(columns: ["host", "oid"], mode: "by")
    |> sort(columns: ["_time"])
    |> yield()
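As an aside: newer Flux versions (0.172 and later, i.e. recent InfluxDB 2.x) ship a standalone join package with join.time(), which joins on _time only and, as far as I understand, supersedes experimental.join(). A minimal sketch, reusing the offmst and offend definitions from the experimental.join() example above (before the rename); I have not benchmarked this variant on my data:

import "join"

// join rows with equal _time; the "as" function builds the output row,
// playing the same role as the fn of experimental.join above
join.time(left: offmst, right: offend, method: "inner",
    as: (l, r) => ({
        l with
        host: r.host,
        oid: r.oid,
        _value: r._value - l._value
    }))
    |> group(columns: ["host", "oid"], mode: "by")
    |> sort(columns: ["_time"])
    |> yield()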
For modest window sizes experimental.join() seems to give the correct result and shows good performance:
Window  #rec  cpu_tot  mem_tot  cpu_join
10m     2017  0.423    983808   0.253
20m     1009  0.244    475904   0.122
60m      337  0.159    182848   0.065
So CPU time and memory consumption now grow linearly with the row count, no longer quadratically.
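For anyone who wants to reproduce such measurements, one way to get per-query CPU and memory statistics is Flux's profiler package, which adds its results as extra tables to the query output (this is just a pointer, not necessarily how the numbers above were obtained):

import "profiler"

// emit query-level statistics (durations, allocated memory) and
// per-operator statistics as additional tables in the result
option profiler.enabledProfilers = ["query", "operator"]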
But for larger window sizes I get a panic: unknown type invalid; the full message from syslog is attached.
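If the panic turns out to be triggered by the null rows that createEmpty: true produces (that is only a guess on my part), one thing I still want to try is dropping the empty windows again before the join, e.g. for offmst:

offmst = from(bucket: "dca")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "TfcMonitor" and
                         r["host"] == "cbmin00y" and
                         r["_field"] == "offabs")
    |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
    // remove windows without data so no null _value reaches the join
    |> filter(fn: (r) => exists r._value)
    |> keep(columns: ["_time", "_value"])

(or simply createEmpty: false, at the cost of losing the empty windows). If that makes the panic go away, it would at least narrow down the cause.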
Is that a known bug?
Have others run into it?
Or should I file a GitHub issue for this?
exp_join_panic.txt (4.4 KB)