I’ve tried to find a solution of the join() performance issue, see
and tried
import "experimental"
offmst = from(bucket: "dca")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "TfcMonitor" and
                       r["host"] == "cbmin00y" and
                       r["_field"] == "offabs")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
  |> keep(columns: ["_time","_value"])
offend = from(bucket: "dca")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "TfcMonitor" and
                       r["host"] != "cbmin00y" and
                       r["_field"] == "offabs")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
  |> keep(columns: ["_time","_value", "host", "oid"])
  |> group()
  |> sort(columns: ["_time"])
experimental.join(left:offmst, right:offend,
   fn: (left, right) => ({
     left with
       host: right.host,
       oid: right.oid,
       _value: right._value - left._value
     }))
  |> group(columns: ["host", "oid"], mode:"by")
  |> sort(columns: ["_time"])
  |> yield()
which should be equivalent to the conventional
offmst = from(bucket: "dca")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "TfcMonitor" and
                       r["host"] == "cbmin00y" and
                       r["_field"] == "offabs")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
  |> keep(columns: ["_time","_value"])
  |> rename(columns: {_value: "off_mst"})
offend = from(bucket: "dca")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "TfcMonitor" and
                       r["host"] != "cbmin00y" and
                       r["_field"] == "offabs")
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: true)
  |> keep(columns: ["_time","_value", "host", "oid"])
  |> group()
  |> sort(columns: ["_time"])
  |> rename(columns: {_value: "off_end"})
join(tables: {mst:offmst, end:offend}, on: ["_time"])
  |> map(fn: (r) => ({ r with _value: r.off_end - r.off_mst}))
  |> drop(columns: ["off_mst","off_end"])
  |> group(columns: ["host", "oid"], mode:"by")
  |> sort(columns: ["_time"])
  |> yield()
For modest window sizes experimental.joint() seems to give the correct result, and shows good preformance
Wind #rec  cpu_tot    mem_tot  cpu_join 
 10m 2017  0.423      983808   0.253
 20m 1009  0.244      475904   0.122
 60m  337  0.159      182848   0.065
So CPU time and memory consumption is now linear with row count, and not longer quadratic.
But for larger window size I get a panic: unknown type invalid.
The full message from syslog is attached.
Is that a known bug ?
Have others run into that ?
Or should I file a github issue on this ?
exp_join_panic.txt (4.4 KB)




