Panic: arrow/array: index out of range

Hello!

I have the following error message:

panic: arrow/array: index out of range

It occurs when I try to join data in a large dataset.

I have two measurements in the same bucket: the first is called cycles and the second is called series. For every point in cycles, I have a set of points in series, so that I can provide different levels of data resolution and can identify the series belonging to each cycle using a join. Here is my Flux query:

import "join"

cycles = from(bucket: vBucket)
  |> range(start: vStart, stop: vStop)
  |> filter(fn: (r) => r["_measurement"] == "cycle")
  // the time as string links one cycle to multiple series points
  |> map(fn: (r) => ({ r with cycle: string(v: r._time) }))
  |> group()
  |> yield(name: "cycles")

series = from(bucket: vBucket)
  |> range(start: vStart, stop: vStop)
  |> filter(fn: (r) => r["_measurement"] == "series")
  |> group()
  |> yield(name: "series")

join.inner(
  left: cycles,
  right: series,
  on: (l, r) => l.cycle == r.cycle,
  as: (l, r) => ({ r with cycle_result: l._value }),
)
  |> yield(name: "join")

From both cycles and series, I drop all unnecessary columns before calling group(). My problem is the following: when the time range is large enough, in particular when series has 36k+ rows, the join fails with the error message above. With fewer points, the join performs well.

I have experienced this error multiple times in the past in similar scenarios where I use the join function to build up 1:N relations.

By using group() without columns, I can achieve equal group keys for both streams.

In cycles, I have a field called 'result' that I want to attach to all cycle-related series points, so that I can filter points out of series based on a value that originates from cycles.

In the schema design phase, my intention was to store the result field only in cycles, because it seemed needless to write the same value to every series point belonging to one specific cycle. My plan was to use the join function to filter the points.

When I ran some tests, I realized that using join() is very time-consuming compared to the filter() function.
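For illustration, here is a minimal sketch of the filter-based variant I have in mind, using findColumn() and contains() instead of a join. It assumes the series points carry the cycle timestamp as a string in a cycle column (as in the query above); vResult is just a placeholder for the cycle result value I want to keep:

// collect the cycle timestamps (as strings) whose result matches vResult
resultCycles = from(bucket: vBucket)
  |> range(start: vStart, stop: vStop)
  |> filter(fn: (r) => r["_measurement"] == "cycle" and r["_field"] == "result" and r._value == vResult)
  |> group()
  |> map(fn: (r) => ({ r with cycle: string(v: r._time) }))
  |> findColumn(fn: (key) => true, column: "cycle")

// keep only the series points that belong to one of those cycles
from(bucket: vBucket)
  |> range(start: vStart, stop: vStop)
  |> filter(fn: (r) => r["_measurement"] == "series")
  |> filter(fn: (r) => contains(value: r.cycle, set: resultCycles))
  |> yield(name: "filtered_series")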

My questions are the following:

  1. Why does the join function fail for large sets of data?
  2. Even if I can get the above join to work, for a larger amount of data the query time will be significantly higher, which may also lead to poor performance.
  3. What is the recommended schema design practice to handle a 1:N relation: join vs. redundant tagging? Should I forget my concerns about storage requirements just to gain performant queries? (See the sketch after this list.)
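
To make question 3 concrete, this is roughly how the redundant-tagging alternative would look: a minimal sketch assuming the result value is duplicated as a tag on every series point (the tag name result and the value "pass" are only placeholders):

from(bucket: vBucket)
  |> range(start: vStart, stop: vStop)
  |> filter(fn: (r) => r["_measurement"] == "series")
  // 'result' stored redundantly as a tag on every series point
  |> filter(fn: (r) => r["result"] == "pass")
  |> yield(name: "series_by_result")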

Thank you very much for your reply!

@scott? Any idea? Maybe you can help.

Guys, can I have some support, please? If the join fails for some reason, I need to modify the schema to have tags for filtering instead of joining data.

Is this a bug, or am I doing something wrong? Or should I just ignore this limitation because of the new 3.0 version?

Thank you for your time!
