Intersection / Inner join on stream of tables

OM222O · January 13, 2025, 5:10pm

Hi,
I’m using open telemetry to store metrics for my code. I want to filter out metrics for specific code paths taken rather than an aggregated view of all paths. I’m adding a hash based on the code path taken; Imagine this is the set of possible paths through my code

And for these paths I generate the following hashes:

A->C->E : 1
A->C->D : 2
B->C->D : 3
B->C->E : 4

Now I want to filter these paths, for example selecting “A” (in Grafana) should return paths 1 and 2 (A->C->E and A->C->D) but selecting “A”, “C” and “D” at the same time should only return path 2 (A->C->D).

I have managed to create tabular results that are grouped by nodes present in each path, but I can’t find any way to intersect / inner join a stream of multiple tables in order to get the correct paths.
Here is the influx query to re-produce this example:

import "array"

array.from(rows:[
  {node: "A", path: 1},
  {node: "C", path: 1},
  {node: "E", path: 1},
  {node: "A", path: 2},
  {node: "C", path: 2},
  {node: "D", path: 2},
  {node: "B", path: 3},
  {node: "C", path: 3},
  {node: "D", path: 3},
  {node: "B", path: 4},
  {node: "C", path: 4},
  {node: "E", path: 4}
])
  |> filter(fn: (r) => r.node == "A" or r.node == "C" or r.node == "D")
  |> group(columns: ["path"])

The problem is that because I’m including “C” which is common in all paths, I need to intersect / inner join paths in all of the groups / tables / stream. I know in this specific simplified example leaving out “C” solves the issue but this is just a simplified example, not the actual use case.

Can you please help me get the correct output (path 2) from the reproducible query above?

Anaisdg · January 13, 2025, 7:16pm

@OM222O,
First may I just say THANK YOU for including so much detail and example data? You absolute legend of a community user. Looking into this now.

OM222O · January 14, 2025, 1:12pm

So I managed to use a workaround to achieve this:

import "array"

array.from(rows:[
  {node: "A", path: 1},
  {node: "C", path: 1},
  {node: "E", path: 1},
  {node: "A", path: 2},
  {node: "C", path: 2},
  {node: "D", path: 2},
  {node: "B", path: 3},
  {node: "C", path: 3},
  {node: "D", path: 3},
  {node: "B", path: 4},
  {node: "C", path: 4},
  {node: "E", path: 4}
])
  |> filter(fn: (r) => r.node == "A" or r.node == "C" or r.node == "D")
  |> map(fn: (r) => ({r with _key:  r.node + string(v:r.path)}))
  |> unique(column: "_key")
  |> group(columns: ["path"])
  |> count(column: "node")
  |> group()
// 3 is the number of selected nodes and comes from grafana variable 
// (in this case I have selected A, B and C which is a count of 3)
  |> filter(fn: (r) => (r.node == 3))
  |> keep(columns: ["path"])

However I have to use a weird hack by constructing a stringly (not strongly) typed key since unique / distinct can’t operate on multiple columns. I then union all the unique combinations of “node + path” and get the intersection by checking the actual counts in the unioned table vs the selected count in Grafana to do the intersection. I’m a bit surprised that basic functionality like multi-column unique or intersection are not supported out of the box and given that flux is on maintenance, it’s unlikely they’ll ever be supported

As a side note: the deprecated join function (join() function | Flux Documentation) is very close to doing this by default but I can’t pipe the results from the previous step or somehow unpack them for this to work.

Topic		Replies	Views
Flux - Create a table with a measurement while another measurement have a specific value Fluxlang flux	3	655	August 17, 2021
Flux: Simple way to query multiple measurements & return ONE table Fluxlang	2	4587	October 21, 2021
Query for grafana filtered by other query in other table InfluxQL	7	2416	February 9, 2022
Drawing the result of Multiple Filters in Chronograf Fluxlang chronograf	4	545	March 13, 2020
Flux help required Fluxlang influxdb , query , flux	2	37	May 4, 2025

Intersection / Inner join on stream of tables

Related topics