Hello @nZero,
Part of the power of the join package is the ability to perform multiple different joins on multiple columns. Equally as important is the ability to specify your own custom as function and the ability to make that as complex as you want. You can even anticipate changes in the series you’re trying to join together to accommodate for instances where you have missing columns. For example, imagine we want to regularly join data as a part of a task. However the presence of one column from one table is sparse. We can use a combination of null value injection and conditional logic to still perform a join. This solution is to circumnavigate an error like runtime error @31:12-39:3: left: table is missing label <column name>
. Let’s take a look at the following example:
table1 has three columns where columns foo and bar have some null values:
Notice how table2 has an additional column yeet which table1 is missing:
on |
foo |
bar |
yeet |
1 |
0 |
4 |
42 |
2 |
2 |
5 |
42 |
Imagine we want to perform a left join. We need to use conditional logic as a part of the function parameter to specify what we want to return in the instances where we have null values in the foo and bar column of table1. We also need to use the debug.null() function in a map() function to create an empty column with null values so that the join doesn’t fail.
on |
foo |
bar |
yeet |
1 |
1 |
4 |
42 |
2 |
2 |
3 |
42 |
Try this out for yourself with the following flux code:
import “experimental/array”
import “array”
import “join”
import “internal/debug”
table1 = array.from(rows: [{pivot: 1, baz: “foo”}, {pivot: 3, baz: “bar”}])
|> pivot(rowKey:[“pivot”], columnKey: [“baz”], valueColumn: “pivot”)
|> map(fn: (r) => ({ r with on: 1 }))
|> cumulativeSum(columns: [“on”])
// ignore the code before this line, this is just to create a good example.
// use the debug.null function to create a new column with null values so the join doesn’t fail.
|> map(fn: (r) => ({ r with yeet: debug.null(type: “string”) }))
|> yield(name: “table1”)
table2 = array.from(rows: [{on: 1, foo: 0, bar: 4, yeet: “42”}, {on: 2, foo: 2, bar: 5, yeet: “42”}])
|> yield(name: “table2”)
joined = join.left(left: table1, right: table2, on: (l, r) => l.on == r.on, as: (l, r) => {
return {
foo: if exists l.foo then int(v: l.foo) else int(v: r.bar),
yeet: if exists l.yeet then int(v: l.yeet) else int(v: r.yeet),
bar: if exists l.bar then int(v: l.bar) else int(v: r.bar),
}
})
joined |> yield(name: “joined”)