Using join.left() as lookup table

Hi,

I’m trying to use join to add a description to error codes.
My first (test) table is like this:

result,table,_start,_stop,_time,_value,_field,_measurement,dtc
,,0,2022-11-20T10:14:47.722543388Z,2022-12-20T10:14:47.722543388Z,2022-12-20T08:41:45Z,23,LHFC_DTC1,TEST01,23
,,0,2022-11-20T10:14:47.722543388Z,2022-12-20T10:14:47.722543388Z,2022-12-20T08:41:46Z,21,LHFC_DTC1,TEST01,21
,,0,2022-11-20T10:14:47.722543388Z,2022-12-20T10:14:47.722543388Z,2022-12-20T08:41:47Z,20,LHFC_DTC1,TEST01,20
,,0,2022-11-20T10:14:47.722543388Z,2022-12-20T10:14:47.722543388Z,2022-12-20T08:41:48Z,14,LHFC_DTC1,TEST01,14
,,0,2022-11-20T10:14:47.722543388Z,2022-12-20T10:14:47.722543388Z,2022-12-20T08:41:49Z,13,LHFC_DTC1,TEST01,13
,,0,2022-11-20T10:14:47.722543388Z,2022-12-20T10:14:47.722543388Z,2022-12-20T08:41:50Z,11,LHFC_DTC1,TEST01,11
,,0,2022-11-20T10:14:47.722543388Z,2022-12-20T10:14:47.722543388Z,2022-12-20T08:41:51Z,17,LHFC_DTC1,TEST01,17

and I want to add a description to these dtc values, taken from the next generated table:

,result,table,_start,_stop,_time,_value,_field,_measurement,dtc
,,0,2022-11-20T07:49:03.131Z,2022-12-20T07:49:03.131Z,2022-12-19T15:34:46Z,No DTC,descr,FC_DTC,0,No DTC,0
,,0,2022-11-20T07:49:03.131Z,2022-12-20T07:49:03.131Z,2022-12-19T15:34:46Z,Air Flow Below Setp Run ,descr,FC_DTC,100,
,,0,2022-11-20T07:49:03.131Z,2022-12-20T07:49:03.131Z,2022-12-19T15:34:46Z,Air Flow Below Setp Run,descr,FC_DTC,101,

So basically I want to match the values read by the first query against the dtc values from the second table.

I’m trying to do that with the following Flux query:

import "join"

orgDtc = from(bucket: "FAKETESTDATA")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "TEST01")
    |> filter(fn: (r) => r["_field"] == "LHFC_DTC1")
    |> filter(fn: (r) => exists r._value)
    |> map(fn: (r) => ({r with dtc: float(v: r._value)}))


dtcDesc = from(bucket: "FC_DTC_Bucket")
    |> range(start: 2022-11-20T07:49:03.131Z, stop: 2022-12-20T07:49:03.131Z)
    |> filter(fn: (r) => r["_measurement"] == "FC_DTC")
    |> map(fn: (r) => ({r with fdtc: float(v: r.dtc), descr: r._value}))
    |> group()

join.left(
    left: orgDtc,
    right: dtcDesc,
    on: (l, r) => l.dtc == r.fdtc,
    as: (l, r) => ({l with descr: r.descr, righ: r.fdtc})
)

Perhaps it’s a bit chaotic: I use map() to convert the two values that need to be joined to floats, to make sure they have the same type, but in the future I might want to use strings for that instead.
In the result I see the new columns, but they are empty. My guess is that the on function doesn’t find a matching value.
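Yielding the two streams separately should at least show what each side contains (a minimal sketch using the variables above):

// Sketch: inspect both join inputs so the dtc and fdtc values,
// their types, and the group keys can be compared side by side
orgDtc |> yield(name: "orgDtc")
dtcDesc |> yield(name: "dtcDesc")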

How can I debug the problem?
What did I do wrong?

Thanks in advance for the help,
David

Hello @DavidHy,
The first thing I noticed is that your CSV seems to be off: the number of fields doesn’t match between the header and the rows.
I simplified the CSV and it worked. It looks like the column names you were referencing were a little off. I also changed the dtc values in dtcDesc so that you’d actually have values to join on.

import "csv"
import "join"

csvData1 = "_time,_value,_field,_measurement,dtc
2022-12-20T08:41:45Z,23,LHFC_DTC1,TEST01,23
2022-12-20T08:41:46Z,21,LHFC_DTC1,TEST01,21
2022-12-20T08:41:47Z,20,LHFC_DTC1,TEST01,20
2022-12-20T08:41:48Z,14,LHFC_DTC1,TEST01,14
2022-12-20T08:41:49Z,13,LHFC_DTC1,TEST01,13
2022-12-20T08:41:50Z,11,LHFC_DTC1,TEST01,11
2022-12-20T08:41:51Z,17,LHFC_DTC1,TEST01,17"

csvData2 = "_time,_value,_field,_measurement,dtc
2022-12-19T15:34:46Z,No DTC,descr,FC_DTC,0
2022-12-19T15:34:46Z,Air Flow Below Setp Run ,descr,FC_DTC,12
2022-12-19T15:34:46Z,Air Flow Below Setp Run,descr,FC_DTC,17"
orgDtc = csv.from(csv: csvData1, mode: "raw")
// |> yield(name: "orgDtc")
dtcDesc = csv.from(csv: csvData2, mode: "raw")
// |> yield(name: "dtcDesc")

join.left(
    left: orgDtc,
    right: dtcDesc,
    on: (l, r) => l.dtc == r.dtc,
    // pull the description text from the _value column of the lookup CSV
    as: (l, r) => ({l with descr: r._value, righ: r.dtc})
)
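
Not needed for this example, but join.left() keeps every row from the left stream and fills the right-hand columns with null where there is no match (e.g. dtc 23 or 21 above), so the as function can supply a readable default with exists. A minimal sketch using the same variables:

// Sketch: give rows without a matching dtc a placeholder description instead of null
join.left(
    left: orgDtc,
    right: dtcDesc,
    on: (l, r) => l.dtc == r.dtc,
    as: (l, r) => ({l with descr: if exists r._value then r._value else "unknown DTC"})
)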

Hi Anaisdg,

Thanks for the reply.
I had more time to play with the setup. Indeed the CSVs work as intended, and even a combination of one CSV and one query works well.
The problem occurs only when both tables come from queries.

What solved the issue for me was removing “_measurement” and “_field” from the tables, leaving basically only _time, _value, and the custom created columns (see the sketch below).
I don’t know why this is the behaviour, whether it’s a bug, or whether I missed something in the documentation, but at least there is a solution for me.
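
For reference, the working setup looks roughly like this (a sketch; the buckets, fields, and column names are the ones from my first post, and the exact column cleanup may differ):

import "join"

// Drop _measurement and _field and flatten the group key on both sides,
// so the two streams line up before the join
orgDtc = from(bucket: "FAKETESTDATA")
    |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
    |> filter(fn: (r) => r["_measurement"] == "TEST01")
    |> filter(fn: (r) => r["_field"] == "LHFC_DTC1")
    |> filter(fn: (r) => exists r._value)
    |> map(fn: (r) => ({r with dtc: float(v: r._value)}))
    |> drop(columns: ["_measurement", "_field"])
    |> group()

dtcDesc = from(bucket: "FC_DTC_Bucket")
    |> range(start: 2022-11-20T07:49:03.131Z, stop: 2022-12-20T07:49:03.131Z)
    |> filter(fn: (r) => r["_measurement"] == "FC_DTC")
    |> map(fn: (r) => ({r with fdtc: float(v: r.dtc), descr: r._value}))
    |> drop(columns: ["_measurement", "_field"])
    |> group()

join.left(
    left: orgDtc,
    right: dtcDesc,
    on: (l, r) => l.dtc == r.fdtc,
    as: (l, r) => ({l with descr: r.descr})
)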

Thank you for your help,
I hope that others who experience this issue will find the solution here.

David


@DavidHy,
Agreed. Thank you for asking! I’m sure it will help someone else down the line.