Left outer join produces missing label message

Hi everyone,

I have a problem performing an left outer join, where i get an error message which i don’t really understand based on the documentation on the new join package and the following blog i went through Outer Joins in Flux | InfluxData.

The query looks like following:

table1 = from(bucket: "metrics")
  |> range(start: -21m)
  |> filter(fn: (r) => r._measurement == "device" and r.measurement_type == "metrics" and (r._field == "bri" or r._field == "cmd_val" or r._field == "snaga"))
  |> last()
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> group(columns: ["device_address"])
  |> sort(columns: ["_time"], desc: true)
  |> limit(n: 1)

table2 = from(bucket: "metrics")
  |> range(start: -21m)
  |> filter(fn: (r) => r._measurement == "device" and r.measurement_type == "brightness" and r._field == "cmd_val")
  |> last()
  |> pivot(rowKey:["_time"], columnKey: ["_field"], valueColumn: "_value")
  |> group(columns: ["device_address"])
  |> sort(columns: ["_time"], desc: true)
  |> limit(n: 1)

combined = join.left(left: table1, right: table2, on: (l, r) => l.device_address == r.device_address, as: (l, r) => {
  return {
    device_address: l.device_address,
    bri: if exists r.cmd_val then int(v: r.cmd_val) else l.bri,
    snaga: l.snaga,
    cmd_tag: if exists r.cmd_tag then r.cmd_tag else l.cmd_tag,
    cmd_val: if exists l.cmd_val then l.cmd_val else r.cmd_val
  }
})
|> group(columns: ["result"])
|> yield(name: "combined")

The error i get is:

runtime error @31:12-39:3: left: table is missing label cmd_val

The ,cmd_val, in the first table might not appear if no errors are present on any device. If all devices have errors then ,bri, and ,snaga, would not appear and only ,cmd_val, will be present. I could add map to the table1 and add any column with some default value if the column does not exist and then all 3 columns would always exist, but i don’t know if that is necessary, as i don’t understand the error.

I get also the similar message on different columns if the right table (table2) is totally empty. I saw an opened issue join.tables() method: "full" with an empty table returns "table is missing label _value" · Issue #5134 · influxdata/flux · GitHub which addresses something similar, so i assume that this case is a bug.

Hello @greenenvy,
Can you please share the annotated csv for table1 and table2 with me?
And what you want your expected output to look like?

I’ve only seen that conditional syntax used with full joins.

@Anaisdg I sent you the data on PM if thats ok.

The output i want to get should look really similar to table 1. The ,bri, field of the combined table depends if for device_address an entry in table 2 exists. If it does not exist, take ,bri, from the left table, or else take ,cmd_val, from the right table.

The example i sent you, for table 1 does not have any value for cmd_val, and that’s why the error occurs which i don’t understand. If cmd_val exists in the left table (which is rare), it has a totally different meaning (it means that the device has an internal error). The query i have shown only runs if there is at least one device in the error state.

The blog post about the new join package states that we should take care about grouping when doing outer joins, and both table1 and table2 are grouped by the same key. Only if the pivot actually makes the problem but i’m not totally sure.

Hello @greenenvy,
Yes that’s fine. Let me take a look now. I noticed the data you gave me doesn’t reflect your query. Mainly that there is more than one record in each table in the table stream “table1” and “table2”. I’m getting 3 records for table 2 and many more for table 1. Can you please share your data with the limit applied so I can try to replicate your problem exactly? Thank you!

Hello @greenenvy,
I don’t get any errors with the data you shared with me however:

import "csv"
import "join"

two = csv.from(csv: 
"#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,string
#default,table2,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,device_address,groups,measurement_type,msg_type,src_addr,src_dev_type,cmd_val
xxx
")

two |> yield(name: "two")
one = csv.from(csv: 
"#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,device_address,groups,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx

#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,device_address,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx

#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,device_address,groups,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx

#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,device_address,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx

#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,device_address,groups,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx
")

one |> limit(n:1) |> yield(name: "one")

join.left(left: one, right: two, on: (l, r) => l.device_address == r.device_address, as: (l, r) => {
   bri = if exists r.cmd_val then int(v: r.cmd_val) else l.bri
   cmdTag =  if exists r.cmd_tag then r.cmd_tag else l.cmd_tag
  return {device_address: l.device_address, cmdTag: cmdTag, bri: bri, snaga: l.snaga}})

@Anaisdg Thank you for the effort really.

Sorry if i wasn’t the clearest maybe. The table2 usually is not empty, which is what i have sent you. I only added in the question at the beginning that the error also occurs when table2 is totally empty, which was also mentioned in the opened issue. There it is a full join, but it occurs with left join either. I’m seeing that there is a pull request which is returning an different error message, but there is a comment that addresses the fact that the join maybe should not return an error if any table is empty. But ok, the empty table thing is a different problem from the first one i’m experiencing.

I sent you on PM the full unmodified data and the exact query without any changes with the screenshot of the error and version details of InfluxDB and Flux.

Hello @greenenvy,

Okay. Thank you for sharing.

Ahhh I’m silly. If exists checks whether or not a value for a column exists. You have to have that column first. You can add null values like so:

import "csv"
import "join"
import "internal/debug"

table1 = csv.from(csv: "#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,dev_addr_tag,groups,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx

#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,dev_addr_tag,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx

#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,dev_addr_tag,groups,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx

#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,dev_addr_tag,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx

#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,long,double
#default,table1,,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,dev_addr_tag,groups,measurement_type,msg_type,src_addr,src_dev_type,bri,snaga
xxx
")

t1 = table1  |> map(fn: (r) => ({ r with  cmd_val: debug.null(type: "string") }))


table2 = csv.from(csv: "#group,false,false,false,false,false,false,false,false,true,false,false,false,false,false,false
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,string
#default,table2,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_measurement,cmd_tag,company,dev_addr_tag,groups,measurement_type,msg_type,src_addr,src_dev_type,cmd_val
xxx")

// table1 |> yield(name: "table1")
// table2 |> yield(name: "table2")
// 
combined = join.left(left: t1, right: table2, on: (l, r) => l.dev_addr_tag == r.dev_addr_tag, as: (l, r) => {
  return {
    dev_addr_tag: l.dev_addr_tag,
    bri: if exists r.cmd_val then int(v: r.cmd_val) else l.bri,
    snaga: l.snaga,
    cmd_tag: if exists r.cmd_tag then r.cmd_tag else l.cmd_tag,
    cmd_val: if exists l.cmd_val then l.cmd_val else r.cmd_val
  }
})


combined |> yield(name: "combined")

Hi, thank you for the response once again.

The fact is, even when you remove the line where i’m adding the cmd_val in results, i get the same error.

If you execute:

combined = join.left(left: t1, right: table2, on: (l, r) => l.dev_addr_tag == r.dev_addr_tag, as: (l, r) => {
  return {
    dev_addr_tag: l.dev_addr_tag,
    bri: if exists r.cmd_val then int(v: r.cmd_val) else l.bri,
    snaga: l.snaga,
    cmd_tag: if exists r.cmd_tag then r.cmd_tag else l.cmd_tag
  }
})

combined |> yield(name: "combined")

the error is the same. And i don’t get why. I the rest i’m using only r.cmd_val.

@Anaisdg

Actually you are right, i have to have the column in the table to actually perform exists. That was the point. The fact i was getting error again was that when i executed the query, the other one was empty as no input was in the last 21 minutes. So that i’m understanding. You are really a gem to the community :smiley: Thanks very much.

The only remaining thing is the problem when i have an empty table. Then the same error arises, from the reason you described. Any advice how to solve that specific case?

1 Like

@greenenvy
Ahh yes I believe the empty table situation is related to the issue that you posted.
What is the context that you are creating this join?
Is it in a task?
You could use some conditional logic.
I’d take advantage of the if exists clause to determine whether a table is returning data or not.
If it is, then use that table to perform the join else use a dummy table that you construct with array.from().

Does that make sense? Or should I provide an example?

You are a gem to the community for asking good questions! Thank you!

Hi,

I have the same problem with a left join. If the left or right table is empty, I get an error table is missing label {labelName}. Is there any option to check if the table is empty or not? I have found a solution on the Internet and created this workaround, but it is a pain to check if the table exists or not. I think the join function should deal with these empty tables.

isEmpty = (tables) => {
	columnsArray = tables
		|> columns()
		|> findColumn(fn: (key) => true, column: "_value")
	return length(arr: columnsArray) == 0
}

locations = from(bucket: {bucket})... //omitted for brevity
pallet = from(bucket: {bucket})...  //omitted for brevity

isLeftEmpty = isEmpty(tables: locations)
isRightEmpty = isEmpty(tables: pallet)

leftDummy = array.from(rows: [{SensorId: "-1", Moving: 0}])
rightDummy = array.from(rows: [{SensorId: "-1", Pallet: 0}])

join.tables(
	method: "left",
	left:  if isLeftEmpty then leftDummy else locations,
	right: if isRightEmpty then rightDummy else pallet,
	on: (l, r) => l.SensorId == r.SensorId,
	as: (l, r) => {
		return {l with Pallet: r.Pallet} 
})