Joining data without a common column

I am using Flux with InfluxDB 1.8 and I am attempting to correlate/join records from 2 different tables. The challenge in this case is that these 2 tables have no columns with values in common. So far the closest I have gotten to what I'm looking for is using the map function to add a new column (let's call it 'join_column') with a value of zero (0) for every row in both tables, and then joining on that column. However, if table_1 and table_2 each have 10 rows, the join of these tables ends up with 100 rows. I think this is because 'join_column' contains all zeroes, so every row in one table matches every row in the other and the join degenerates to a Cartesian product.
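For reference, the zero-column workaround described above looks roughly like this (a sketch only; `table_1_query` and `table_2_query` stand in for whatever queries produce the two original result tables):

```flux
// Hypothetical sketch: add a constant join_column to each stream,
// then join on it. Because every key is 0, each of the 10 rows in
// t1 matches all 10 rows in t2, yielding 100 rows.
t1 = table_1_query
    |> map(fn: (r) => ({r with join_column: 0}))
t2 = table_2_query
    |> map(fn: (r) => ({r with join_column: 0}))

join(tables: {t1: t1, t2: t2}, on: ["join_column"])
```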

I am thinking that if there was a way I could use the values 1 to 10 in the join_column field (1 to 10 in table_1 and 1 to 10 in table_2) then when I joined the tables I would get exactly 10 rows of joined data. I have tried to find a way to do this and so far have been unsuccessful.

Ideas:

  1. write a function that increments a global variable and returns increasing values every time it is called. The trouble is, I couldn’t find a way to increment a global variable and my function returned the same number every time.
  2. get the row index somehow and use that as my values for join_column.

So far I haven't been able to figure out a way to code either one of those in Flux. Any help would be greatly appreciated!

Thanks

Hello @Chris_King,
I don’t have that problem:

import "experimental/array"
table1 = array.from(rows: [{_time: 2020-01-01T00:00:00Z, _value: "foo", _measurement: "table1", join_column: 0},{_time: 2020-01-02T00:00:00Z, _value: "bar", _measurement: "table1", join_column: 0}])
table2 = array.from(rows: [{_time: 2020-01-01T00:00:00Z, _value: "foo", _measurement: "table2", join_column: 0},{_time: 2020-01-02T00:00:00Z, _value: "bar", _measurement: "table2", join_column: 0}])

join(tables: {key1: table1, key2: table2}, on: ["join_column", "_time"], method: "inner")

Each table in the script above has two rows, and when I join on join_column and _time together there are 2 rows total.
If table_1 and table_2 have 10 rows each, then I expect the join to have 10 rows, as long as the combined join keys uniquely pair the rows. If you join only on a column whose values are all duplicates, you get the row multiplication you described.

Does that help?

Or do you still think you need to perform a cumulative sum?

Hello and thanks for the reply,

The example join that you provided above made me realize what I think is one important difference from my situation: the time values themselves in my 2 initial result tables are different because they are coming from 2 different machines. So, if I include “_time” in my join statement as you have done above, I get no results (or I might get a few results if the times happen to match up).

What I am trying to accomplish is to build a result table showing a graph of the time differences (in seconds) between the records in the original 2 tables.

Something like this (where uint_time_1 and uint_time_2 are the epoch times in nanoseconds from the original result tables converted to uints):
time1_time2_join = join(tables: {time1: time1, time2: time2}, on: ["join_column"])
|> map(fn: (r) => ({
_time: r._time_time1,
time_diff: if r.uint_time_1 >= r.uint_time_2 then (r.uint_time_1 - r.uint_time_2) / uint(v: 1000000000) else uint(v: 0)
}))
|> yield(name: "joined_times")

Unfortunately, this only works well if I limit my results from the original tables to 1 record each (again, because when I join on just join_column I end up with the number of result rows being the product of the number of rows in the original 2 tables, since the times themselves don't match up).

I still think one solution in this special case would be to find a way to add a column with an incrementing number to join on, but I haven't been able to find a way to create such a column.

Any other thoughts would be appreciated - thanks!

Hello @Chris_King ,
You can create an incrementing-value column with

|> map(fn: (r) => ({r with x: 1.0})) // add a constant column x = 1.0 to every row
|> cumulativeSum(columns: ["x"]) // running sum turns x into 1.0, 2.0, 3.0, ... (a row index)
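Putting the pieces together, here is a sketch of the full pipeline (the stream names time1/time2 and the uint_time_1/uint_time_2 fields follow the earlier post; this is an untested sketch, not a verified script):

```flux
// Give each stream a row-index column x, so that row N of time1
// pairs with row N of time2 when we join on x.
indexed = (t) => t
    |> map(fn: (r) => ({r with x: 1.0}))
    |> cumulativeSum(columns: ["x"])

join(tables: {time1: indexed(t: time1), time2: indexed(t: time2)}, on: ["x"])
    |> map(fn: (r) => ({
        _time: r._time_time1,
        // difference in seconds between the two machines' timestamps
        time_diff: if r.uint_time_1 >= r.uint_time_2 then
                (r.uint_time_1 - r.uint_time_2) / uint(v: 1000000000)
            else
                uint(v: 0)
    }))
    |> yield(name: "joined_times")
```

Note that if the two streams have different row counts, the inner join keeps only the indices present in both.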

The above 2 lines with cumulativeSum solved this issue. I consider this closed - thanks very much!
