Join SQL and InfluxDB data

Hi,

I’m trying to join series in my Influx DB and data relative to series from an SQL DB.

import "sql"
import "join"

measuresInfos = sql.from(
    driverName: "sqlserver",
    dataSourceName: "sqlserver://user:psswd@ip:1433?database=Dev",
    query: "SELECT * FROM measures",
)

measuresData = from(bucket: "Test")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "ActivePower")
    
join.left(
    left: measuresData,
    right: measuresInfos,
    on: (l, r) => l.measure_id == r.id,
    as: (l, r) => ({l with description: r.description, id :r.id}),
)

measureInfos correspond to :

|id | description|
|---| ---|
|8  | Air flow|
|9  | Glycoled water flow|
|10 | Temp In|
|11 | Temp out|
|12 | Global active power|

measureData correspond to :

|_time    | _value  | measure_id|
|---         | ---  |---|
| 2024-04-22 | 2066 | 12|
| 2024-04-22 | 2067 | 12|
| 2024-04-22 | 2066 | 12|
| 2024-04-22 | 2066 | 12|

The result is not what was expected, the column description and id are empty.

I would like to have the description of the measure for the series.

Any idea ?

Thanks
Stéphane

@steph2795 The query appears like it should work. You’ve confirmed that measureInfos is returning the data you’ve provided above?

The data in measureInfos and measureData are well as presented. (I cannot put images)

Is that because sql is row oriented and influx column oriented ?

What are the data types of the id in your two different data sets. I’m guessing in InfluxDB, measure_id is a tag, so it’s typed as a string. Is id in SQL Server also a string?

you’re right. In SQL ‘id’ is a int and in InfluxDB, ‘measure_id’ is a tag.

I used

map(fn: (r) => ({
    r with measure_id: string(v: r.id)
  }))

it creates the new column but still doesn’t do the join for the description…

It’s because the types have to match for the join predicate to resolve as true. You need to cast id to a string before you perform the join:

import "join"
import "sql"

measuresInfos =
    sql.from(
        driverName: "sqlserver",
        dataSourceName: "sqlserver://user:psswd@ip:1433?database=Dev",
        query: "SELECT * FROM measures",
    )
        |> map(fn: (r) => ({r with id: string(v: r.id)}))

measuresData =
    from(bucket: "Test")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "ActivePower")

join.left(
    left: measuresData,
    right: measuresInfos,
    on: (l, r) => l.measure_id == r.id,
    as: (l, r) => ({l with description: r.description, id: r.id}),
)
1 Like

still nothing in description and id
:smiling_face_with_tear:

@steph2795 Can you provide some of the output from just the InfluxDB side of the join?

// measuresData
from(bucket: "Test")
    |> range(start: -1h)
    |> filter(fn: (r) => r._measurement == "ActivePower")

(still unable to put images)

MeasureData (measure_id and datatype are tags) :

+--------------------------------+--------------------------------+--------------------------+-----------+--------+--------------+----------+------------+
| _start                         | _stop                          | _time                    | _value    | _field | _measurement | datatype | measure_id |
+--------------------------------+--------------------------------+--------------------------+-----------+--------+--------------+----------+------------+
| 2024-04-14T13:48:05.430187948Z | 2024-04-24T13:48:05.430187948Z | 2024-04-23T06:50:49.729Z | 144.06561 | value  | ActivePower  | 0        | 12         |
+--------------------------------+--------------------------------+--------------------------+-----------+--------+--------------+----------+------------+
| 2024-04-14T13:48:05.430187948Z | 2024-04-24T13:48:05.430187948Z | 2024-04-23T06:50:50.227Z | 143.37552 | value  | ActivePower  | 0        | 12         |
+--------------------------------+--------------------------------+--------------------------+-----------+--------+--------------+----------+------------+
| 2024-04-14T13:48:05.430187948Z | 2024-04-24T13:48:05.430187948Z | 2024-04-23T06:50:50.728Z | 143.27747 | value  | ActivePower  | 0        | 12         |
+--------------------------------+--------------------------------+--------------------------+-----------+--------+--------------+----------+------------+
| 2024-04-14T13:48:05.430187948Z | 2024-04-24T13:48:05.430187948Z | 2024-04-23T06:50:51.227Z | 146.62433 | value  | ActivePower  | 0        | 12         |
+--------------------------------+--------------------------------+--------------------------+-----------+--------+--------------+----------+------------+
| 2024-04-14T13:48:05.430187948Z | 2024-04-24T13:48:05.430187948Z | 2024-04-23T06:50:51.729Z | 149.68327 | value  | ActivePower  | 0        | 12         |
+--------------------------------+--------------------------------+--------------------------+-----------+--------+--------------+----------+------------+

MeasureInfos:

+----+-----------------------+---------------+------------+
| id | description           | type          | input_type |
+----+-----------------------+---------------+------------+
| 8  | Air flow              | FlowRate      | 4          |
+----+-----------------------+---------------+------------+
| 9  | Glycoled water flow   | FlowRate      | 4          |
+----+-----------------------+---------------+------------+
| 10 | Temp IN               | Temperature   | 1          |
+----+-----------------------+---------------+------------+
| 11 | Temp OUT              | Temperature   | 1          |
+----+-----------------------+---------------+------------+
| 12 | Global active power   | ActivePower   | 4          |
+----+-----------------------+---------------+------------+
| 13 | Global reactive power | ReactivePower | 4          |
+----+-----------------------+---------------+------------+
| 14 | Global apparent power | ApparentPower | 4          |
+----+-----------------------+---------------+------------+

Using the two datasets above, I mocked up my own join and it works:

import "array"
import "join"

measuresData = array.from(rows: [
    {_start: 2024-04-14T13:48:05.430187948Z, _stop: 2024-04-24T13:48:05.430187948Z, _time: 2024-04-23T06:50:49.729Z, _value: 144.06561, _field: "value", _measurement: "ActivePower", datatype: "0", measure_id: "12"},
    {_start: 2024-04-14T13:48:05.430187948Z, _stop: 2024-04-24T13:48:05.430187948Z, _time: 2024-04-23T06:50:50.227Z, _value: 143.37552, _field: "value", _measurement: "ActivePower", datatype: "0", measure_id: "12"},
    {_start: 2024-04-14T13:48:05.430187948Z, _stop: 2024-04-24T13:48:05.430187948Z, _time: 2024-04-23T06:50:50.728Z, _value: 143.27747, _field: "value", _measurement: "ActivePower", datatype: "0", measure_id: "12"},
    {_start: 2024-04-14T13:48:05.430187948Z, _stop: 2024-04-24T13:48:05.430187948Z, _time: 2024-04-23T06:50:51.227Z, _value: 146.62433, _field: "value", _measurement: "ActivePower", datatype: "0", measure_id: "12"},
    {_start: 2024-04-14T13:48:05.430187948Z, _stop: 2024-04-24T13:48:05.430187948Z, _time: 2024-04-23T06:50:51.729Z, _value: 149.68327, _field: "value", _measurement: "ActivePower", datatype: "0", measure_id: "12"},
])

measuresInfos = array.from(rows: [
    {id: 8, description: "Air flow", type: "FlowRate", input_type: 4},
    {id: 9, description: "Glycoled water flow", type: "FlowRate", input_type: 4},
    {id: 10, description: "Temp IN", type: "Temperature", input_type: 1},
    {id: 11, description: "Temp OUT", type: "Temperature", input_type: 1},
    {id: 12, description: "Global active power", type: "ActivePower", input_type: 4},
    {id: 13, description: "Global reactive power", type: "ReactivePower", input_type: 4},
    {id: 14, description: "Global apparent power", type: "ApparentPower", input_type: 4},
])
    |> map(fn: (r) => ({r with id: string(v: r.id)}))

join.left(
    left: measuresData,
    right: measuresInfos,
    on: (l, r) => l.measure_id == r.id,
    as: (l, r) => ({l with description: r.description, id: r.id}),
)

Which returns the following (with the _start and _stop columns removed for brevity):

_field _measurement _time _value datatype description id measure_id
value ActivePower 2024-04-23T06:50:49.729Z 144.06561 Global active power 12 12
value ActivePower 2024-04-23T06:50:50.227Z 143.37552 Global active power 12 12
value ActivePower 2024-04-23T06:50:50.728Z 143.27747 Global active power 12 12
value ActivePower 2024-04-23T06:50:51.227Z 146.62433 Global active power 12 12
value ActivePower 2024-04-23T06:50:51.729Z 149.68327 Global active power 12 12

Assuming my datatypes are correct in measuresInfo, your join should work.

1 Like

Your code works for me as well.

With 2 arrays, the jointure is done.
With SQL and the array of measureData, the jointure is done.
With SQL and the request of influx, the jointure is not done.

measureInfos datatypes are strings :
image

measureData :

Is it because measure_id is group and id is no group ?

Yes! That’s exactly why. I had forgotten about that specific detail. When doing a left join, the group keys of the joined datasets need to be identical (as noted in here in step 2).

So try this:

import "join"
import "sql"

measuresInfos =
    sql.from(
        driverName: "sqlserver",
        dataSourceName: "sqlserver://user:psswd@ip:1433?database=Dev",
        query: "SELECT * FROM measures",
    )
        |> map(fn: (r) => ({r with id: string(v: r.id)}))

measuresData =
    from(bucket: "Test")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "ActivePower")
        |> group()

join.left(
    left: measuresData,
    right: measuresInfos,
    on: (l, r) => l.measure_id == r.id,
    as: (l, r) => ({l with description: r.description, id: r.id}),
)

It works ! :slight_smile:

I had to drop the columns _start and _stop because it was still grouped.

measuresData =
    from(bucket: "Test")
        |> range(start: -1h)
        |> filter(fn: (r) => r._measurement == "ActivePower")
        |> drop(columns: ["_start", "_stop"])
        |> group()

Thanks.

1 Like