Trying to implement lookup table; how to proceed

cville · July 2, 2022, 11:58am

Using the code below as an example, I’d like to lookup the value of FIXED and VARIABLE and use the result in a map function. For this I need to lookup the values in the provided rate table. Although I can get working code for the exact date of a rate change, I can’t get it to work for a date that falls between the dates in the rate table; e.g. 2022-06-29.

import "array"

rates = array.from(rows: [  // rates table
{_time: 2022-02-01, _measurement: "e", class: "FIXED",    _value: -2.0},
{_time: 2022-02-01, _measurement: "e", class: "VARIABLE", _value: -0.5}, 
{_time: 2022-07-01, _measurement: "e", class: "FIXED",    _value: -1.0},
{_time: 2022-07-01, _measurement: "e", class: "VARIABLE", _value: -0.5}, 
])
|> pivot(rowKey:["_time", "_measurement"], columnKey: ["class"], valueColumn: "_value")
|> yield(name: "rates")

counters = array.from(rows: [  // value table
{_time: 2022-06-29, _measurement: "e", c: 1.0, f: -2.0},
{_time: 2022-06-30, _measurement: "e", c: 3.0, f: -4.0},
{_time: 2022-07-01, _measurement: "e", c: 5.0, f: -6.0}, 
{_time: 2022-07-02, _measurement: "e", c: 7.0, f: -8.0},
])
|> yield(name: "counters")

// code below does not provide a meaningful result but map function shows intent
joined = join(tables: {rates: rates, counters: counters}, on: ["_measurement"], method: "inner")
  |> map(fn: (r) => ({ r with yld: r.FIXED + (r.c - r.f) * r.VARIABLE}))
  |> yield(name: "joined")

I have searched for lookup functionality in the context of InfluxDB but haven’t found any usable results. Does anyone know of a solution?

ypnos · July 3, 2022, 9:08pm

My suggestion. Instead of joining with “rates”, work on “counters” only, but then have a subquery-kindof-thing for “rates”.

So in the map statement, call a function, e.g.:

addYield = (row) => {
    ratesRow = rates|>filter(fn: (r) => r._time <= row._time)|>last()
    fixed = ratesRow|>findColumn(fn: (key) => key.class == "FIXED", column: "_value")
    variable = ratesRow|>findColumn(fn: (key) => key.class == "VARIABLE", column: "_value")
    return {r with yld: fixed[0] + (row.c - row.f) * variable[0]}
}

...
counters
  |> map(fn: addYield)

To recap: For each row of counters, find the best match in rates (here: the next-older entry) and fill-in from that row.

I just wrote that down, I didn’t test it, not even for syntax errors.

cville · July 4, 2022, 10:48am

ypnos:

addYield = (row) => {
    ratesRow = rates|>filter(fn: (r) => r._time <= row._time)|>last()
    fixed = ratesRow|>findColumn(fn: (key) => key.class == "FIXED", column: "_value")
    variable = ratesRow|>findColumn(fn: (key) => key.class == "VARIABLE", column: "_value")
    return {r with yld: fixed[0] + (row.c - row.f) * variable[0]}
}

...
counters
  |> map(fn: addYield)

Thank you; this looks like a promising approach. Unfortunately I failed to fix the listed syntax errors. Here’s the complete code, reformatted:

import "array"

rates = array.from(rows: [  // rates table
{_time: 2022-02-01, _measurement: "e", class: "FIXED",    _value: -2.0},
{_time: 2022-02-01, _measurement: "e", class: "VARIABLE", _value: -0.5}, 
{_time: 2022-07-01, _measurement: "e", class: "FIXED",    _value: -1.0},
{_time: 2022-07-01, _measurement: "e", class: "VARIABLE", _value: -0.5}, 
])
|> pivot(rowKey:["_time", "_measurement"], columnKey: ["class"], valueColumn: "_value")
|> yield(name: "rates")

counters = array.from(rows: [  // value table
{_time: 2022-06-29, _measurement: "e", c: 1.0, f: -2.0},
{_time: 2022-06-30, _measurement: "e", c: 3.0, f: -4.0},
{_time: 2022-07-01, _measurement: "e", c: 5.0, f: -6.0}, 
{_time: 2022-07-02, _measurement: "e", c: 7.0, f: -8.0},
])
|> yield(name: "counters")

addYield = (row) => {
  ratesRow = rates
    |>filter(fn: (r) => r._time <= row._time)
    |>last()
  fixed = ratesRow
    |>findColumn(fn: (key) => key.class == "FIXED", column: "_value")
  variable = ratesRow
    |>findColumn(fn: (key) => key.class == "VARIABLE", column: "_value")
  return {r with yld: fixed[0] + (row.c - row.f) * variable[0]}
}

counters
  |> map(fn: addYield)

Which generates the following errors:

error @28:11-28:12: undefined identifier r error @32:14-32:22: found unexpected argument row (argument fn) error @32:14-32:22: missing required argument r (argument fn)

I tried various solutions, e.g. changing “addYield” into “addYield()” but can’t fix the “undefined identifier r” error.

ypnos · July 4, 2022, 9:09pm

I’m sorry, I forgot about named parameter requirements, try this version:

addYield = (r) => {
  counterRow = r
  ratesRow = rates
    |>filter(fn: (r) => r._time <= counterRow._time)
    |>last()
  fixed = ratesRow
    |>findColumn(fn: (key) => key.class == "FIXED", column: "_value")
  variable = ratesRow
    |>findColumn(fn: (key) => key.class == "VARIABLE", column: "_value")
  return {r with yld: fixed[0] + (counterRow.c - counterRow.f) * variable[0]}
}

cville · July 5, 2022, 10:04am

I got the following error:

error calling function "map" @34:6-34:23: name {"key" ""} does not exist in scope

A search on Google came up empty.

cville · July 7, 2022, 12:36pm

@ypnos or anyone else: do you have a solution?

cville · July 9, 2022, 8:37pm

@Anaisdg Can you please take a look at this?

Anaisdg · July 11, 2022, 10:14pm

Hello @cville,
Thanks for tagging me.
Does this help?

import "array"

rates = array.from(rows: [  // rates table
{_time: 2022-02-01, _measurement: "e", class: "FIXED",    _value: -2.0},
{_time: 2022-02-01, _measurement: "e", class: "VARIABLE", _value: -0.5}, 
{_time: 2022-07-01, _measurement: "e", class: "FIXED",    _value: -1.0},
{_time: 2022-07-01, _measurement: "e", class: "VARIABLE", _value: -0.5}, 
])
|> pivot(rowKey:["_time", "_measurement"], columnKey: ["class"], valueColumn: "_value")
|> yield(name: "rates")

counters = array.from(rows: [  // value table
{_time: 2022-06-29, _measurement: "e", c: 1.0, f: -2.0},
{_time: 2022-06-30, _measurement: "e", c: 3.0, f: -4.0},
{_time: 2022-07-01, _measurement: "e", c: 5.0, f: -6.0}, 
{_time: 2022-07-02, _measurement: "e", c: 7.0, f: -8.0},
])
|> yield(name: "counters")

rateStartTime = (rates |>findColumn(fn: (key) => true, column: "_time"))[0]
rateStopTime = (rates |>findColumn(fn: (key) => true, column: "_time"))[1]

ratesRow = rates
|> yield(name: "ratesRow")
fixed = ratesRow
|>findColumn(fn: (key) => true, column: "FIXED")
variable = ratesRow
|>findColumn(fn: (key) => true, column: "VARIABLE")
out =  counters 
|> filter(fn: (r) => r._time >= rateStartTime and r._time <= rateStopTime)   
|> map(fn: (r) => ({r with yld: fixed[0] + (r.c - r.f) * variable[0]}))

out
|>yield(name: "out")

lmk

cville · July 16, 2022, 4:19pm

Hi @Anaisdg . Thank you, yes this was helpful but perhaps not in the way you thought. Here are my comments. issues and questions:

The query you provided runs fine from a syntax and runtime point of view but does not provide useful or expected output. Most of this is entirely my fault because I didn’t provide enough context or specs. More on this later. The code does exhibit unexpected behavior but I need to give you more context in order to explain.
When I ran the query you provided, I quickly realized that the simplified data I provided was hard to interpret from a validation point of view and I decided to migrate to a query using a small sample of real data. Here’s the modified query using real data:

// query to compute PV yield in EUR for the period between BOP and EOP, using date dependent rates

import "csv"
import "date"
import "math"
import "array"
import "timezone"
option location = timezone.location(name: "Europe/Amsterdam")

// period selected to include a rate change, a small dataset; uses actual data
BOP = 2022-06-30T23:58:00Z  // begin of period
EOP = 2022-07-01T00:03:00Z  //   end of period
       
// eop column added; this should not be necessary but so far I can't figure out to use only the _time column
rates = array.from(rows: [  // construct rate table; timezone is UTC
{_time: 2022-01-31T23:00:00Z, eop: 2022-06-30T22:00:00Z, _measurement: "electricity", class: "FIXED",    _value: -1.32934},
{_time: 2022-01-31T23:00:00Z, eop: 2022-06-30T22:00:00Z, _measurement: "electricity", class: "VARIABLE", _value: -0.45845}, 
{_time: 2022-06-30T22:00:00Z, eop: now(),                _measurement: "electricity", class: "FIXED",    _value: -1.19751},
{_time: 2022-06-30T22:00:00Z, eop: now(),                _measurement: "electricity", class: "VARIABLE", _value: -0.45845}, 
])
|> pivot(rowKey:["_measurement", "_time", "eop"], columnKey: ["class"], valueColumn: "_value")
|> map(fn: (r) => ({ r with eop: date.truncate(t: r.eop, unit: 1s)}))  // remove fractional seconds
|> yield(name: "rates")

counters = array.from(rows: [  // construct counters table; timezone is UTC
{_time: 2022-06-30T23:58:00Z, _measurement: "electricity", cT1: 12280.604, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
{_time: 2022-06-30T23:59:00Z, _measurement: "electricity", cT1: 12280.608, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
{_time: 2022-07-01T23:00:00Z, _measurement: "electricity", cT1: 12280.612, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
{_time: 2022-07-01T23:01:00Z, _measurement: "electricity", cT1: 12280.617, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
{_time: 2022-07-01T23:02:00Z, _measurement: "electricity", cT1: 12280.621, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
])
|> yield(name: "counters")

rateStartTime = (rates |> findColumn(fn: (key) => true, column: "_time"))[0]
//  rateStopTime = (rates |> findColumn(fn: (key) => true, column: "_time"))[1]  // <--- NO OUTPUT
 rateStopTime = (rates |> findColumn(fn: (key) => true, column: "eop"))[1]       // <--- INCORRECT OUTPUT

ratesRow = rates
fixed = ratesRow
  |> findColumn(fn: (key) => true, column: "FIXED")
variable = ratesRow
  |> findColumn(fn: (key) => true, column: "VARIABLE")
out = counters 
  |> filter(fn: (r) => r._time >= rateStartTime and r._time <= rateStopTime)   
  |> difference(columns: ["cT1", "cT2", "fT1", "fT2"])
//   |> map(fn: (r) => ({r with yld: fixed[0] - (r.cT1 + r.cT2 - r.fT1 - r.fT2) * variable[0]}))
  |> map(fn: (r) => ({r with fixed: fixed[0]}))  // SIMPLIFIED MAP FUNCTION TO VALIDATE PROGRAM LOGIC; should show rate change on July 1st
  |> yield(name: "out")

The first thing you may notice running this query is that the output does not reflect the July 1st rate change. Please look at the “fixed” column which shows the same fixed rate for all dates: -1.32934
The second thing you may notice running this query is that the first line shown below - which is IMHO a faithful rendition of your code - does not provide any output at all. The second line where I changed “_time” into “eop” does provide output but not as desired.

//  rateStopTime = (rates |> findColumn(fn: (key) => true, column: "_time"))[1]  // <--- NO OUTPUT
 rateStopTime = (rates |> findColumn(fn: (key) => true, column: "eop"))[1]       // <--- INCORRECT OUTPUT

I have been unsuccessful in addressing any of these problems (3 & 4).

Let’s take a step back. I see essentially 2 approaches to calculate the total yield which is the query’s objective. In the first approach - which both you and @ypnos seem to have selected - the result is calculated for every date and the total yield is calculated by summing the yields for all dates. I got a version of this to work using @ypnos 's code but with the essential fix you provided in the line

rateStartTime = (rates |> findColumn(fn: (key) => true, column: "_time"))[0]

where @ypnos 's code resulted in the error I reported earlier. There are at least 2 problems with this approach: poor query performance and the necessary use of difference() to calculate the difference between 2 dates. A different approach and IMHO a cleaner approach is to retrieve the counters from the period start and end dates and substract these. This should give an excellent performance and accurate results.

Before asking my original question I wrote a query that uses this approach but using a single rate only. I have not been successful using a rate table with this query.

Do you see a way forward?

All of this raises some fundamental questions. Am I asking too much of Flux? Is it reasonable to to expect to solve this problem in Flux as opposed to e.g. Python?

I’ve put some effort into explaining these issues as there’s very little information on similar issues to be found and others may benefit from this thread. IMHO, a well annotated lookup table example would be of great benefit to anyone searching for it.

Topic		Replies	Views
Calculate values in query Fluxlang	1	392	November 8, 2022
Is this possible in InfluxDB? (lookup value in an array) Fluxlang	9	729	December 15, 2022
Flux query help : function calls Fluxlang	3	1287	June 24, 2022
Problem with Map syntax Fluxlang	6	436	April 6, 2022
How to interpolate data source using flux? Fluxlang	2	463	September 13, 2022

Trying to implement lookup table; how to proceed

Related topics