Hi @Anaisdg . Thank you, yes this was helpful but perhaps not in the way you thought. Here are my comments. issues and questions:
-
The query you provided runs fine from a syntax and runtime point of view but does not provide useful or expected output. Most of this is entirely my fault because I didn’t provide enough context or specs. More on this later. The code does exhibit unexpected behavior but I need to give you more context in order to explain.
-
When I ran the query you provided, I quickly realized that the simplified data I provided was hard to interpret from a validation point of view and I decided to migrate to a query using a small sample of real data. Here’s the modified query using real data:
// query to compute PV yield in EUR for the period between BOP and EOP, using date dependent rates
import "csv"
import "date"
import "math"
import "array"
import "timezone"
option location = timezone.location(name: "Europe/Amsterdam")
// period selected to include a rate change, a small dataset; uses actual data
BOP = 2022-06-30T23:58:00Z // begin of period
EOP = 2022-07-01T00:03:00Z // end of period
// eop column added; this should not be necessary but so far I can't figure out to use only the _time column
rates = array.from(rows: [ // construct rate table; timezone is UTC
{_time: 2022-01-31T23:00:00Z, eop: 2022-06-30T22:00:00Z, _measurement: "electricity", class: "FIXED", _value: -1.32934},
{_time: 2022-01-31T23:00:00Z, eop: 2022-06-30T22:00:00Z, _measurement: "electricity", class: "VARIABLE", _value: -0.45845},
{_time: 2022-06-30T22:00:00Z, eop: now(), _measurement: "electricity", class: "FIXED", _value: -1.19751},
{_time: 2022-06-30T22:00:00Z, eop: now(), _measurement: "electricity", class: "VARIABLE", _value: -0.45845},
])
|> pivot(rowKey:["_measurement", "_time", "eop"], columnKey: ["class"], valueColumn: "_value")
|> map(fn: (r) => ({ r with eop: date.truncate(t: r.eop, unit: 1s)})) // remove fractional seconds
|> yield(name: "rates")
counters = array.from(rows: [ // construct counters table; timezone is UTC
{_time: 2022-06-30T23:58:00Z, _measurement: "electricity", cT1: 12280.604, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
{_time: 2022-06-30T23:59:00Z, _measurement: "electricity", cT1: 12280.608, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
{_time: 2022-07-01T23:00:00Z, _measurement: "electricity", cT1: 12280.612, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
{_time: 2022-07-01T23:01:00Z, _measurement: "electricity", cT1: 12280.617, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
{_time: 2022-07-01T23:02:00Z, _measurement: "electricity", cT1: 12280.621, cT2: 7052.93, fT1: 14212.173, fT2: 31639.301},
])
|> yield(name: "counters")
rateStartTime = (rates |> findColumn(fn: (key) => true, column: "_time"))[0]
// rateStopTime = (rates |> findColumn(fn: (key) => true, column: "_time"))[1] // <--- NO OUTPUT
rateStopTime = (rates |> findColumn(fn: (key) => true, column: "eop"))[1] // <--- INCORRECT OUTPUT
ratesRow = rates
fixed = ratesRow
|> findColumn(fn: (key) => true, column: "FIXED")
variable = ratesRow
|> findColumn(fn: (key) => true, column: "VARIABLE")
out = counters
|> filter(fn: (r) => r._time >= rateStartTime and r._time <= rateStopTime)
|> difference(columns: ["cT1", "cT2", "fT1", "fT2"])
// |> map(fn: (r) => ({r with yld: fixed[0] - (r.cT1 + r.cT2 - r.fT1 - r.fT2) * variable[0]}))
|> map(fn: (r) => ({r with fixed: fixed[0]})) // SIMPLIFIED MAP FUNCTION TO VALIDATE PROGRAM LOGIC; should show rate change on July 1st
|> yield(name: "out")
-
The first thing you may notice running this query is that the output does not reflect the July 1st rate change. Please look at the “fixed” column which shows the same fixed rate for all dates: -1.32934
-
The second thing you may notice running this query is that the first line shown below - which is IMHO a faithful rendition of your code - does not provide any output at all. The second line where I changed “_time” into “eop” does provide output but not as desired.
// rateStopTime = (rates |> findColumn(fn: (key) => true, column: "_time"))[1] // <--- NO OUTPUT
rateStopTime = (rates |> findColumn(fn: (key) => true, column: "eop"))[1] // <--- INCORRECT OUTPUT
I have been unsuccessful in addressing any of these problems (3 & 4).
- Let’s take a step back. I see essentially 2 approaches to calculate the total yield which is the query’s objective. In the first approach - which both you and @ypnos seem to have selected - the result is calculated for every date and the total yield is calculated by summing the yields for all dates. I got a version of this to work using @ypnos 's code but with the essential fix you provided in the line
rateStartTime = (rates |> findColumn(fn: (key) => true, column: "_time"))[0]
where @ypnos 's code resulted in the error I reported earlier. There are at least 2 problems with this approach: poor query performance and the necessary use of difference() to calculate the difference between 2 dates. A different approach and IMHO a cleaner approach is to retrieve the counters from the period start and end dates and substract these. This should give an excellent performance and accurate results.
Before asking my original question I wrote a query that uses this approach but using a single rate only. I have not been successful using a rate table with this query.
Do you see a way forward?
- All of this raises some fundamental questions. Am I asking too much of Flux? Is it reasonable to to expect to solve this problem in Flux as opposed to e.g. Python?
I’ve put some effort into explaining these issues as there’s very little information on similar issues to be found and others may benefit from this thread. IMHO, a well annotated lookup table example would be of great benefit to anyone searching for it.