I intend to use fill with usePrevious on previously manipulated data. What I would like to do is pass some nulls into the table under a certain condition, to then run fill() function.
How do I pass a null literal?
Hello @juliopereirab,
Welcome! Can you please provide an example input and desired output?
Hello @Anaisdg,
Here’s a sample of the code:
from(bucket: "00000_15cb1129-1137-4a92-b761-9fac2a5d0be9")
    |> range(start: 2020-04-26T12:00:00Z, stop: 2020-04-26T15:00:00Z)
    |> drop(columns: ["_start", "_stop", "ChunkEnd", "ProviderId"])
    |> filter(fn: (r) => r["SourceId"] == "bookie1" or r["SourceId"] == "bookie2")
    |> filter(fn: (r) => r["_field"] == "price-000875918750000000")
    |> group()
    |> sort(columns: ["_time"])
    |> map(fn: (r) => {
        bookie1_value = if r.SourceId == "bookie1" then r._value else 0
        bookie2_value = if r.SourceId == "bookie2" then r._value else 0
        return {_time: r._time, bookie1: bookie1_value, bookie2: bookie2_value}
    })
    |> fill(column: "bookie1", usePrevious: true)
    |> fill(column: "bookie2", usePrevious: true)
As you can see, I’m producing two columns based on values with different tags. As I assign values to the new columns, I need a fallback value that fill() with usePrevious can later replace. Currently I’m using 0 as the fallback value, which doesn’t work because fill() expects null. Is there any way to pass null, or to solve this some other way?
Thanks,
I’m thinking there might be some other way to solve this, but I’m still having trouble understanding what you’re trying to do.
I don’t understand why you’re grouping only to then filter by those tags.
Can you help me understand why you can’t apply usePrevious before grouping? It seems to me that way you wouldn’t have to “undo the group” with bookie1_value = if r.SourceId == "bookie1" and bookie2_value = if r.SourceId == "bookie2".
To help me help you, can you provide an exported annotated CSV of your data so that I can play with it? Can you also take a step back and describe the overall goal for me, please?
I know the query might not be the most precise, but the main objective is to compare the results of the two bookies for each data point that gets into the measurement/field. Since each data point covers only a single bookie, and I want to compare the two bookmakers each time a point enters the database, I need a way of carrying past values forward for comparison. Here is an example:
In the attached screenshot there are three mini-tables, the first with the original data structure. I want to go from that state to having two columns so I can compare both values on each data entry. To do that I map over the values and generate the two columns, but each row has only one value, so I need a sort of “fallback” value that fill() would detect and use to cover the gaps; that fallback value happens to be null, but I don’t know how to pass it.
I know there may be other ways of producing this result, but it seemed like a basic thing to be able to use null as a literal. I also know I could write a script that fills the gaps as data points arrive from the source, but my first (lazy) approach was to see whether this could be done in a plain query with what’s already there.
Is there any suggestion?
Thanks,
I think you might be able to use pivot for this.
from(bucket: "test2")
|> range(start: 0, stop: 20)
|> filter(fn: (r) => r._measurement == "m0" and r._field == "price")
|> pivot(columnKey: ["SourceId"], rowKey: ["_time"], valueColumn: "_value")
|> fill(column: "bookie1", usePrevious: true)
|> fill(column: "bookie2", usePrevious: true)
This is because pivot performs what is essentially an outer join from a single stream of data by splitting that stream into multiple streams and joining them.
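To make that concrete, here is a minimal plain-Python sketch (not Flux; the row data is made up) of what pivot effectively does with rows like the ones in this thread:

```python
# Hypothetical rows: one per (time, source) pair, as in the thread's data.
rows = [
    {"_time": 1, "SourceId": "bookie1", "_value": 20},
    {"_time": 2, "SourceId": "bookie2", "_value": 18},
    {"_time": 3, "SourceId": "bookie1", "_value": 21},
]

# pivot() splits the stream by SourceId and outer-joins the pieces on
# _time: each output row gets one column per source, and a null (None
# here) wherever a source has no point at that time.
pivoted = {}
for r in rows:
    row = pivoted.setdefault(r["_time"], {"bookie1": None, "bookie2": None})
    row[r["SourceId"]] = r["_value"]
```

Because the missing cells come out as genuine nulls rather than a sentinel like 0, fill(usePrevious: true) can then replace them directly.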
The ideal way to do this would likely be a join function. We have a better join function in the works, but I don’t recommend it here because it only supports inner joins right now, and you really need an outer join. I also think this could benefit from allowing fill to operate on multiple columns, or to take a function that filters which rows it applies to. I’ve created an issue to address this so we can potentially make this easier.
There’s also this existing issue to have an outer join.
Hello @juliopereirab,
You can now perform full or outer joins with the following function:
Your query would look like this:
import "array"
import "join"
data =
array.from(
rows: [
{_time: 2022-01-01T00:00:00Z, Price: 20, Bookmaker: "Bookie1"},
{_time: 2022-02-01T00:00:00Z, Price: 18, Bookmaker: "Bookie2"},
{_time: 2022-03-01T00:00:00Z, Price: 21, Bookmaker: "Bookie1"},
{_time: 2022-04-01T00:00:00Z, Price: 20, Bookmaker: "Bookie2"},
],
)
data
|> pivot(rowKey:["_time"], columnKey: ["Bookmaker"], valueColumn: "Price")
|> fill(column: "Bookie1", usePrevious: true)
|> fill(column: "Bookie2", usePrevious: true)
|> yield(name: "solution before joins")
left = data
|> filter(fn: (r) => r.Bookmaker == "Bookie1")
right = data
|> filter(fn: (r) => r.Bookmaker == "Bookie2")
join.full(
left: left,
right: right,
on: (l, r) => l._time == r._time,
as: (l, r) => {
time = if exists l._time then l._time else r._time
return {_time: time, Bookie1: l.Price, Bookie2: r.Price}
},
)
|> fill(column: "Bookie1", usePrevious: true)
|> fill(column: "Bookie2", usePrevious: true)
|> yield(name: "solution after joins")
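For intuition, here is a rough plain-Python sketch (hypothetical data, not the actual join API) of what join.full does in the query above: it keeps every time from either side, leaving nulls where one side has no row, which is why the as function has to check exists l._time:

```python
# Hypothetical per-side data: time -> price for each bookmaker.
left = {1: 20, 3: 21}    # Bookie1
right = {2: 18, 4: 20}   # Bookie2

# join.full keeps the union of times from both sides.
joined = []
for t in sorted(set(left) | set(right)):
    # Mirrors `as: (l, r) => {...}`: when one side has no row at t,
    # its fields are null (None here), so the time must be taken from
    # whichever side actually exists.
    joined.append({"_time": t, "Bookie1": left.get(t), "Bookie2": right.get(t)})
```

The nulls left behind in Bookie1/Bookie2 are then exactly what the two fill(usePrevious: true) calls replace.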
@juliopereirab Flux 0.179.0 introduced debug.null(), which returns a null value of a specified type. It’s currently available in InfluxDB Cloud, InfluxDB 2.4, and InfluxDB nightly builds.
import "internal/debug"
debug.null(type: "string")
// Returns a null string
Hello @scott,
How would you suggest using it here? I don’t see it.
Thank you.
@Anaisdg Using the original query posted in this thread:
import "internal/debug"
from(bucket: "00000_15cb1129-1137-4a92-b761-9fac2a5d0be9")
|> range(start: 2020-04-26T12:00:00Z, stop: 2020-04-26T15:00:00Z)
|> filter(fn: (r) => r["SourceId"] == "bookie1" or r["SourceId"] == "bookie2")
|> filter(fn: (r) => r["_field"] == "price-000875918750000000")
|> group()
|> sort(columns: ["_time"])
|> drop(columns: ["_start", "_stop", "ChunkEnd", "ProviderId"])
|> map(
fn: (r) => {
bookie1_value = if r.SourceId == "bookie1" then r._value else debug.null(type: "int")
bookie2_value = if r.SourceId == "bookie2" then r._value else debug.null(type: "int")
return {_time: r._time, bookie1: bookie1_value, bookie2: bookie2_value}
},
)
|> fill(column: "bookie1", usePrevious: true)
|> fill(column: "bookie2", usePrevious: true)
That’s great, many thanks for sharing a solution. It’s nice to have a way to produce nulls that can be combined with other steps to fill in the gaps.