Same flux logic returning different results

ajetsharwin · June 22, 2023, 4:45pm

Hi Team,

Would you agree that the two scripts below should return the same results?

Script 1:

firstQuantity = from(bucket: "quantity/autogen")
    |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
    |> filter(fn: (r) => r._measurement == "quantity-items")
    |> filter(fn: (r) => r._field == "quantity")
    |> filter(fn: (r) => r.id == "A1")
    |> group(columns: ["id", "_measurement"], mode: "by")
    |> first()
    |> map(fn: (r) => ({r with description: "first_quantity"}))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "firstQuantity")

lastQuantity = from(bucket: "quantity/autogen")
    |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
    |> filter(fn: (r) => r._measurement == "quantity-items")
    |> filter(fn: (r) => r._field == "quantity")
    |> filter(fn: (r) => r.id == "A1")
    |> group(columns: ["id", "_measurement"], mode: "by")
    |> last()
    |> map(fn: (r) => ({r with description: "last_quantity"}))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "last_quantity")

Script 2:

data = from(bucket: "quantity/autogen")
    |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
    |> filter(fn: (r) => r._measurement == "quantity-items")
    |> filter(fn: (r) => r._field == "quantity")
    |> filter(fn: (r) => r.id == "A1")
    |> group(columns: ["id", "_measurement"], mode: "by")

firstQuantity = data
    |> first()
    |> map(fn: (r) => ({r with description: "first_quantity"}))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "firstQuantity")

lastQuantity = data
    |> last()
    |> map(fn: (r) => ({r with description: "last_quantity"}))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "lastQuantity")

I get different results for lastQuantity, which I find very strange. I would prefer to use Script 2, but that’s the one giving unexpected results.

tagging for attention @scott @Anaisdg

scott · June 22, 2023, 5:00pm

@ajetsharwin Not necessarily. The wildcard here is group(). It doesn’t guarantee the sort order of the output and first()/last() just take the first/last rows based on whatever sort order the input is in. You could add sort(columns: ["_time"]) after group(), but that will hurt performance. I’m wondering if you even need to regroup the data. Both id and _measurement are already in the group key. Are there other tags that you do not want to group by?
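Scott’s point about first()/last() can be checked without the real bucket. The sketch below builds a small table with array.from() whose rows are deliberately out of time order (the timestamps and values are hypothetical) and compares first()/last() with and without sort(columns: ["_time"]). It assumes a Flux version that includes the `array` package.

```flux
import "array"

// Hypothetical rows, deliberately NOT in time order.
data = array.from(rows: [
    {_time: 2023-01-03T00:00:00Z, _value: 30},
    {_time: 2023-01-01T00:00:00Z, _value: 10},
    {_time: 2023-01-02T00:00:00Z, _value: 20}
])

// first()/last() are positional: they return the first/last row in
// whatever order the table is in (_value 30 and 20 here).
data |> first() |> yield(name: "unsortedFirst")
data |> last() |> yield(name: "unsortedLast")

// Sorting by _time first makes them return the earliest/latest point
// (_value 10 and 30 here).
data |> sort(columns: ["_time"]) |> first() |> yield(name: "sortedFirst")
data |> sort(columns: ["_time"]) |> last() |> yield(name: "sortedLast")
```

The same reasoning applies after group(): if the merged table is not in time order, first()/last() simply return whichever rows happen to be at the ends.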

ajetsharwin · June 22, 2023, 5:38pm

Thanks for the reply, Scott, I see.
Yes, I also have a tag called “name” in the group key, which I do not want to group by.
In my data, for id = “A1”, name had the value “name1” up to a certain point in time and then changed to “name2” for the rest of the period. As a result, before I added the group() call, firstQuantity and lastQuantity each returned two tables for the same id (one per name value). To get around this, I called group() with “id” and “_measurement” so that no other tags are included in the group key.

Given the above, I have two options:

1. Use Script 1.
2. Use Script 2 with sort() added after the group() call.

Is there anything else I should note or understand here?
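The situation described here (one id split into two tables by the name tag) can be reproduced with made-up data. This sketch assumes the measurement, field, and tag names used in this thread; the timestamps and values are hypothetical. It regroups by id and _measurement as Script 2 does, then adds sort(columns: ["_time"]) before first()/last(), which is the second option listed above.

```flux
import "array"

// Hypothetical rows: id "A1" is tagged name1 at first, then name2.
// Timestamps and values are made up for illustration.
raw = array.from(rows: [
    {_time: 2023-01-01T00:00:00Z, _measurement: "quantity-items", _field: "quantity", id: "A1", name: "name1", _value: 5},
    {_time: 2023-02-01T00:00:00Z, _measurement: "quantity-items", _field: "quantity", id: "A1", name: "name1", _value: 7},
    {_time: 2023-03-01T00:00:00Z, _measurement: "quantity-items", _field: "quantity", id: "A1", name: "name2", _value: 9},
    {_time: 2023-04-01T00:00:00Z, _measurement: "quantity-items", _field: "quantity", id: "A1", name: "name2", _value: 11}
])
    // Mimic storage, which returns one table per series, so "name" splits A1 into two tables.
    |> group(columns: ["_measurement", "_field", "id", "name"])

data = raw
    // Merge the two tables so "name" no longer splits the series...
    |> group(columns: ["id", "_measurement"], mode: "by")
    // ...but group() does not guarantee row order, so restore time order explicitly.
    |> sort(columns: ["_time"])

// Earliest point for A1 (_value 5).
data
    |> first()
    |> map(fn: (r) => ({r with description: "first_quantity"}))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "firstQuantity")

// Latest point for A1 (_value 11).
data
    |> last()
    |> map(fn: (r) => ({r with description: "last_quantity"}))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "lastQuantity")
```

With the explicit sort, the result should no longer depend on which name table group() happens to emit first; the trade-off is the extra sort that Scott mentions.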

scott · June 22, 2023, 7:37pm

@ajetsharwin I’m curious whether defining data as a function instead of a variable makes any difference. This changes the way pushdowns work with streams spread across identifiers/variables. It may not make a difference, but I’m curious if it does. I also removed the keep() function and updated the map() function to explicitly define the row schema. Since you’re already mapping over the data anyway, you might as well make one less function call.

data = () =>
    from(bucket: "quantity/autogen")
        |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
        |> filter(fn: (r) => r._measurement == "quantity-items")
        |> filter(fn: (r) => r._field == "quantity")
        |> filter(fn: (r) => r.id == "A1")
        |> group(columns: ["id", "_measurement"], mode: "by")

firstQuantity =
    data()
        |> first()
        |> map(fn: (r) => ({id: r.id, description: "first_quantity", _value: r._value}))
        |> yield(name: "firstQuantity")

lastQuantity =
    data()
        |> last()
        |> map(fn: (r) => ({id: r.id, description: "last_quantity", _value: r._value}))
        |> yield(name: "lastQuantity")
