Same Flux logic returning different results

Hi Team,

Would you agree that the two scripts below should return the same results?

Script 1:

firstQuantity = from(bucket: "quantity/autogen")
    |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
    |> filter(fn: (r) => r._measurement == "quantity-items")
    |> filter(fn: (r) => r._field == "quantity")
    |> filter(fn: (r) => r.id == "A1")
    |> group(columns: ["id", "_measurement"], mode: "by")
    |> first()
    |> map(fn: (r) => ({ r with description: "first_quantity" }))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "firstQuantity")

lastQuantity = from(bucket: "quantity/autogen")
    |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
    |> filter(fn: (r) => r._measurement == "quantity-items")
    |> filter(fn: (r) => r._field == "quantity")
    |> filter(fn: (r) => r.id == "A1")
    |> group(columns: ["id", "_measurement"], mode: "by")
    |> last()
    |> map(fn: (r) => ({ r with description: "last_quantity" }))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "last_quantity")

Script 2:

data = from(bucket: "quantity/autogen")
    |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
    |> filter(fn: (r) => r._measurement == "quantity-items")
    |> filter(fn: (r) => r._field == "quantity")
    |> filter(fn: (r) => r.id == "A1")
    |> group(columns: ["id", "_measurement"], mode: "by")

firstQuantity = data
    |> first()
    |> map(fn: (r) => ({ r with description: "first_quantity" }))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "firstQuantity")

lastQuantity = data
    |> last()
    |> map(fn: (r) => ({ r with description: "last_quantity" }))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "lastQuantity")

I get different results for lastQuantity, which I find super strange. My preference is to use Script 2, but that's the one giving unexpected results :confused:

tagging for attention @scott @Anaisdg :pray:

@ajetsharwin Not necessarily. The wildcard here is group(). It doesn't guarantee the sort order of its output, and first()/last() simply take the first/last row of each table in whatever order the input rows arrive. You could add sort(columns: ["_time"]) after group(), but that will hurt performance. I'm also wondering whether you need to regroup the data at all: both id and _measurement are already in the group key. Are there other tags that you do not want to group by?
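A minimal sketch of Script 2 with that sort() added right after group(), reusing the bucket, measurement, field, and id from the original post and leaving everything else unchanged. sort() orders ascending by default, so first() should then return the earliest row and last() the latest, regardless of the order in which group() emitted them. The trade-off is that every row in the range gets sorted, which is the performance cost mentioned above.

```flux
data = from(bucket: "quantity/autogen")
    |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
    |> filter(fn: (r) => r._measurement == "quantity-items")
    |> filter(fn: (r) => r._field == "quantity")
    |> filter(fn: (r) => r.id == "A1")
    |> group(columns: ["id", "_measurement"], mode: "by")
    // group() does not guarantee row order, so restore time order explicitly
    // before first()/last() pick a row (ascending is the default).
    |> sort(columns: ["_time"])

firstQuantity = data
    |> first()
    |> map(fn: (r) => ({r with description: "first_quantity"}))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "firstQuantity")

lastQuantity = data
    |> last()
    |> map(fn: (r) => ({r with description: "last_quantity"}))
    |> keep(columns: ["id", "description", "_value"])
    |> yield(name: "lastQuantity")
```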

Thanks for the reply, Scott, I see.
Yes, I also have a tag called "name" in the group key, which I do not want there.
In my data, for id = "A1", name had the value "name1" up to a certain point in time and then changed to "name2" for the rest of the period. As a result, firstQuantity and lastQuantity each returned two tables for a given id before I called group(). To get around this, I called group() with "id" and "_measurement" so that no other tags are included in the group key.

Given the above, I have two options:

1. Use Script 1.
2. Use Script 2 with sort() added after group().

Is there anything else I should note or understand here?
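Another option, offered as a hedged sketch rather than a tested recommendation: select per series first and regroup afterwards. Following the explanation about group() above, Script 1 also calls group() before first()/last(), so it may only be returning the expected result by chance. The sketch assumes, as is normally the case for data read straight from storage, that rows within each per-series table (here, one table per value of name) are already in time order, so first()/last() are reliable before any regrouping. After that selection only one row per series remains, so the sort() after group() touches just a handful of rows instead of the whole range.

```flux
// Raw series only: one table per series (e.g. one per value of "name").
data = () =>
    from(bucket: "quantity/autogen")
        |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
        |> filter(fn: (r) => r._measurement == "quantity-items")
        |> filter(fn: (r) => r._field == "quantity")
        |> filter(fn: (r) => r.id == "A1")

firstQuantity =
    data()
        // Earliest row of each series; directly after filter() this may be
        // pushed down to storage, depending on the InfluxDB version.
        |> first()
        // Merge the per-series results into one table per id/_measurement.
        |> group(columns: ["id", "_measurement"], mode: "by")
        // Only one row per series is left, so this sort is cheap.
        |> sort(columns: ["_time"])
        |> first()
        |> map(fn: (r) => ({id: r.id, description: "first_quantity", _value: r._value}))
        |> yield(name: "firstQuantity")

lastQuantity =
    data()
        // Latest row of each series.
        |> last()
        |> group(columns: ["id", "_measurement"], mode: "by")
        |> sort(columns: ["_time"])
        |> last()
        |> map(fn: (r) => ({id: r.id, description: "last_quantity", _value: r._value}))
        |> yield(name: "lastQuantity")
```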

@ajetsharwin I'm just curious whether defining data as a function instead of a variable makes any difference. This changes how pushdowns work when a stream is shared across identifiers/variables. It may not make a difference, but I'm curious whether it does. I also removed the keep() function and updated the map() function to explicitly define the row schema. Since you're already mapping over the data, you might as well save a function call.

data = () =>
    from(bucket: "risk_management_analytics/autogen")
        |> range(start: 2022-12-31T15:00:00Z, stop: 2023-05-31T16:00:00Z)
        |> filter(fn: (r) => r._measurement == "quantity-items")
        |> filter(fn: (r) => r._field == "quantity")
        |> filter(fn: (r) => r.id == "A1")
        |> group(columns: ["id", "_measurement"], mode: "by")

firstQuantity =
    data()
        |> first()
        |> map(fn: (r) => ({id: r.id, description: "first_quantity", _value: r._value}))
        |> yield(name: "firstQuantity")

lastQuantity =
    data()
        |> last()
        |> map(fn: (r) => ({id: r.id, description: "last_quantity", _value: r._value}))
        |> yield(name: "lastQuantity")