Unions are slow

Structuring this stream of data as a function rather than a variable is the Flux equivalent of what some would call a “thunk.” It delays the planning of pushdowns until the function is invoked.
For example, if I were to define the following variable, reference that variable, and then follow it with another function that can be pushed down:

exampleVar = 
    from(bucket: "example-bucket")
        |> range(start: -1d)
        |> filter(fn: (r) => r._measurement == "example-m")

exampleVar |> filter(fn: (r) => r._field == "example-f")

The pushdowns would be scoped to the variable and wouldn’t carry through to the next function that could be pushed down (the filter by field). So Flux would have to load all the data returned by the first filter into memory before processing the second filter.

By structuring the stream as a function, you delay the planning of pushdowns until after the function is invoked, which lets the planner use any pushdowns that follow the call. For example:

exampleFn = () =>
    from(bucket: "example-bucket")
        |> range(start: -1d)
        |> filter(fn: (r) => r._measurement == "example-m")

exampleFn() |> filter(fn: (r) => r._field == "example-f")

This allows the planner to apply pushdowns past the function invocation, so both filters get pushed down and only the data returned by the second filter needs to be loaded into memory.
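
Since this thread is about unions, here is a rough sketch of how the same pattern might look with two branches feeding a union. The field name "other-f" and the variable names fieldA and fieldB are made up for illustration, and I'm assuming each call to exampleFn() is planned on its own, as described above:

exampleFn = () =>
    from(bucket: "example-bucket")
        |> range(start: -1d)
        |> filter(fn: (r) => r._measurement == "example-m")

// Each call builds a fresh stream, so each branch can carry its own field filter.
fieldA = exampleFn() |> filter(fn: (r) => r._field == "example-f")
// "other-f" is a hypothetical second field.
fieldB = exampleFn() |> filter(fn: (r) => r._field == "other-f")

union(tables: [fieldA, fieldB])

If that assumption holds, each branch should only load the rows for its own field before the union runs, but it is worth checking against your own data.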
