Types library isType and isNumeric Performance

ticchioned · July 7, 2023, 1:07pm

While the new types library is pretty helpful, I was wondering if it could be made any more performant using this flux code. Without the isNumeric() function, this query runs in about .03s but with it it takes 1.2s. That sadly wont scale well for my application, so for the time being I don’t think I can use it. @scott @Anaisdg

Here is my flux:

import "date"
import "types"

offset = duration(v: string(v: uint(v: 2023-07-06T04:00:00.0000000Z) - uint(v: date.truncate(t: 2023-07-06T04:00:00.0000000Z, unit: 86400s))) + "ns")

MinFunction = (measurement, field) =>
from(bucket: "LPGen2")
|> range(start: 2023-07-06T04:00:00.0000000Z, stop: 2023-07-07T04:00:00.0000000Z)
|> filter(fn: (r) => r._measurement == measurement)
|> filter(fn: (r) => r._field == field)
|> filter(fn: (r) => types.isNumeric(v: r._value))
|> window(every: 86400s, offset: offset, createEmpty: true)
|> min()

DigitalSetFunction = (measurement) =>
from(bucket: "LPGen2")
|> range(start: 0, stop: 2023-07-07T04:00:00.0000000Z)
|> filter(fn: (r) => r._measurement == measurement)
|> filter(fn: (r) => r._field == "DigitalSet")
|> last()

union(tables: [MinFunction(measurement: "Test", field: "Value"), DigitalSetFunction(measurement: "Test")]) |> yield(name: "Test,,Value")
union(tables: [MinFunction(measurement: "Test", field: "Value"), DigitalSetFunction(measurement: "Test")]) |> yield(name: "Test,Plant=A,Value")
union(tables: [MinFunction(measurement: "Test1", field: "Value"), DigitalSetFunction(measurement: "Test1")]) |> yield(name: "Test1,,Value")
union(tables: [MinFunction(measurement: "Test10", field: "Value"), DigitalSetFunction(measurement: "Test10")]) |> yield(name: "Test10,,Value")
union(tables: [MinFunction(measurement: "Test2", field: "Value"), DigitalSetFunction(measurement: "Test2")]) |> yield(name: "Test2,,Value")
union(tables: [MinFunction(measurement: "Test3", field: "Value"), DigitalSetFunction(measurement: "Test3")]) |> yield(name: "Test3,,Value")
union(tables: [MinFunction(measurement: "Test4", field: "Value"), DigitalSetFunction(measurement: "Test4")]) |> yield(name: "Test4,,Value")
union(tables: [MinFunction(measurement: "Test5", field: "Value"), DigitalSetFunction(measurement: "Test5")]) |> yield(name: "Test5,,Value")
union(tables: [MinFunction(measurement: "Test6", field: "Value"), DigitalSetFunction(measurement: "Test6")]) |> yield(name: "Test6,,Value")
union(tables: [MinFunction(measurement: "Test7", field: "Value"), DigitalSetFunction(measurement: "Test7")]) |> yield(name: "Test7,,Value")
union(tables: [MinFunction(measurement: "Test8", field: "Value"), DigitalSetFunction(measurement: "Test8")]) |> yield(name: "Test8,,Value")
union(tables: [MinFunction(measurement: "Test9", field: "Value"), DigitalSetFunction(measurement: "Test9")]) |> yield(name: "Test9,,Value")
union(tables: [MinFunction(measurement: "TestAlt", field: "Value"), DigitalSetFunction(measurement: "TestAlt")]) |> yield(name: "TestAlt,,Value")
union(tables: [MinFunction(measurement: "TestDigAlt", field: "Value"), DigitalSetFunction(measurement: "TestDigAlt")]) |> yield(name: "TestDigAlt,,Value")
union(tables: [MinFunction(measurement: "Test3", field: "Description"), DigitalSetFunction(measurement: "Test3")]) |> yield(name: "Test3,,Description")

Anaisdg · July 13, 2023, 8:44pm

Hello @ticchioned,
Unfortunately performance enhancements to Flux is not engineerings focus right now with the roll out of InfluxDB 3.0.
I’m sorry it’s not working for you as you expect but I encourage you to possibly try out 3.0 and take advantage of DataFusion and Apache Arrow Flight SQL (under the hood) for querying and transporting large datasets (including pandas dataframes) over network interfaces.

memetb · August 9, 2023, 6:33pm

@ticchioned fyi: the reason your performance is low is because |> filter(fn: (r) => types.isNumeric(v: r._value)) is not a push-down query, meaning you are losing the aggregation at the backing database.

This means that the entire data set is being pulled into memory and then the aggregate is being run in memory.

I am assuming you are doing that filter because your call to min() is failing on string values.

Unfortunately, the only proper way of resolving this is to add a tag to the measurement at insertion time so that you can do a filter on that tag (filtering tags is a pushdown query).

Fyi, running basicstats at the telegraf level will correctly drop non-numeric values, so one option would be to run a very short aggregate at the telegraf level (something that windows every 15s the last value) and the filter by that tag.

I should add that this is a design problem. If the underlying database implementation doesn’t support running a minimum aggregate on strings, it will not won’t fix this issue to get a system that can move things faster: what you want to do is filter out values that aren’t min aggregateable.

Topic		Replies	Views
Flux query error => data type problems Fluxlang influxdb , chronograf , flux	1	5043	January 6, 2021
[URGENT] Flux vs InfluxQL Groupby Speed InfluxDB 2 flux	7	577	August 30, 2023
Flux performance compared to similar influxQL query Fluxlang performance	5	578	December 9, 2022
Type test in Flux? Fluxlang	2	512	September 21, 2020
Query with lots of fields takes orders of magnitude higher in flux than influxql Fluxlang	9	781	January 21, 2022

Types library isType and isNumeric Performance

Related topics