Hi, we are in the midst of migrating a query that generates a data export.
Our data consists of a measurement that represents the customer, a single tag that represents the device the data comes from, and a number of fields (in the tens) that represent various sensor and cumulative values.
An example would be:
11EB768FB98C4270B6D16BAEE17CF0DD,deviceId=78767 EM1_Energy=124642,EM1_Energy_Tariff1=0,EM1_Energy_Tariff2=0,EM1_Flow=361,EM1_Flow_Temperature=64.18,EM1_Power=8.1,EM1_Return_Temperature=44.44,EM1_Serial=224212,EM1_Uptime=3192,EM1_Volume=4237,HC1_FlowTemperature=34.64,HC1_FlowTemperature_Ref=11,HC1_ReturnTemperature=30.96,HC1_ReturnTemperature_Extra=192,HC1_ReturnTemperature_Ref=40,HC1_RoomTemperature=192,HC1_RoomTemperature_Limit=18,HC2_FlowTemperature=49.3,HC2_FlowTemperature_Ref=52,HC2_ReturnTemperature=192,HC2_ReturnTemperature_Extra=0.01,HC2_ReturnTemperature_Ref=50,Outdoor_Temperature=27.64,Primary1_FlowTemperature=192,Primary1_ReturnTemperature=192 1625492641215000000
Over this series, we run a query that performs a variety of aggregations over several of the fields for a subset of devices. The aggregations can be min, max, first, last, mean, and delta (the difference between the value of a field at the end of the period and its value at the beginning of the period).
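To make the delta aggregation concrete, here is a minimal Flux sketch of how we compute it, assuming a stream `data` that has already been filtered down to a single field and is grouped by device (the names are just illustrative):

```flux
// "delta" = value at the end of the window minus the value at the
// start, built from the first and last points of each device's table.
delta = union(tables: [data |> first(), data |> last()])
    |> sort(columns: ["_time"])
    |> difference(columns: ["_value"])
```

The `sort()` is there to guarantee that `difference()` sees the first point before the last one after the `union()`.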
The query in InfluxQL could look like:
select Last("EM1_Energy") as "0",Mean("Outdoor_Temperature") as "1",(Last("EM1_Energy") - First("EM1_Energy")) AS "2" from "11111111111111111111111111111111" where ("deviceId" = '179474' or "deviceId" = '237129' or "deviceId" = '237140' or "deviceId" = '254415' or "deviceId" = '323126' or "deviceId" = '323127' or "deviceId" = '5312' or "deviceId" = '53753' or "deviceId" = '999000037') and time >= 1617228000000ms and time <= 1619819999999ms group by "deviceId"
What we could come up with is this, arguably much more complex, flux query to get the same data:
doRename = (t=<-, name= "") => t
|> map(fn: (r) => ({ r with _field: name }))
data = from(bucket: "development_readouts/high")
|> range(start: time(v: "2021-03-31T22:00:00.000Z"), stop: time(v: "2021-04-30T21:59:59.999Z"))
|> filter(fn: (r) => r._measurement == "11111111111111111111111111111111")
|> filter(fn: (r) => r.deviceId == "179474" or r.deviceId == "237129" or r.deviceId == "237140" or r.deviceId == "254415" or r.deviceId == "323126" or r.deviceId == "323127" or r.deviceId == "5312" or r.deviceId == "53753" or r.deviceId == "999000037")
t0 = data
|> filter(fn: (r) => r._field == "EM1_Energy")
|> last()
|> doRename(name: "0")
t1 = data
|> filter(fn: (r) => r._field == "Outdoor_Temperature")
|> mean()
|> doRename(name: "1")
priv2_1 = data
|> filter(fn: (r) => r._field == "EM1_Energy")
|> last()
priv2_2 = data
|> filter(fn: (r) => r._field == "EM1_Energy")
|> first()
t2 = union(tables: [priv2_2, priv2_1])
|> difference(nonNegative: false, columns: ["_value"])
|> doRename(name: "2")
union(tables: [t0, t1, t2])
|> pivot(
rowKey:["deviceId"],
columnKey: ["_field"],
valueColumn: "_value"
)
|> group(columns: ["deviceId"], mode:"by")
|> drop(columns: ["_start", "_stop", "_measurement"])
|> yield(name: "default")
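One variant we also considered (we are not sure it changes the query plan) is narrowing the shared base stream to just the fields we aggregate, in the hope that the field predicate gets pushed down to storage instead of being applied per branch. Sketch, with the device list shortened for brevity:

```flux
// Variant of the shared base stream: filter the needed fields up
// front (assumption: this predicate is pushed down to storage).
data = from(bucket: "development_readouts/high")
    |> range(start: time(v: "2021-03-31T22:00:00.000Z"), stop: time(v: "2021-04-30T21:59:59.999Z"))
    |> filter(fn: (r) => r._measurement == "11111111111111111111111111111111")
    |> filter(fn: (r) => r._field == "EM1_Energy" or r._field == "Outdoor_Temperature")
    |> filter(fn: (r) => r.deviceId == "179474" or r.deviceId == "237129") // ...full device list as above
```

The per-aggregation branches would then only re-filter by field, not re-read the whole measurement.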
Even though we get the same data from both queries, the performance of the Flux query is far worse and varies heavily with the number of columns and the kind of aggregation.
Both queries (influxQL and Flux) are executed against our InfluxCloud instance and measured from our application:
- 1 column:
  InfluxQL: 97 ms (MEAN), 93 ms (DELTA)
  Flux: 93 ms (MEAN), 997 ms (DELTA)
- 2 columns:
  InfluxQL: 100.5 ms (MEAN+FIRST), 108 ms (MEAN+DELTA)
  Flux: 948.1 ms (MEAN+FIRST), 1382 ms (MEAN+DELTA)
- 3 columns:
  InfluxQL: 116.25 ms (MEAN+FIRST+MAX), 98.39 ms (MEAN+FIRST+DELTA)
  Flux: 1325 ms (MEAN+FIRST+MAX), 1801 ms (MEAN+FIRST+DELTA)
As one can see, the moment we have more than one aggregation we get noticeably worse execution times with the Flux query.
Being a newer, more capable language, some overhead is to be expected, but 10x for such a small query is not acceptable.
Are we doing anything obviously wrong?
Thanks in advance for all the help.