Why is the date operation so slow?

I want to calculate electricity cost per hour. Because electricity prices vary by hour, I use map and date.hour function to multiply cost by dynamic price.

The query below takes 16 seconds:

import "date"
from(bucket: "energy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "electricity")
  |> filter(fn: (r) => r["_field"] == "usage")
  |> aggregateWindow(every: 1h, fn: last, createEmpty: false, timeSrc: "_start")
  |> difference()
  |> map(fn: (r) => ({r with _value: if date.hour(t: r._time) > 9 then r._value * 0.54 else r._value * 0.87}))

Same time range, but replace date.hour with r._value. The query below only takes 1.6 seconds:

from(bucket: "energy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "electricity")
  |> filter(fn: (r) => r["_field"] == "usage")
  |> aggregateWindow(every: 1h, fn: last, createEmpty: false, timeSrc: "_start")
  |> difference()
  |> map(fn: (r) => ({r with _value: if r._value > 9 then r._value * 0.54 else r._value * 0.87}))

The date operation is 10x slower than raw query.

@gerhard I wish I had a better explanation and solution, but it’s just the cumulative effect of running date.hour operation for every single input row. How large of a time range are you querying?

I agree that. I’ve found that other functions have the same effect.

For example, the function int(v: r._value), math.abs(x: r._value), or even a variable.

t = 3.0
from(bucket: "energy")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "electricity")
  |> filter(fn: (r) => r["_field"] == "usage")
  |> aggregateWindow(every: 1h, fn: last, createEmpty: false, timeSrc: "_start")
  |> difference()
  |> map(fn: (r) => ({r with
      _value: if r._value >= t then
        r._value * 0.54
      else
        r._value * 0.87
    }))
// The query takes 11 seconds.
// Replace `t` with inline const: 1.6 seconds.

Functions in map can significantly impact query performance on large datasets.

BTW, my time range is 3 years, resulting in 367934 records.