Sliding window average for specific day of the week based on data from the last month

I have some utilization data (in percent) in a bucket and want a single graph with the hours of a single day (00:00 - 24:00) on the X-axis and this utilization on the Y-axis.

The data for this should be the average of this utilization for a specific day of the week (e.g. Sunday) over the last 30 days. The averaging resolution is not that important; something like 15 minutes should work.

I’m basically stuck after this:

import "experimental"
import "date"

baseTime = v.timeRangeStart
timeStart = date.truncate(t: experimental.subDuration(d: 14d, from: now()), unit: 1d)
timeStop = date.truncate(t: experimental.addDuration(d: 1d, to: now()), unit: 1d)

utilization = from(bucket:"my-bucket") 
    |> range(start: timeStart, stop: timeStop)
    |> filter(fn: (r) => 
        r._measurement == "utilization" and r._field == "utilization%"
    )
    |> filter(fn: (r) => 
        date.weekDay(t: r._time) == 0
    )
    |> aggregateWindow(every: 15m, fn: mean)
    |> yield()

… because it just gives me each Sunday's data aggregated with a 15-minute mean, but spread across the entire range(), whereas I want it collapsed onto the hours of a single day.

Any ideas or pointers to a possible solution?

@tuxtimo If I understand correctly, essentially what you need to do next is group results by hour and mean each hour. If that’s the case, you could do something like:

import "experimental"
import "date"

baseTime = v.timeRangeStart
timeStart = date.truncate(t: experimental.subDuration(d: 14d, from: now()), unit: 1d)
timeStop = date.truncate(t: experimental.addDuration(d: 1d, to: now()), unit: 1d)

utilization = from(bucket:"my-bucket") 
    |> range(start: timeStart, stop: timeStop)
    |> filter(fn: (r) => 
        r._measurement == "utilization" and r._field == "utilization%"
    )
    |> filter(fn: (r) => 
        date.weekDay(t: r._time) == 0
    )
    |> aggregateWindow(every: 15m, fn: mean)
    |> map(fn: (r) => ({ r with hour: date.hour(t: r._time) }))
    |> experimental.group(columns: ["hour"], mode: "extend")
    |> mean()
    |> sort(columns: ["hour"])

Thanks, this looks good so far, but I still have problems displaying it properly: the Graph plot shows strange X-axis labels and doesn't render correctly, even though I'm choosing "hour" as the X-axis and "_value" as the Y-axis:

This is the raw data:

#group,false,false,true,true,true,true,true,true,false
#datatype,string,long,string,string,dateTime:RFC3339,dateTime:RFC3339,string,long,double
#default,_result,,,,,,,,
,result,table,_field,_measurement,_start,_stop,center,hour,_value
,,0,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,0,0
,,1,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,1,
,,2,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,2,
,,3,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,3,
,,4,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,4,
,,5,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,5,
,,6,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,6,
,,7,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,7,
,,8,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,8,
,,9,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,9,
,,10,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,10,
,,11,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,11,
,,12,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,12,17.136752136752136
,,13,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,13,14.861090839060187
,,14,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,14,12.630511463844794
,,15,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,15,11.098765432098766
,,16,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,16,4.950617283950617
,,17,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,17,4.444444444444444
,,18,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,18,4.444444444444444
,,19,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,19,4.444444444444444
,,20,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,20,4.444444444444444
,,21,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,21,0.5925925925925924
,,22,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,22,0
,,23,utilization%,utilization,2021-04-17T00:00:00Z,2021-05-02T00:00:00Z,Rotkreuz,23,0

Ok, so right now every hour is split into its own table. You just need to "ungroup" everything (group by an empty column set) so it is all part of the same table; then it will graph as a single line:

// ...
  |> mean()
  |> group()
  |> sort(columns: ["hour"])
  |> yield() 

Doesn’t seem to work:

#group,false,false,false,false,false,false,false,false,false
#datatype,string,long,string,string,dateTime:RFC3339,dateTime:RFC3339,string,long,double
#default,_result,,,,,,,,
,result,table,_field,_measurement,_start,_stop,center,hour,_value
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,0,0
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,1,0
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,2,0
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,3,0
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,4,0
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,5,0.13227513227513227
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,6,7.5760582010582
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,7,23.541666666666668
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,8,39.639550264550266
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,9,43.0489417989418
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,10,32.066798941798936
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,11,28.912037037037038
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,12,28.76560168226835
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,13,24.81976499624967
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,14,24.850308641975303
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,15,22.796406525573193
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,16,12.640652557319223
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,17,11.481481481481481
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,18,11.481481481481481
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,19,11.481481481481481
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,20,11.481481481481481
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,21,1.4537037037037037
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,22,0
,,0,utilization%,utilization,2021-04-21T00:00:00Z,2021-05-06T00:00:00Z,Rotkreuz,23,0

This seems like a UI bug. The X-axis doesn't graph correctly with a custom, non-time column. You may consider submitting this as a bug on the InfluxDB UI repo.

I have a somewhat hacky workaround that will let you graph it as is. It uses the hour column to build a new timestamp that you can then use to graph:

import "date"
import "experimental"
import "strings"

baseTime = date.truncate(t: v.timeRangeStart, unit: 1d)
timeStart = date.truncate(t: experimental.subDuration(d: 14d, from: now()), unit: 1d)
timeStop = date.truncate(t: experimental.addDuration(d: 1d, to: now()), unit: 1d)

replaceHour = (t, h) => {
    // zero-pad single-digit hours so the rebuilt string stays valid RFC3339
    hh = if h < 10 then "0${string(v: h)}" else string(v: h)
    updated = strings.replace(v: string(v: t), t: "T00:", u: "T${hh}:", i: 1)
    return time(v: updated)
}
 
utilization = from(bucket:"my-bucket") 
    |> range(start: timeStart, stop: timeStop)
    |> filter(fn: (r) => 
        r._measurement == "utilization" and r._field == "utilization%"
    )
    |> filter(fn: (r) => 
        date.weekDay(t: r._time) == 0
    )
    |> aggregateWindow(every: 15m, fn: mean)
    |> map(fn: (r) => ({ r with hour: date.hour(t: r._time) }))
    |> experimental.group(columns: ["hour"], mode: "extend")
    |> mean()
    |> group()
    |> sort(columns: ["hour"])
    |> map(fn: (r) => ({ r with _time: replaceHour(t: baseTime, h: r.hour) }))

Great! That finally did the trick.

Now I’ve tried to group by date.weekDay() (one graph line per weekday) instead of filtering for a specific weekday, but this always screws up the later aggregation … any ideas on how to do that properly?

Sure. You can use the same process to add a weekDay column (computed from the original _time, before the hourly mean collapses it) and group by weekDay. This will separate each day of the week into its own table/graph.

import "date"
import "experimental"
import "strings"

baseTime = date.truncate(t: v.timeRangeStart, unit: 1d)
timeStart = date.truncate(t: experimental.subDuration(d: 14d, from: now()), unit: 1d)
timeStop = date.truncate(t: experimental.addDuration(d: 1d, to: now()), unit: 1d)

replaceHour = (t, h) => {
    // zero-pad single-digit hours so the rebuilt string stays valid RFC3339
    hh = if h < 10 then "0${string(v: h)}" else string(v: h)
    updated = strings.replace(v: string(v: t), t: "T00:", u: "T${hh}:", i: 1)
    return time(v: updated)
}
 
utilization = from(bucket:"my-bucket") 
    |> range(start: timeStart, stop: timeStop)
    |> filter(fn: (r) => 
        r._measurement == "utilization" and r._field == "utilization%"
    )
    |> aggregateWindow(every: 15m, fn: mean)
    // weekDay must be derived from the original _time, before mean() drops it
    |> map(fn: (r) => ({ r with hour: date.hour(t: r._time), weekDay: date.weekDay(t: r._time) }))
    |> experimental.group(columns: ["hour", "weekDay"], mode: "extend")
    |> mean()
    |> group(columns: ["weekDay"])
    |> sort(columns: ["hour"])
    |> map(fn: (r) => ({ r with _time: replaceHour(t: baseTime, h: r.hour) }))

Note: You may need to sort the data again after grouping by weekday; group() doesn’t always retain the previous sort order.
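If the lines do come out jumbled, re-sorting after the final group is enough. Sketching just the tail of the pipeline (column names as in the query above):

    // ...
    |> group(columns: ["weekDay"]) // one table (= one graph line) per weekday
    |> sort(columns: ["hour"])     // restore left-to-right order within each table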