Extract 5 biggest current values for a metric, and consolidate everything else in "Others"

I’m trying to display an overview of a large number of series and display the five top ones and group everything else as a single “Other” table.

I don’t think there is a way to do that in Flux but I figure I would ask here. If only there was a way to convert a table to an array, that would probably save the day.

I did manage to get the biggest objects with the following :

from(bucket: "spaceview")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "volume")
  |> filter(fn: (r) => r["_field"] == "used")
  |> filter(fn: (r) => r["root"] == "false")
  |> filter(fn: (r) => r["styleEx"] == "flexvol")
  |> highestCurrent(n:5, groupColumns: ["volume"])

But I don’t see a way to use join or filter to use that to get the data for those volumes only, or every other volume only.

Well it seems I finally found the magic recipe, which seems to drop some column like _from which break the enforced grouping on _from/_to and makes for the expected join.

import "join"

import "join"
data = from(bucket: "spaceview")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "volume")
  |> filter(fn: (r) => r["_field"] == "used")
  |> filter(fn: (r) => r["root"] == "false")
  |> filter(fn: (r) => r["styleEx"] == "flexvol")
  |> drop(columns: ["_from"])
  |> group(columns: ["volume"])

annotated = from(bucket: "spaceview")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "volume")
  |> filter(fn: (r) => r["_field"] == "used")
  |> filter(fn: (r) => r["root"] == "false")
  |> filter(fn: (r) => r["styleEx"] == "flexvol")
  |> highestCurrent(n:5, groupColumns: ["volume"])
  |> set(key: "big", value: "true")
  |> group(columns: ["volume"])
  |> join.right(
    right: data,
    on: (l, r) => l.volume == r.volume,
    as: (l, r) => ({r with big: l.big}),
    )
	
annotated
  |> filter(fn: (r) => not exists r["big"])
  |> group(columns: ["_time"])
  |> sum()
  |> group()
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
  |> set(key: "volume", value:"other")
  |> yield(name: "other")
  
annotated
   |> filter(fn: (r) => exists r["big"])
   |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)
   |> yield(name: "big")

And the final query with a different method :

import "join"

data = from(bucket: "spaceview")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "volume")
  |> filter(fn: (r) => r["_field"] == "used")
  |> filter(fn: (r) => r["root"] == "false")
  |> filter(fn: (r) => r["styleEx"] == "flexvol")
  // Normalize time windows as each system report separately
  |> aggregateWindow(every: 1d, fn: sum, createEmpty: false)
  |> drop(columns: ["_start","_stop"])
  |> group(columns: ["volume","vserver"])
  |> sort(columns: ["_time"])

data
  |> range(start: -2w, stop: v.timeRangeStop)
  |> highestCurrent(n:5, groupColumns: ["volume","vserver"])
  |> set(key: "big", value: "true")
  |> group(columns: ["volume","vserver"])
  |> join.right(
    right: data,
    on: (l, r) => l.volume == r.volume and l.vserver == r.vserver,
    as: (l, r) => ({r with big: l.big}),
    )
  // When "big" is not defined, set the volume name and server name as "other"
  // for later sum()
  |> map(fn: (r) => ({
    r with
    volume:
        if not exists r.big then "other"
        else r.volume,
    vserver:
        if not exists r.big then "other"
        else r.vserver
        }))
    |> group(columns: ["_time","volume","vserver"])
    |> sum()
    |> group(columns: ["volume","vserver"])

@ybizeul Thank you for sharing your solution!!