Hello InfluxDB Community,
InfluxDB Version: 1.8
I am trying to calculate the 25th percentile of values across a list of fields. I have googled extensively but have not found anything apparent. To be clear, I am a novice, so I could have seen it and not understood it. Does anyone have any thoughts they would be willing to share?
This is how I am getting the source data to begin with:
import "date"
import "strings"
import "experimental"
today = now()
current_week = date.week(t: today)
from(bucket: "telegraf-prod/1m")
    |> range(start: -13w)
    |> filter(fn: (r) => r._measurement == "httpjson_heroku_dynos.project-production")
    |> keep(columns: ["_time", "_field", "_value", "type"])
    |> filter(fn: (r) => r._field == "mode_quantity" and r.type == "ingress_worker")
    |> aggregateWindow(every: 5m, fn: max)
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> map(fn: (r) => ({r with week: date.week(t: r._time)}))
    |> map(fn: (r) => ({r with
        weeks_ago:
            if r.week == current_week then 0
            else if r.week < current_week then current_week - r.week
            else current_week + 52 - r.week
    }))
    |> map(fn: (r) => ({r with
        _time: time(v: experimental.addDuration(d: duration(v: string(v: r.weeks_ago) + "w"), to: r._time))
    }))
    |> range(start: -1d)
    |> pivot(rowKey: ["_time"], columnKey: ["weeks_ago"], valueColumn: "mode_quantity")
I am not sure if this is messy; if it is and you have some suggestions, please let me know.
The goal of the query above is to overlap weeks so that I can get a history of our resource scaling. I want the percentile so that I can get a more informed minimum scale setting.
This is what the current return looks like:
So the query as it exists works as expected. Now I want to add a column that’s the 25th percentile for the row.
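One possible approach (a sketch only, not tested against this data): rather than computing the percentile row-wise after the final pivot, compute it on the stream just before that pivot, where each shifted `_time` carries up to 13 values (one per overlapped week) in the `mode_quantity` column. Grouping on `_time` and applying `quantile` would reduce each timestamp's group to a single p25 value. The name `overlapped_stream` below is a placeholder for the query above, up to but not including the last `pivot`:

```flux
// Sketch: overlapped_stream stands for the pipeline above, minus the final pivot.
p25 = overlapped_stream
    |> group(columns: ["_time"])               // one group per shifted timestamp
    |> quantile(q: 0.25, column: "mode_quantity")
    |> group()                                  // flatten back into a single table
```

The result could then be joined back onto the pivoted table on `_time` to appear as an extra column.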
Note: In the end this will be put in Grafana, so if there is a way to make Grafana give that to me, I am open to that as well.
OK, I changed my mind. No need to answer this. I decided to just get the min & max, and then in Grafana I will use a match function to get the min - 1. I like that better than the p25. Here is where I got to, in case anyone is searching for it.
import "date"
import "strings"
import "experimental"
today = now()
current_week = date.week(t: today)
scaling = from(bucket: "telegraf-prod/1m")
    |> range(start: -13w)
    |> filter(fn: (r) => r._measurement == "httpjson_heroku_dynos.tesseract-production")
    |> keep(columns: ["_time", "_field", "_value", "type"])
    |> filter(fn: (r) => r._field == "mode_quantity" and r.type == "ingress_worker")
    |> aggregateWindow(every: 5m, fn: max)

overlapped = scaling
    |> map(fn: (r) => ({r with week: date.week(t: r._time)}))
    |> map(fn: (r) => ({r with
        weeks_ago:
            if r.week == current_week then 0
            else if r.week < current_week then current_week - r.week
            else current_week + 52 - r.week
    }))
    |> map(fn: (r) => ({r with
        _time: time(v: experimental.addDuration(d: duration(v: string(v: r.weeks_ago) + "w"), to: r._time))
    }))
    |> drop(columns: ["week", "weeks_ago", "type"])

minimums = overlapped
    |> set(key: "_field", value: "min")
    |> aggregateWindow(every: 5m, fn: min)
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

maximums = overlapped
    |> set(key: "_field", value: "max")
    |> aggregateWindow(every: 5m, fn: max)
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

// percentile = overlapped
//     |> drop(columns: ["_start", "_stop", "week", "weeks_ago", "_field"])
//     |> aggregateWindow(
//         column: "_value",
//         every: 5m,
//         fn: (column, tables=<-) => tables |> quantile(q: 0.25, column: column),
//     )

join(tables: {min: minimums, max: maximums}, on: ["_time"])
    |> range(start: -1d)
    |> drop(columns: ["_start", "_start_min", "_start_max", "_stop", "_stop_min", "_stop_max"])
The section that's commented out is what I would have needed to get the percentile. I did get an error on the quantile part, but if you can get that to work, you should get the percentile.
Hello @Bill_DeLuca,
Thanks for sharing your answer.
I’d recommend using the quantile function:
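For example, a minimal, untested sketch using the same bucket and field names as the query above (adjust them to your own data):

```flux
// Sketch: 25th percentile of mode_quantity per 5-minute window.
// aggregateWindow passes the column name into fn, which pipes the
// windowed tables through quantile.
from(bucket: "telegraf-prod/1m")
    |> range(start: -13w)
    |> filter(fn: (r) => r._field == "mode_quantity" and r.type == "ingress_worker")
    |> aggregateWindow(
        every: 5m,
        fn: (tables=<-, column) => tables |> quantile(q: 0.25, column: column),
    )
```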