Hello InfluxDB Community,
InfluxDB Version: 1.8
I am trying to calculate the 25th percentile of values across a list of fields. I have googled extensively but have not found anything apparent. To be clear, I am a novice, so I could have seen it and not understood it. Does anyone have any thoughts they would be willing to share?
This is how I am getting the source data to begin with:
import "date"
import "strings"
import "experimental"
today = now()
current_week = date.week(t: today)
from(bucket: "telegraf-prod/1m")
    |> range(start: -13w)
    |> filter(fn: (r) => r._measurement == "httpjson_heroku_dynos.project-production")
    |> keep(columns: ["_time", "_field", "_value", "type"])
    |> filter(fn: (r) => r._field == "mode_quantity" and r.type == "ingress_worker")
    |> aggregateWindow(every: 5m, fn: max)
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> map(fn: (r) => ({r with week: date.week(t: r._time)}))
    |> map(fn: (r) => ({r with
        weeks_ago:
            if r.week == current_week then 0
            else if r.week < current_week then current_week - r.week
            else current_week + 52 - r.week
    }))
    |> map(fn: (r) => ({r with
        _time: time(v: experimental.addDuration(d: duration(v: string(v: r.weeks_ago) + "w"), to: r._time))
    }))
    |> range(start: -1d)
    |> pivot(rowKey: ["_time"], columnKey: ["weeks_ago"], valueColumn: "mode_quantity")
I am not sure if this is messy; if it is and you have some suggestions, please let me know.
The goal of the query above is to overlap weeks so that I can get a history of our resource scaling. I want the percentile so that I can get a more informed minimum scale setting.
This is what the current return looks like:
So the query as it exists works as expected. Now I want to add a column that’s the 25th percentile for the row.
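One possible approach (a sketch only, not tested against this data): rather than computing the percentile row-wise after the final pivot, compute it on the stream just before that pivot, where each shifted `_time` carries up to 13 values (one per overlapped week) in the `mode_quantity` column. Grouping on `_time` and applying `quantile` would reduce each timestamp's group to a single p25 value. The name `overlapped_stream` below is a placeholder for the query above, up to but not including the last `pivot`:

```flux
// Sketch: overlapped_stream stands for the pipeline above, minus the final pivot.
p25 = overlapped_stream
    |> group(columns: ["_time"])               // one group per shifted timestamp
    |> quantile(q: 0.25, column: "mode_quantity")
    |> group()                                  // flatten back into a single table
```

The result could then be joined back onto the pivoted table on `_time` to appear as an extra column.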
Note: In the end this will be put in Grafana, so if there is a way to make Grafana give that to me, I am open to that as well.
OK, I changed my mind. No need to answer this. I decided to just get the min & max, and then in Grafana I will use a match function to get the min - 1. I like that better than the p25. Here is where I got to, in case anyone is searching for it.
import "date"
import "strings"
import "experimental"
today = now()
current_week = date.week(t: today)
scaling = from(bucket: "telegraf-prod/1m")
    |> range(start: -13w)
    |> filter(fn: (r) => r._measurement == "httpjson_heroku_dynos.tesseract-production")
    |> keep(columns: ["_time", "_field", "_value", "type"])
    |> filter(fn: (r) => r._field == "mode_quantity" and r.type == "ingress_worker")
    |> aggregateWindow(every: 5m, fn: max)

overlapped = scaling
    |> map(fn: (r) => ({r with week: date.week(t: r._time)}))
    |> map(fn: (r) => ({r with
        weeks_ago:
            if r.week == current_week then 0
            else if r.week < current_week then current_week - r.week
            else current_week + 52 - r.week
    }))
    |> map(fn: (r) => ({r with
        _time: time(v: experimental.addDuration(d: duration(v: string(v: r.weeks_ago) + "w"), to: r._time))
    }))
    |> drop(columns: ["week", "weeks_ago", "type"])

minimums = overlapped
    |> set(key: "_field", value: "min")
    |> aggregateWindow(every: 5m, fn: min)
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

maximums = overlapped
    |> set(key: "_field", value: "max")
    |> aggregateWindow(every: 5m, fn: max)
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")

// percentile = overlapped
//     |> drop(columns: ["_start", "_stop", "week", "weeks_ago", "_field"])
//     |> aggregateWindow(
//         column: "_value",
//         every: 5m,
//         fn: (column, tables=<-) => tables |> quantile(q: 0.25, column: column),
//     )

join(tables: {min: minimums, max: maximums}, on: ["_time"])
    |> range(start: -1d)
    |> drop(columns: ["_start", "_start_min", "_start_max", "_stop", "_stop_min", "_stop_max"])
The section that's commented out is what I would have needed to get the percentile. I did get an error on the quantile part, but if you can get that to work, you should get the percentile.
Hello @Bill_DeLuca,
Thanks for sharing your answer.
I’d recommend using the quantile function:
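For example, a minimal, untested sketch using the same bucket and field names as the query above (adjust them to your own data):

```flux
// Sketch: 25th percentile of mode_quantity per 5-minute window.
// aggregateWindow passes the column name into fn, which pipes the
// windowed tables through quantile.
from(bucket: "telegraf-prod/1m")
    |> range(start: -13w)
    |> filter(fn: (r) => r._field == "mode_quantity" and r.type == "ingress_worker")
    |> aggregateWindow(
        every: 5m,
        fn: (tables=<-, column) => tables |> quantile(q: 0.25, column: column),
    )
```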