@scott @Anaisdg
Returning to this problem: let me try one more time and ask the community, which has much more experience, how to optimize this query.
This time with a real example, which will hopefully be more effective.
I store API call duration statistics for 1 year, and I need to visualize this data over that time range with the possibility of a simple search for anomalies.
I chose to have at least 2 graphs (nonCached & cachedOnCF). For each path (API method + route name) I need to be able to apply a quantile (as a simple anomaly search).
I also need to be able to filter by path (if possible in the same 2 graphs; if not, an additional graph for a single path).
The main problem is that for all tags over a long duration like 7d/30d it becomes very slow.
I tried to optimize it by calculating a custom windowPeriod, but it is still too slow: about 40 s per query.
Maybe there is another way to optimize this solution, or maybe a better approach exists than the one I chose. If so, please share.
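For context, the row shape I query looks roughly like this (a sketch with made-up values; the measurement, tag, and field names are the ones used in the queries below):

import "experimental/array"

// Illustrative only: one raw point of the user_timing measurement (value and timestamp are made up)
array.from(
    rows: [
        {
            _measurement: "user_timing",
            env: "production",
            status: "cachedOnCF", // or "nonCached"
            path: "PUT:/users/:sessionId/cart/",
            _field: "value",
            _value: 123.4, // call duration
            _time: 2023-01-01T00:00:00Z
        }
    ]
)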
For all cachedOnCF paths:
import "influxdata/influxdb/schema"
import "experimental/array"
import "date"
timestart = uint(v: date.time(t: v.timeRangeStart))
timestop = uint(v: date.time(t: v.timeRangeStop))
duration = uint(v: timestop - timestart)
windowPeriod =
if int(v:duration) >= int(v:1y) then 1mo
else if int(v:duration) >= int(v:30d) then 1d
else if int(v:duration) >= int(v:7d) then 90m
else if int(v:duration) >= int(v:2d) then 45m
else if int(v:duration) >= int(v:24h) then 30m //30s
else if int(v:duration) >= int(v:12h) then 15m //15s, 49
else if int(v:duration) >= int(v:6h) then 6m //11s, 61
else if int(v:duration) >= int(v:3h) then 3m //9s, 61
else if int(v:duration) >= int(v:1h) then 1m //4s, 61
else 1m
q = float(v:v.percentiles)
env = "production"
createEmpty = true
bucket = v.bucket
measurement = "user_timing"
field = "value"
cacheStatus = "cachedOnCF"
tag = "path"
path = v.ut_path
predicate = (r) =>
r["_measurement"] == measurement
and r["env"] == env
and r["_field"] == field
and r["status"] == cacheStatus
routePatterns = schema.tagValues(bucket: bucket, tag: tag, predicate: predicate, start: v.timeRangeStart, stop: v.timeRangeStop)
|> findColumn(fn: (key) => true, column: "_value")
data = () => from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: predicate)
getTagQuantileData = (tables=<-, tag) => tables
|> filter(fn: (r) => r["path"] == tag)
|> aggregateWindow(
column: "_value",
every: windowPeriod,
fn: (column, tables=<-) => tables |> quantile(q: q, column: column),
createEmpty: createEmpty
)
quantileData = routePatterns
|> array.map(fn: (x) => data() |> getTagQuantileData(tag: x))
union(tables:quantileData)
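For reference: as far as I understand, aggregateWindow already runs on each series table (i.e. on each path) separately, so the per-path mapping above should be equivalent to the plain per-series sketch below (reusing the same data(), windowPeriod, q, and createEmpty), which also returns one quantile series per path:

// Plain per-series form of the same aggregation (no routePatterns / array.map / union)
data()
    |> aggregateWindow(
        column: "_value",
        every: windowPeriod,
        fn: (column, tables=<-) => tables |> quantile(q: q, column: column),
        createEmpty: createEmpty
    )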
The variable v.ut_path collects all unique tag values, but I need them for the whole period (even for 1 year), and this is also not very fast (maybe there is a faster way to collect them?):
import "influxdata/influxdb/schema"
import "experimental/array"
allPaths = schema.measurementTagValues(bucket: v.bucket, measurement: "user_timing", tag: "path")
allStream = array.from(rows: [{_value: "All"}])
pathsWithAll = union(tables: [allStream, allPaths])
pathsWithAll
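As far as I know, the schema helper functions only search a limited default lookback (I believe around 30 days), so to really cover the whole year I would have to pass an explicit range. A sketch of that variant, using schema.tagValues with start (the same call as in the graph query above); I am not sure it is any faster:

import "influxdata/influxdb/schema"
import "experimental/array"

// Same lookup, but with an explicit one-year range instead of the default lookback
allPaths =
    schema.tagValues(
        bucket: v.bucket,
        tag: "path",
        predicate: (r) => r._measurement == "user_timing",
        start: -1y
    )
allStream = array.from(rows: [{_value: "All"}])

union(tables: [allStream, allPaths])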
We do have the possibility of attaching several queries to the same graph.
I wonder whether there is a way to build them dynamically, something like a foreach over the unique tag values with one query per value (if I have 1000 path tag values, then 1000 separate but simpler queries).
Is it possible to process each tag value dynamically in its own query? For example, this is the query for one specific path:
import "influxdata/influxdb/schema"
import "experimental/array"
import "date"
timestart = uint(v: date.time(t: v.timeRangeStart))
timestop = uint(v: date.time(t: v.timeRangeStop))
duration = uint(v: timestop - timestart)
windowPeriod =
if int(v:duration) >= int(v:1y) then 1mo
else if int(v:duration) >= int(v:30d) then 1d
else if int(v:duration) >= int(v:7d) then 90m
else if int(v:duration) >= int(v:2d) then 45m
else if int(v:duration) >= int(v:24h) then 30m
else if int(v:duration) >= int(v:12h) then 15m
else if int(v:duration) >= int(v:6h) then 6m
else if int(v:duration) >= int(v:3h) then 3m
else if int(v:duration) >= int(v:1h) then 1m
else 1m
q = float(v:v.percentiles)
env = "production"
createEmpty = true
bucket = v.bucket
measurement = "user_timing"
field = "value"
cacheStatus = "nonCached"
tag = "path"
path = "PUT:/users/:sessionId/cart/"
predicate = (r) =>
r["_measurement"] == measurement
and r["env"] == env
and r["_field"] == field
and r["status"] == cacheStatus
and r["path"] == path
data = () => from(bucket: v.bucket)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: predicate)
getTagQuantileData = (tables=<-, tag) => tables
|> filter(fn: (r) => r["path"] == tag)
|> aggregateWindow(
column: "_value",
every: windowPeriod,
fn: (column, tables=<-) => tables |> quantile(q: q, column: column),
createEmpty: createEmpty
)
getQuantileData = (tag) => {
return data() |> getTagQuantileData(tag: tag)
}
result = getQuantileData(tag: path)
result |> yield()