I am trying to create a check for load average per CPU, where I take the loadaverage5 and divide it by the number of CPUs.
This is so that I can get a per-cpu load average metric to alert on.
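To make the intended metric concrete, here is a minimal sketch of the arithmetic in Python. The function name `per_cpu_load` and the sample numbers are hypothetical and only illustrate the division; they are not taken from the actual data.

```python
def per_cpu_load(load_average: float, n_cpus: int) -> float:
    """Return the load average divided by the number of CPUs."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return load_average / float(n_cpus)


# Hypothetical host: a load average of 6.0 on 4 CPUs
print(per_cpu_load(6.0, 4))
```

A value above 1.0 would mean there is more runnable work than CPUs on that host, which is the kind of condition a threshold check could alert on.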
I have got the following Flux query working in the Data Explorer:
loadavg = from(bucket: "metrics_pg")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "system")
  |> filter(fn: (r) => r["_field"] == "load15")

cpucount = from(bucket: "metrics_pg")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "system")
  |> filter(fn: (r) => r["_field"] == "n_cpus")

join(
  tables: {load:loadavg, n_cpus:cpucount},
  on: ["_time", "_stop", "_start", "host"]
)
  |> map(fn: (r) => ({
      _time: r._time,
      _value: r._value_load / float(v: + r._value_n_cpus),
      host: r.host
    })
  )
  |> aggregateWindow(every: 5m, fn: last)
  |> yield(name: "last")
But when I try to create a threshold check using the HTTP API I get the following error:
$ curl --header "Authorization: Token ${TOKEN}" 127.0.0.1:9999/api/v2/checks -X POST -d @pg_cpu.json | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1768 100 113 100 1655 8692 124k --:--:-- --:--:-- --:--:-- 143k
{
  "code": "invalid",
  "message": "Could not create task from check: expected a single field but got: [load15 n_cpus]"
}
From the output it seems to me that the check is using the output from the first two queries, where I fetch loadavg and cpucount.
Here is the query section of the JSON I use to create the check:
"query": {
"text": "loadavg = from(bucket: \"metrics_pg\")\n |> range(start: v.timeRangeStart, stop: v.timeRangeStop)\n |> filter(fn: (r) => r[\"_measurement\"] == \"system\")\n |> filter(fn: (r) => r[\"_field\"] == \"load15\")\n\ncpucount = from(bucket: \"metrics_pg\")\n |> range(start: v.timeRangeStart, stop: v.timeRangeStop)\n |> filter(fn: (r) => r[\"_measurement\"] == \"system\")\n |> filter(fn: (r) => r[\"_field\"] == \"n_cpus\")\n\njoin(\n tables: {load:loadavg, n_cpus:cpucount},\n on: [\"_time\", \"_stop\", \"_start\", \"host\"]\n)\n |> map(fn: (r) => ({\n _time: r._time,\n _value: r._value_load / float(v: + r._value_n_cpus),\n host: r.host\n })\n )\n |> aggregateWindow(every: 5m, fn: last)\n |> yield(name: \"last\")\n",
"editMode": "advanced",
"name": ""
}
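For anyone who wants to reproduce this, a query section of this shape can be generated from the Flux source instead of being escaped by hand. The following is only a sketch using Python's standard `json` module. The helper name `build_query_section` is hypothetical and not part of the InfluxDB API, and the Flux text is shortened for illustration. The keys `text`, `editMode` and `name` are the ones shown in the JSON above.

```python
import json


def build_query_section(flux_text: str) -> dict:
    """Wrap raw Flux source in the same shape as the "query" object above."""
    return {"text": flux_text, "editMode": "advanced", "name": ""}


# Shortened Flux source for illustration; the real query is the one shown earlier.
flux_source = '''from(bucket: "metrics_pg")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "system")
'''

# json.dumps escapes the double quotes and newlines inside the Flux text.
print(json.dumps({"query": build_query_section(flux_source)}, indent=2))
```

This only produces the query fragment; the rest of the check definition in pg_cpu.json is not shown here.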
Any help will be really appreciated.