I’m storing CPU usage and memory usage at 1-second intervals with the same timestamp under the _field → cpu_usage and mem_usage fields. To find the CPU usage MAX and check the memory usage at the same timestamp, it seems that using pivot followed by CPU MAX would work. However, pivot significantly degrades performance when there is a lot of data. So instead of pivot, I think it would be better to find the timestamp with the same tag. Is it possible to do this with a single Flux query?
Hi @han,
From what I understood, your line protocol looks something like this (sending cpu/memory with the same timestamp, possibly from more than one device):
test,device=xxx memory=10,cpu=30 1729627400
test,device=xxx memory=11,cpu=31 1729627401
test,device=xxx memory=12,cpu=32 1729627402
test,device=xxx memory=13,cpu=33 1729627403
test,device=yyy memory=10,cpu=30 1729627400
test,device=yyy memory=11,cpu=31 1729627401
test,device=yyy memory=12,cpu=32 1729627402
test,device=yyy memory=13,cpu=33 1729627403
With this query:
from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test")
    |> filter(fn: (r) => r._field =~ /cpu|memory/)
    |> group(columns: ["device", "_field"])
    |> aggregateWindow(every: 1h, fn: max, createEmpty: false)
This returns the max of memory and cpu every hour, per device.
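With the sample points above (all falling in the same hourly window), the output would look roughly like this — timestamps shown here are my own conversion of the sample epoch values to the window end, and the column layout is simplified:

```
_time                  device  _field  _value
2024-10-22T21:00:00Z   xxx     cpu         33
2024-10-22T21:00:00Z   xxx     memory      13
2024-10-22T21:00:00Z   yyy     cpu         33
2024-10-22T21:00:00Z   yyy     memory      13
```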
You can then pivot the data if you need to:
import "influxdata/influxdb/schema"

from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test")
    |> filter(fn: (r) => r._field =~ /cpu|memory/)
    |> group(columns: ["device", "_field"])
    |> aggregateWindow(every: 1h, fn: max, createEmpty: false)
    |> schema.fieldsAsCols()
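After schema.fieldsAsCols(), each row carries both fields as columns. With the sample data, the shape would be roughly (timestamps again converted from the sample epochs, layout simplified):

```
_time                  device  cpu  memory
2024-10-22T21:00:00Z   xxx      33      13
2024-10-22T21:00:00Z   yyy      33      13
```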
Does that help?
Thanks,
Tom
Hi @thopewell
In the following scenario:
test,device=xxx memory=10,cpu=40 1729627400
test,device=xxx memory=11,cpu=30 1729627401
test,device=xxx memory=12,cpu=20 1729627402
test,device=xxx memory=13,cpu=30 1729627403
The highest CPU usage is 40, recorded at timestamp 1729627400, where the memory usage is 10. To look up that memory usage, the following query can be used:
from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test")
    |> filter(fn: (r) => r._field == "cpu" or r._field == "memory")
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> max(column: "cpu")
    |> yield(name: "result")
I understand that this query returns the memory usage when the CPU usage is at its maximum. However, pivoting all the rows can degrade performance. I believe it would be faster to first find the timestamp of the maximum CPU usage and then retrieve the memory usage at that timestamp. Is there a way to do this?
Hi @han
You could try something like this, but I’m not sure if it is performant:
import "join"

data = from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test")
    |> filter(fn: (r) => r._field == "cpu" or r._field == "memory")

cpu = data
    |> filter(fn: (r) => r._field == "cpu")
    |> max()
    |> group()

memory = data
    |> filter(fn: (r) => r._field == "memory")
    |> group()

join.time(
    method: "left",
    left: cpu,
    right: memory,
    as: (l, r) => ({l with memory: r._value}),
)
This gives me the desired result.
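Another approach that avoids both pivot and a join is to extract the max-CPU timestamp first with findRecord() and then filter the memory series by it. This is only a sketch, not benchmarked, reusing the bucket and measurement names from the thread; note the group() before max(), which collapses the per-device series so max() returns the single global maximum:

```
// Find the record holding the overall maximum CPU value.
maxCpu = from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test" and r._field == "cpu")
    |> group()
    |> max()
    |> findRecord(fn: (key) => true, idx: 0)

// Fetch only the memory point(s) sharing that timestamp.
from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test" and r._field == "memory")
    |> filter(fn: (r) => r._time == maxCpu._time)
    |> yield(name: "memory_at_max_cpu")
```

If you want the per-device maximum instead of the global one, drop the group() and run the lookup per device.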
Thanks,
Tom