I’m storing CPU usage and memory usage at 1-second intervals with the same timestamp under the _field → cpu_usage and mem_usage fields. To find the CPU usage MAX and check the memory usage at the same timestamp, it seems that using pivot followed by CPU MAX would work. However, pivot significantly degrades performance when there is a lot of data. So instead of pivot, I think it would be better to find the timestamp with the same tag. Is it possible to do this with a single Flux query?
Hi @han,
From what I understood, your line protocol looks something like this (sending cpu/memory with the same timestamp, possibly from more than one device):
test,device=xxx memory=10,cpu=30 1729627400
test,device=xxx memory=11,cpu=31 1729627401
test,device=xxx memory=12,cpu=32 1729627402
test,device=xxx memory=13,cpu=33 1729627403
test,device=yyy memory=10,cpu=30 1729627400
test,device=yyy memory=11,cpu=31 1729627401
test,device=yyy memory=12,cpu=32 1729627402
test,device=yyy memory=13,cpu=33 1729627403
With this query:
from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test")
    |> filter(fn: (r) => r._field =~ /cpu|memory/)
    |> group(columns: ["device", "_field"])
    |> aggregateWindow(every: 1h, fn: max, createEmpty: false)
This returns the max of memory and cpu every hour, per device.
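With the sample points above (all falling in the same hourly window), the output would look roughly like this — timestamps shown here are my own conversion of the sample epoch values to the window end, and the column layout is simplified:

```
_time                  device  _field  _value
2024-10-22T21:00:00Z   xxx     cpu         33
2024-10-22T21:00:00Z   xxx     memory      13
2024-10-22T21:00:00Z   yyy     cpu         33
2024-10-22T21:00:00Z   yyy     memory      13
```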
You can then pivot the data if you need to:
import "influxdata/influxdb/schema"

from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test")
    |> filter(fn: (r) => r._field =~ /cpu|memory/)
    |> group(columns: ["device", "_field"])
    |> aggregateWindow(every: 1h, fn: max, createEmpty: false)
    |> schema.fieldsAsCols()
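After schema.fieldsAsCols(), each row carries both fields as columns. With the sample data, the shape would be roughly (timestamps again converted from the sample epochs, layout simplified):

```
_time                  device  cpu  memory
2024-10-22T21:00:00Z   xxx      33      13
2024-10-22T21:00:00Z   yyy      33      13
```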
Does that help?
Thanks,
Tom
Hi @thopewell
In the following scenario:
test,device=xxx memory=10,cpu=40 1729627400
test,device=xxx memory=11,cpu=30 1729627401
test,device=xxx memory=12,cpu=20 1729627402
test,device=xxx memory=13,cpu=30 1729627403
The highest CPU usage is 40, recorded at timestamp 1729627400, where the memory usage is 10. To look up that memory usage, the following query can be used:
from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test")
    |> filter(fn: (r) => r._field == "cpu" or r._field == "memory")
    |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
    |> max(column: "cpu")
    |> yield(name: "result")
I understand that this query returns the memory usage when the CPU usage is at its maximum. However, pivoting all the rows can degrade performance. I believe it would be faster to first find the timestamp of the maximum CPU usage and then retrieve the memory usage at that timestamp. Is there a way to do this?
Hi @han
You could try something like this, but I’m not sure if it is performant:
import "join"

data = from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test")
    |> filter(fn: (r) => r._field == "cpu" or r._field == "memory")

cpu = data
    |> filter(fn: (r) => r._field == "cpu")
    |> max()
    |> group()

memory = data
    |> filter(fn: (r) => r._field == "memory")
    |> group()

join.time(
    method: "left",
    left: cpu,
    right: memory,
    as: (l, r) => ({l with memory: r._value}),
)
This gives me the desired result.
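Another approach that avoids both pivot and a join is to extract the max-CPU timestamp first with findRecord() and then filter the memory series by it. This is only a sketch, not benchmarked, reusing the bucket and measurement names from the thread; note the group() before max(), which collapses the per-device series so max() returns the single global maximum:

```
// Find the record holding the overall maximum CPU value.
maxCpu = from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test" and r._field == "cpu")
    |> group()
    |> max()
    |> findRecord(fn: (key) => true, idx: 0)

// Fetch only the memory point(s) sharing that timestamp.
from(bucket: "test")
    |> range(start: -1d)
    |> filter(fn: (r) => r._measurement == "test" and r._field == "memory")
    |> filter(fn: (r) => r._time == maxCpu._time)
    |> yield(name: "memory_at_max_cpu")
```

If you want the per-device maximum instead of the global one, drop the group() and run the lookup per device.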
Thanks,
Tom