Aggregate "sum" from multiple series for Grafana graph panel

My data
I have a measurement that holds the total memory of each VM on each host, in megabytes (granularity of 30s). The memory of a VM can change, and a VM can also be moved to another host.

For example, I have two hosts and 4 VMs:
_time, host, vm, _value
2021-06-01 10:41:00 GMT+2, FPPW9406, VM01, 4096
2021-06-01 10:41:00 GMT+2, FPPW9406, VM02, 1024
2021-06-01 10:41:00 GMT+2, FPPW9407, VM03, 4096
2021-06-01 10:41:00 GMT+2, FPPW9407, VM04, 1024
2021-06-01 10:40:30 GMT+2, FPPW9406, VM01, 4096
2021-06-01 10:40:30 GMT+2, FPPW9406, VM02, 1024
2021-06-01 10:40:30 GMT+2, FPPW9407, VM03, 4096
2021-06-01 10:40:30 GMT+2, FPPW9407, VM04, 1024
2021-06-01 10:40:00 GMT+2, FPPW9406, VM01, 4096
2021-06-01 10:40:00 GMT+2, FPPW9406, VM02, 1024
2021-06-01 10:40:00 GMT+2, FPPW9407, VM03, 4096
2021-06-01 10:40:00 GMT+2, FPPW9407, VM04, 1024

What I want
I want to create a Grafana graph panel that shows one line.
The Line = Summation of all VMs memory over time (in a graph panel)

The raw data should look like this:
_time, _value
2021-06-01 10:41:00 GMT+2, 10240
2021-06-01 10:40:30 GMT+2, 10240
2021-06-01 10:40:00 GMT+2, 10240

I tried a lot of queries, but I found no solution for this; aggregateWindow() with fn: sum is not the solution.

I think you should drop “host”, group by “vm” and then use sum().
Something like

  |> drop(columns: ["host"])
  |> group(columns: ["vm"], mode:"by")
  |> sum()
  |> group()
  |> drop(columns: ["vm"])

where the last two lines are optional and up to you.
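
For reference, a complete version of that pipeline could look like the sketch below; the bucket, measurement and field names are only placeholders for your own schema:

from(bucket: "telegraf")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "hyperv_dynamic_memory")
  |> filter(fn: (r) => r["_field"] == "Guest_Visible_Physical_Memory")
  |> drop(columns: ["host"])             // the host tag is irrelevant for the total
  |> group(columns: ["vm"], mode: "by")  // one table per VM
  |> sum()                               // one summed value per VM
  |> group()                             // merge the per-VM results into one table
  |> drop(columns: ["vm"])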
Thanks!


Thanks for your help, but as I mentioned I am looking for a Flux query for a graph panel, and with your query I only get the overall sum, not the sum over time.

Please don’t delete this post.

Ok. You can do what I suggested with a window() command if you need.
Why isn’t aggregateWindow() with a window size of 30s a viable solution?

We want this graph over a large timeframe (for example 90d), and the problem is that with a window size of 30s there are too many points. The big advantage of aggregateWindow() is that normally we can use a dynamic window size (for example with “mean”), which prevents displaying too many points, but because we want the sum in a graph we cannot use a dynamic window.

Ok, that is quite different from what you described in the first post: it is not that the query is impossible, it is rather that the result is not appealing. It is much more about your choice of how to process your data than about the Flux query itself.

Anyway, you can aggregate twice: first you aggregate with sum every 30s, then you aggregate with mean and a dynamic window size. Of course the results will be somewhat interpolated, but you cannot reduce the cardinality of the data without some compromise.
Or, you can use a moving average, …
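
A rough sketch of the moving-average variant could be something like the following, with timedMovingAverage() as the smoothing step; the bucket and filter names are just placeholders you would replace with your own:

from(bucket: "telegraf")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "hyperv_dynamic_memory")
  |> filter(fn: (r) => r["_field"] == "Guest_Visible_Physical_Memory")
  |> group()
  |> aggregateWindow(every: 30s, fn: sum, createEmpty: false)           // total memory per 30s sample
  |> timedMovingAverage(every: v.windowPeriod, period: v.windowPeriod)  // smooth down to the panel resolution

Like the double aggregateWindow(), this averages the 30s totals, so peaks will be flattened to some degree.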

Thanks for your answer. :slight_smile:
I tried with the two aggregateWindow() calls (sum & mean); it works, but I have two problems:

1. The query is very slow (46s for a 24h timeframe)
I think this is because I use multiple aggregateWindow() calls and the first aggregateWindow() has to process a lot of data.
→ Is there a way to make this query faster?

2. The query is not really "dynamic"
If the granularity/frequency of the data changes, I have to change the query in each dashboard/panel.
→ Is there a way to write a more “dynamic” query?

My query

from(bucket: "telegraf")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "hyperv_dynamic_memory")
  |> filter(fn: (r) => r["_field"] == "Guest_Visible_Physical_Memory")
  |> group()
  |> aggregateWindow(every: 30s, fn: sum, createEmpty: false)
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)

Can you try to run

from(bucket: "telegraf")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "hyperv_dynamic_memory")
  |> filter(fn: (r) => r["_field"] == "Guest_Visible_Physical_Memory")
  |> group()
  |> count()

and see what the count is? It is very strange to me that the query is so slow.

The fact that it is not really dynamic is due to your constraint: you need to sum the data at the sampling period at which they are recorded, so at some point you have to feed the query the proper time period.
I do not see a reliable way to automatically detect the sampling frequency: you can run a query, calculate the frequency and then use that value in the first aggregateWindow(), but it might be a little too much.
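
If you want to try it anyway, a rough, untested sketch could look like this; the -1h estimation range and the samplePeriod helper variable are just assumptions of mine:

// Estimate the typical gap between points and reuse it for the first aggregateWindow()
gap = from(bucket: "telegraf")
  |> range(start: -1h)  // assumed sampling window for the estimate
  |> filter(fn: (r) => r["_measurement"] == "hyperv_dynamic_memory")
  |> filter(fn: (r) => r["_field"] == "Guest_Visible_Physical_Memory")
  |> elapsed(unit: 1s)                      // seconds between consecutive points, per series
  |> median(column: "elapsed")              // typical gap of each series
  |> findRecord(fn: (key) => true, idx: 0)  // pick the first series' result

samplePeriod = duration(v: int(v: gap.elapsed) * 1000000000)  // seconds -> duration

from(bucket: "telegraf")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "hyperv_dynamic_memory")
  |> filter(fn: (r) => r["_field"] == "Guest_Visible_Physical_Memory")
  |> group()
  |> aggregateWindow(every: samplePeriod, fn: sum, createEmpty: false)
  |> aggregateWindow(every: v.windowPeriod, fn: mean, createEmpty: false)

That is basically the “query, calculate the frequency, reuse the value” idea in one script, but as said, it might be a little too much.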

I ran your query and the count is 7436505.

Ok, I hope there will be a solution (a new function) for this in the future.

7M points in 24h? That is quite a lot, in fact.
I guess you might try to change the shard configuration of the database, but I’m no expert on this. Maybe @Anaisdg can point you to the proper person/direction here.

I do not know whether a function to estimate the sampling frequency will ever be implemented, as this is not something easy to guess reliably.

Thanks!