Sum by multiple tags and Group by time

Hi,

I have a data like this:

product_sold = 120, {customer_name = ural, product_name = x}
product_sold = 121, {customer_name = nil, product_name = y}
product_sold = 123, {customer_name = ural, product_name = y}
product_sold = 127, {customer_name = nil, product_name = a}
product_sold = 132, {customer_name = tuna, product_name = b}
product_sold = 134, {customer_name = tuna, product_name = a}

It’s a prometheus counter. Cumulative data, never goes down, always up.

  1. I want to plot how it changes over time and ignore the customer_name and product_name.
  2. I want to plot how it changes over time in a minute (group by 1m), for a all customers (group by customer_name)

SELECT non_negative_difference(sum("product_sold")) FROM "db" WHERE $timeFilter GROUP BY time(1m) fill(null)

SELECT non_negative_difference(sum("product_sold")) FROM "db" WHERE $timeFilter GROUP BY time(1m), "customer_name" fill(null)

So, what I expect is that first query should sum all “product_sold” values in a time range in this case 1m.
Second query should do the same but for each customer and should return 3 values per minute since there are 3 customers in the example.

Can you help to query these two examples? I think, I did fine but the plot on grafana seems a bit off.


Another similar case:

I’m dealing with data called jvm_memory_bytes_used. This is a gauge metric and shows used memory. Has a tag called {area=“heap”, “nonheap”}

I want to sum this data by tag area. However, I don’t want to sum it over time. I only need to sum jvm_memory_bytes_used with same date. Of course timestamps will not be exactly same. So, I used subqueries.

SELECT percentile("Memory Used Raw",95) AS "Memory Used" FROM (SELECT sum("jvm_memory_bytes_used") AS "Memory Used Raw" FROM "db" WHERE $timeFilter GROUP BY time(10s) fill(previous)) GROUP BY time($sampling_interval)

Is this an overkill? Please, share if there is a better/efficient way to write this query.

Hi @Mert,

I think your fill(null) should be fill(previous) to fix your first examples.

Can you confirm what doesn’t seem right? Is it drops in the graph, or something else? A screenshot would be ideal.

Exact query and screenshot.

SELECT non_negative_difference(sum("jira_user_login_count")) AS "Login", non_negative_difference(sum("jira_user_logout_count")) AS "Logout" FROM "atlas" WHERE $timeFilter GROUP BY time(30m) fill(null)

Example data:

jira_user_login_count{username="x",} 3.0
jira_user_login_count{username="y",} 4.0
jira_user_login_count{username="z",} 1.0
jira_user_login_count{username="t",} 2.0

This seems right but it’s a far complicated query. I want to know if this the right way or an unnecessary complicated query?

SELECT sum("Login") FROM (SELECT non_negative_difference(sum("jira_user_login_count")) AS "Login" FROM "atlas" WHERE $timeFilter GROUP BY time(10s)) GROUP BY time(30m)

I believe this is how the query works:

  • sum all logins in 10s (since we collect data every 10s. It only sums the data with different tags[username], but doesn’t add up for a wide time range)
  • difference 10s, returns the difference between each 10s period.
  • lastly, the outer query sums the data to display easily for eyes.

This is how it should look like:

SELECT sum("Login in 10s") AS "Login", sum("Logout in 10s") AS "Logout" FROM (SELECT non_negative_difference(sum("jira_user_login_count")) AS "Login in 10s", non_negative_difference(sum("jira_user_logout_count")) AS "Logout in 10s" FROM "atlas" WHERE $timeFilter GROUP BY time(10s)) GROUP BY time($sampling_interval)

However, the query is pretty expensive.

(sum(rate(confluence_request_duration_on_path_sum{job="confluence"}[1m])) by (path)) / (sum(rate(confluence_request_duration_on_path_count{job="confluence"}[1m])) by (path))

May I ask the equivalent of promql above in influxql?

@daniel @rawkode Could you please enlighten me when you are free