influxQL top query mixed with other aggregates

jrwren · December 13, 2021, 10:37pm

How does one do a TOP() query grouped by host when the field I’m querying is a counter-style field, and so only makes sense when wrapped in NON_NEGATIVE_DIFFERENCE(MEAN("http_response.5xx"))?

I wish I could SELECT TOP(NON_NEGATIVE_DIFFERENCE(MEAN("http_response.5xx")), "host", 10) FROM "haproxy" WHERE... but that doesn’t work.

Anaisdg · December 23, 2021, 9:15pm

Hello @jrwren,
You can use the group by clause:

jrwren · January 10, 2022, 8:45pm

I’m struggling to do this, I think because my only interface to InfluxDB is via Grafana, and I don’t see how to mix an existing GROUP BY with a TOP query and NON_NEGATIVE_DIFFERENCE.

I have this query (using haproxy Telegraf data) and I’d like a top 10 by host. I think I’m having a mental block from too many years of only PostgreSQL.

SELECT NON_NEGATIVE_DIFFERENCE(MEDIAN("http_response.5xx")) FROM "haproxy" WHERE ("meta_datacenter" = 'us-east-1' AND "server" = '/run/reverseproxy/haproxy.sock' AND "type"='frontend' AND "proxy"='443_public_ssl_in' ) AND $timeFilter GROUP BY time($Interval)

trying:

SELECT TOP( NON_NEGATIVE_DIFFERENCE(MEDIAN("http_response.5xx")), "host", 10) FROM "haproxy" WHERE ("meta_datacenter" = 'us-east-1' AND "server" = '/run/reverseproxy/haproxy.sock' AND "type"='frontend' AND "proxy"='443_public_ssl_in' ) AND $timeFilter GROUP BY host, time(30s)

Results: InfluxDB Error: expected first argument to be a field in top(), found non_negative_difference(median("http_response.5xx"))

Giovanni_Luisotto · January 12, 2022, 1:24pm

You just need a subquery; something like the one below should do it:

SELECT
	TOP("diff", "host", 10)
FROM (
	SELECT
		NON_NEGATIVE_DIFFERENCE(MEDIAN("http_response.5xx")) as "diff"
		, "host"
	FROM "haproxy"
	WHERE (
		"meta_datacenter" = 'us-east-1'
		AND "server" = '/run/reverseproxy/haproxy.sock'
		AND "type"='frontend'
		AND "proxy"='443_public_ssl_in' 
	) AND $timeFilter
	GROUP BY
		host
		,time(30s)
)
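
Note that TOP("diff", "host", 10) ranks hosts by their largest single 30s increase, not by their total over the time range. If the total is what matters, one possible variation is to sum the per-interval differences for each host in a second subquery and apply TOP to that sum. The following is an untested sketch: it assumes nested subqueries are available (InfluxDB 1.2 or later), reuses the filters from the query above, and uses "total" purely as an illustrative alias.

```sql
-- Sketch only: rank hosts by total 5xx increase over the dashboard time range.
SELECT TOP("total", 10), "host"
FROM (
	-- Middle level: one row per host, summing the 30s increases.
	SELECT SUM("diff") AS "total"
	FROM (
		-- Inner level: per-host, per-interval increase of the counter.
		SELECT NON_NEGATIVE_DIFFERENCE(MEDIAN("http_response.5xx")) AS "diff"
		FROM "haproxy"
		WHERE (
			"meta_datacenter" = 'us-east-1'
			AND "server" = '/run/reverseproxy/haproxy.sock'
			AND "type" = 'frontend'
			AND "proxy" = '443_public_ssl_in'
		) AND $timeFilter
		GROUP BY "host", time(30s)
	)
	GROUP BY "host"
)
```

Because the inner levels group by "host", the tag stays available to the outer query, so it can be selected next to TOP() there.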

jrwren · January 19, 2022, 2:07pm

It took me a while to realize that what I’m asking doesn’t even make sense. This is time series data, so top 10 when?

jrwren · February 4, 2022, 9:42pm

I realized that TOP over time doesn’t make sense unless it is aggregated over time, so

SELECT TOP( "5xx", 5), "host" FROM (
  SELECT (MEDIAN("http_response.5xx")) as "5xx" 
  FROM "reverseproxy"
  WHERE ("sv"='FRONTEND' AND "proxy"='443_public_ssl_in' ) AND $timeFilter
  GROUP BY  "host" )

This does get me a list of top hosts, but I’d like to take that list and use it as part of a WHERE. In SQL I would use an IN clause with a subquery in the WHERE clause. How can I do this with InfluxDB?

SELECT NON_NEGATIVE_DIFFERENCE(MEDIAN("http_response.5xx"))
FROM "haproxy" WHERE ("sv"='FRONTEND' AND "proxy"='443_public_ssl_in'
  AND "host" IN (SELECT "host" FROM (<ABOVEQUERY>))
 ) AND $timeFilter GROUP BY time($Interval),"host"
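
For reference, InfluxQL has no IN operator and accepts subqueries only in the FROM clause, so the query above cannot run as written. One workaround that may fit a Grafana setup is to match the "host" tag against a regular expression and supply the host names from outside the query. The sketch below assumes a multi-value dashboard variable, hypothetically named $tophosts, that already holds the hosts returned by the TOP query. How that variable gets populated is left open here.

```sql
-- Sketch only: $tophosts is a hypothetical multi-value Grafana variable.
-- Grafana expands it to an alternation such as (web01|web02) inside the regex.
SELECT NON_NEGATIVE_DIFFERENCE(MEDIAN("http_response.5xx"))
FROM "haproxy"
WHERE ("sv"='FRONTEND' AND "proxy"='443_public_ssl_in'
  AND "host" =~ /^$tophosts$/
 ) AND $timeFilter
GROUP BY time($Interval), "host"
```

Outside Grafana the same idea works with the names written out by hand, for example "host" =~ /^(web01|web02)$/, where web01 and web02 are placeholder host names.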
