influxQL top query mixed with other aggregates

How does one do a TOP() query grouped by HOST when the field I’m querying is a counter style field and so only makes sense when aggregated by NON_NEGATIVE_DIFFERENCE(MEAN(“http_response.5xx”))?

I wish I could SELECT TOP(NON_NEGATIVE_DIFFERENCE(MEAN("http_response.5xx")), "host", 10) FROM "haproxy" WHERE... but that doesn’t work.

Hello @jrwren,
You can use the group by clause:

I’m struggling to do this, I think because my only interface to influx is via grafana and I don’t see how to mix an existing group by with a top query and non_negative_difference.

I have this query (using haproxy telegraf data) and I’d like a top 10 by host. I’m having a mental block I think from too many years of only postgresql.

SELECT NON_NEGATIVE_DIFFERENCE(MEDIAN(“http_response.5xx”)) FROM “haproxy” WHERE (“meta_datacenter” = ‘us-east-1’ AND “server” = ‘/run/reverseproxy/haproxy.sock’ AND “type”=‘frontend’ AND “proxy”=‘443_public_ssl_in’ ) AND $timeFilter GROUP BY time($Interval)

trying:

SELECT TOP( NON_NEGATIVE_DIFFERENCE(MEDIAN("http_response.5xx")), "host", 10) FROM "haproxy" WHERE ("meta_datacenter" = 'us-east-1' AND "server" = '/run/reverseproxy/haproxy.sock' AND "type"='frontend' AND "proxy"='443_public_ssl_in' ) AND $timeFilter GROUP BY host, time(30s)

Results: InfluxDB Error: expected first argument to be a field in top(), found non_negative_difference(median("http_response.5xx"))

You just need a subquery, something like the one below should do it

SELECT
	TOP("diff", "host", 10)
FROM (
	SELECT
		NON_NEGATIVE_DIFFERENCE(MEDIAN("http_response.5xx")) as "diff"
		, "host"
	FROM "haproxy"
	WHERE (
		"meta_datacenter" = 'us-east-1'
		AND "server" = '/run/reverseproxy/haproxy.sock'
		AND "type"='frontend'
		AND "proxy"='443_public_ssl_in' 
	) AND $timeFilter
	GROUP BY
		host
		,time(30s)
)

It took me a while to realize that what I’m asking doesn’t even make sense. This is time series data, so top 10 when?

I realized that TOP over time doesn’t make sense unless it is aggregated over time, so

SELECT TOP( "5xx", 5), "host" FROM (
  SELECT (MEDIAN("http_response.5xx")) as "5xx" 
  FROM "reverseproxy"
  WHERE ("sv"='FRONTEND' AND "proxy"='443_public_ssl_in' ) AND $timeFilter
  GROUP BY  "host" )

Does get me a list of top hosts, but I’d like to take that list of top hosts and then use them as part of a WHERE. In SQL I would use an IN clause SUBQUERY in the WHERE clause. How can I do this with influxdb?

SELECT NON_NEGATIVE_DIFFERENCE(MEDIAN("http_response.5xx"))
FROM "haproxy" WHERE ("sv"='FRONTEND' AND "proxy"='443_public_ssl_in'
  AND "host" IN (SELECT "host" FROM (<ABOVEQUERY>))
 ) AND $timeFilter GROUP BY time($Interval),"host"