How can I return distinct tag set using InfluxQL

thopewell · November 11, 2023, 1:39am

Given the line protocol:

test,thing=a,version=x some_field=1 1699661173
test,thing=a,version=x some_field=2 1699661174
test,thing=a,version=y some_field=2 1699661175
test,thing=b,version=y some_field=2 1699661176
test,thing=b,version=y some_field=2 1699661177

Is it possible to return the unique combinations of “version” AND “thing” using InfluxQL?
I can do this in Flux using:

from(bucket: "testing")
  |> range(start: -1d )
  |> filter(fn: (r) => r["_measurement"] == "test")
  |> filter(fn: (r) => r["_field"] == "some_field")
  |> group(columns:["thing"])
  |> unique(column:"version")

returns the below; the rows with the same thing and version are removed:

,result,table,_start,_stop,_time,_value,_field,_measurement,thing,version
,_result,0,2023-11-10T00:23:10.268015781Z,2023-11-11T00:23:10.268015781Z,2023-11-11T00:06:13Z,1,some_field,test,a,x
,_result,0,2023-11-10T00:23:10.268015781Z,2023-11-11T00:23:10.268015781Z,2023-11-11T00:06:15Z,2,some_field,test,a,y
,_result,1,2023-11-10T00:23:10.268015781Z,2023-11-11T00:23:10.268015781Z,2023-11-11T00:06:16Z,2,some_field,test,b,y

I can’t seem to find an equivalent in InfluxQL using various combinations of DISTINCT, nested queries etc. Is there a way?
I can get all of the data and using pandas, get an equivalent result. But the problem is that there might by a significant amount of data - much of it redundant (same tag set) - so for performance reasons, I’m trying where possible to do the “heavy lifting” inside of Influx.

The answer to why I don’t I just use flux is a long one…

Thanks,
Tom

AVVS_Sudheer · November 11, 2023, 5:50am

Hi,

You can use distinct() function to return distinct values.

grant1 · November 11, 2023, 11:16pm

I think Tom mentions that he tried various combinations of this. Can you elaborate?

I’d like to hear it!

thopewell · November 15, 2023, 11:24pm

@AVVS_Sudheer @grant1

Thanks for the responses.
It took a bit of thinking to understand my own question, which perhaps lacked some context!

What I’m trying to do is detect whether there have been any changes to “version” for a particular “thing” (in real life, “thing” might be the mac address of a device, i.e. its unique).
If I can get the unique combinatons of tags “version” and “thing”, I can use a few lines of python with pandas to find such changes for a particular “thing”

Given the line protocol:

test,thing=a,version=x some_field=1 1700088202
test,thing=a,version=x some_field=2 1700088203
test,thing=a,version=y some_field=2 1700088204
test,thing=b,version=y some_field=2 1700088205
test,thing=b,version=y some_field=2 1700088206
test,thing=b,version=y some_field=2 1700088207
test,thing=b,version=y some_field=2 1700088208
test,thing=b,version=y some_field=2 1700088209

I can get this with flux which is what I need to progress - in real life there might be 10k rows per thing and many things, but Influx will only return 1 row per combinaton of tags:

_value,_field,_measurement,thing,version
1,some_field,test,a,x
2,some_field,test,a,y
2,some_field,test,b,y

I can see that thing “a” had a version change from x to y.

In InfluxQL, this query gets me all the data:

SELECT /some_field/,version,thing FROM test WHERE time > now() -2d GROUP BY thing,version

name,tags,time,some_field,version,thing
test,"thing=a,version=x",1700088202000000000,1,x,a
test,"thing=a,version=x",1700088203000000000,2,x,a
test,"thing=a,version=y",1700088204000000000,2,y,a
test,"thing=b,version=y",1700088205000000000,2,y,b
test,"thing=b,version=y",1700088206000000000,2,y,b
test,"thing=b,version=y",1700088207000000000,2,y,b
test,"thing=b,version=y",1700088208000000000,2,y,b
test,"thing=b,version=y",1700088209000000000,2,y,b

What I want to do now is effectively distinct the “tags” column, or distinct the “version” and “things” column - and I can’t figure this out, none of these work:

SELECT DISTINCT(version),DISTINCT(thing) FROM (SELECT /some_field/,version,thing FROM test WHERE time > now() -2d GROUP BY thing,version)
SELECT DISTINCT(version,thing) FROM (SELECT /some_field/,version,thing FROM test WHERE time > now() -2d GROUP BY thing,version)
SELECT DISTINCT(tags) FROM (SELECT /some_field/,version,thing FROM test WHERE time > now() -2d GROUP BY thing,version)

Anyway - thanks for reading this far!

My original issue with using flux was the influxdb3-python library doesn’t currently support asynchronus query, but I realised I can use aiohttp/asyncio and hit the compatability endpoint.
I haven’t had a chance yet to try out fluxpipe…

Thanks again,
Tom

Topic		Replies	Views
InfluxDB Flux - get count of distinct occurrences Fluxlang influxdb	4	8438	July 30, 2020
Trying to get Count Distinct across multiple columns Fluxlang influxdb , influxql , query	0	1445	March 22, 2019
Unique tag values in time interval InfluxDB 2	4	902	February 25, 2020
InfluxDB query to get distinct values from a table InfluxQL influxql	5	5382	July 2, 2021
Get distinct values of tag InfluxDB 3 influxdb	2	15	May 23, 2025

How can I return distinct tag set using InfluxQL

Related topics