Query to count occurrences of distinct values

Hi,

I could not find the way to build a single query for getting the count of occurrences of distinct values.

select distinct(allocationLimit) from resources

name: resources
time distinct
---- --------
0    54636
0    13220
0    0
0    14098

select count(allocationLimit) from resources where allocationLimit=54636

name: resources
time count
---- -----
0    2

Like in the above example, how can I get the occurrences of all distinct values of "allocationLimit" in a single query, with output like:

count distinct
----- --------
2     54636
1     13220
4     0
5     14098

Any idea on how to do this?

Thanks,
Arti

1 Like

Hey,

Have you tried this query?

SELECT count(allocationLimit) FROM resources GROUP BY allocationLimit

Bugs’

Did you figure this out? GROUP BY doesn’t work on fields, and in my case I also need to GROUP BY a tag.

I’m storing data about computers. One field I have is the OS version, which I don’t want to store as a TAG since it will change often. I tag by serial number so my series will look like this:

tags { serial: x1234 }
fields { os_version: "Mac OSX 10.12.5" }

I can do this to get a list of distinct versions (I want the last known value per serial):

select distinct("last") from (SELECT last("os_version") FROM "computers" GROUP BY serial)

name: computers
time distinct
---- --------
0    Mac\ OS\ X\ 10.12.6\ (16G1114)
0    Mac\ OS\ X\ 10.12.6\ (16G1036)
0    Mac\ OS\ X\ 10.13.2\ (17C205)
0    Mac\ OS\ X\ 10.12.6\ (16G29)

But how do I count the number of each distinct version? Like OP, I want:

name: computers
count distinct
----- --------
200   Mac\ OS\ X\ 10.12.6\ (16G1114)
450   Mac\ OS\ X\ 10.12.6\ (16G1036)
100   Mac\ OS\ X\ 10.13.2\ (17C205)
600   Mac\ OS\ X\ 10.12.6\ (16G29)

No, I haven't found any solution yet, as GROUP BY supports only tags and time.

Hello @ashinde,

I was able to do this in flux with the following script:

   from(bucket: "test")
      |> range(start: dashboardTime)
      |> filter(fn: (r) => r._field == "value")
      |> group(columns: ["_value"])
      |> map(fn: (r) => ({_time: r._time, index: 1}))
      |> cumulativeSum(columns: ["index"]) 
      |> last()

Where my data in line protocol looks like:

test,tagkey=occ value=1
test,tagkey=occ value=5
test,tagkey=occ value=5
test,tagkey=occ value=1
test,tagkey=occ value=1

This feels a little hacky, but hopefully, it will carry you through while I look for a more elegant solution.

If you decide to explore writing more flux scripts, the spec and the testdata repo will be your best resources.

1 Like

Hi,

Did you find a more elegant solution?
Thanks,
Stuart

Hello @StuartH,
yes you can do:

   from(bucket: "test")
      |> range(start: dashboardTime)
      |> filter(fn: (r) => r._field == "value")
      |> group(columns: ["_value"])
      |> count()

I don’t know why I didn’t think of that previously.
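If you would rather get all the counts back in one table instead of one table per distinct value, a small variation of the script above should work. This is just a sketch, assuming the same "test" bucket and a fixed range; the "occurrences" column name is illustrative, and set() adds a throwaway column so count() does not have to aggregate the group-key column itself:

```flux
from(bucket: "test")
  |> range(start: -1h)
  |> filter(fn: (r) => r._field == "value")
  |> group(columns: ["_value"])          // one table per distinct field value
  |> set(key: "occurrences", value: "x") // throwaway column to count
  |> count(column: "occurrences")
  |> group()                             // merge the per-value counts into one table
```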

Hi. I am new to this. It is not easy. I find the documentation a bit scarce (count() function | Flux 0.x Documentation).

Having a table displayed like this:

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "suricata-alerts")
  |> filter(fn: (r) => r["_field"] == "alert_metadata_signature_severity_0")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
  |> group(columns: ["_value"])

Which gives me (some columns have been omitted):

_time                      _value  _event_type
2021-08-04 10:17:00 GMT+2  Major   alert
2021-08-04 10:14:00 GMT+2  Major   alert
2021-08-04 10:10:00 GMT+2  Minor   alert

How can I display the table like this:

_value  Count
Major   2
Minor   1

Adding |> count() to the script above only gives me a single column named _value containing the number 3.

Hello @kawada,
I agree, it’s not easy to wrap your head around at first.

If you group by “_value” then you should return two tables in your stream IF you have two different values in the “_value” column for that time range. Can you confirm that that’s true?

Can you share a screenshot or the annotated CSV output for your query above, please?

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "suricata-alerts")
  |> filter(fn: (r) => r["_field"] == "alert_metadata_signature_severity_0")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
  |> group(columns: ["_value"])

For example you can try:

import "csv"

csvData = "#group,false,false,true,true,false,false,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string
#default,mean,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,location
,,0,2021-07-05T21:35:05.673504606Z,2021-08-04T21:35:05.673504606Z,2021-07-05T22:00:00Z,79.75,degrees,average_temperature,coyote_creek
,,0,2021-07-05T21:35:05.673504606Z,2021-08-04T21:35:05.673504606Z,2021-07-05T23:00:00Z,79.8,degrees,average_temperature,coyote_creek
,,1,2021-07-05T21:35:05.673504606Z,2021-08-04T21:35:05.673504606Z,2021-07-05T22:00:00Z,76.25,degrees,average_temperature,santa_monica
,,1,2021-07-05T21:35:05.673504606Z,2021-08-04T21:35:05.673504606Z,2021-07-05T23:00:00Z,79.2,degrees,average_temperature,santa_monica"

mydata =  csv.from(csv: csvData)
mydata
|> yield(name: "original data with two tables")

mydata
|> count() 
|> yield(name: "after count")

Where “original data with two tables” is:

And “after count” is:

Since the original data had two output tables (you achieve this by grouping by "_value"), after we apply count(), the count() function is applied to both tables in the table stream, giving us the two counts. The count is placed in the "_value" column by default. I recommend keeping int values in "_value". However, you can use the rename() function to change the output to exactly what you want.
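For example, a minimal sketch of that rename() step, applied to the counted stream from the example above (the new column name is illustrative):

```flux
mydata
  |> count()
  |> rename(columns: {_value: "count"})
```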

Thank you for your reply, @Anaisdg

Shown in the screenshot below is “Major” and “Audit”, but there are also events of “Minor”:

With count():

I would like to achieve to have a table with the total number of “Major” events, the total number of “Minor” events, etc.

Hello @kawada,
Do you mind sharing the first screenshot again but with the "raw data" view turned on?
Can you also share the output before the group() is applied?
The goal here is to transform your single table into two tables. My guess is that there’s something off with the group key which is causing your data to exist in one table instead of two.

Alternatively you could try:

// your initial query here
data = from(...) |> range() |> filter() |> filter() |> aggregateWindow()

data |> filter(fn: (r) => r["_value"] ==  "Minor")
        |> count()
        |> yield(name: "Minor")

data |> filter(fn: (r) => r["_value"] ==  "Audit")
        |> count()
        |> yield(name: "Audit")

Lastly, you could also drop the sensitive columns that you’ve blacked out, export the data as Annotated CSV, and share it with me and I can look into it on my machine.

Thank you for your reply, @Anaisdg!

I have not had the time to look into this again, but here is the CSV:

2021-08-12-08-56_influxdb_data
#group;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE
#datatype;string;long;dateTime:RFC3339;dateTime:RFC3339;dateTime:RFC3339;string;string;string;string;string;string;string;string;string;string;string;string;string
#default;_result;;;;;;;;;;;;;;;;;
;result;table;_start;_stop;_time;_value;_field;_measurement;dest_ip;dest_port;event_type;flow_id;host;in_iface;path;proto;src_ip;src_port
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T10:36:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;1437107460025591;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;111.111.111.111;36766
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T16:56:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;2166347159271520;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;222.222.222.222;34387
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T18:20:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;321927463147576;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;110.110.110.110;11751
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T23:48:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;333555236012237;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;100.100.100.100;13488
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-12T02:56:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;334429991280534;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;180.180.180.180;53958
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-12T02:56:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;33606186887911;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;180.180.180.180;45018
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T07:52:00Z;Minor;alert_metadata_signature_severity_0;suricata-alerts;123.123.123.123;80;alert;1554565580150037;opnsense.home.arpa;re0;/var/log/suricata/eve.json;TCP;150.150.150.150;511

The desired goal is to have a table which displays

_value Count
Major 6
Minor 1

Hello @kawada,
Here ya go:

import "csv"

csvData = "#group,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,string,string,string,string
#default,_result,,,,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,dest_ip,dest_port,event_type,flow_id,host,in_iface,path,proto,src_ip,src_port
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T10:36:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,1437107460025591,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,111.111.111.111,36766
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T16:56:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,2166347159271520,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,222.222.222.222,34387
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T18:20:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,321927463147576,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,110.110.110.110,11751
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T23:48:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,333555236012237,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,100.100.100.100,13488
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-12T02:56:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,334429991280534,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,180.180.180.180,53958
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-12T02:56:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,33606186887911,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,180.180.180.180,45018
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T07:52:00Z,Minor,alert_metadata_signature_severity_0,suricata-alerts,123.123.123.123,80,alert,1554565580150037,opnsense.home.arpa,re0,/var/log/suricata/eve.json,TCP,150.150.150.150,511"

csv.from(csv: csvData)
  |> group(columns: ["_value"])
  |> set(key: "Count", value: "my_Count")
  |> count(column: "Count")

Thanks!

So I figure it is not possible to have a single column, without the filter-thingie on the left hand side?

Hello @kawada,
What do you mean by a filter thingy?

This filter thingie:

There is probably a way to do this: to have 2 lines in the table on the right-hand side, where it says both "Minor" with a value and "Major" with a value, and omit the "Filter tables…" section.

I am sorry to bother you with perhaps trivial questions. I will try to read the manual :slight_smile:

Sure, you can apply a group() at the end to combine the two tables into one.

Beautiful! Thank you very much for the excellent support. This is the final script for this query:

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "suricata-alerts")
  |> filter(fn: (r) => r["_field"] == "alert_metadata_signature_severity_0")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
  |> group(columns: ["_value"])
  |> rename(columns: {_value: "Severity"})
  |> set(key: "Count", value: "my_Count")
  |> count(column: "Count")
  |> group()

When I renamed the column, I had to customize the table settings and reorder the columns.

Thanks again.

1 Like

I am currently on a phone. I will not have time to look into this before later. And I am no expert, so hopefully someone else will chime in.

A first suggestion is to go over your post and try to format it to make it more readable :slight_smile:

get number of values that only appear once in a column in influx
I am using the HTTP API call from my application to Influx. It seems like most of the SQL-like queries don't work on Influx. Can someone please help me with this?

I have a measurement called active_entries which contains ActionId, ActionStatus, ActionType, and Actions as fields:

time           ActId    ActStatus  ActType  Act
1634825189495  50edfda  ACTIVE     X Act    X Act Generated
1634825189496  de90722  ACTIVE     X Act    X Act Generated
1634825189497  ff4c11e  ACTIVE     X Act    X Act Generated
1634825189498  5a4aed7  ACTIVE     X Act    X Act Generated
1634825189499  6eb8f50  ACTIVE     X Act    X Act Generated
1634828999815  50edfda  CLEARED    POR_ACT  X Act Cleared
1634828999818  de90722  CLEARED    POR_ACT  X Act Cleared
1634828999819  ff4c11e  CLEARED    POR_ACT  X Act Cleared
1634828999822  5a4aed7  CLEARED    POR_ACT  X Act Cleared
1634828999823  6eb8f50  CLEARED    POR_ACT  X Act Cleared
1634829059080  cbe0e75  ACTIVE     X Act    X Act Generated
1634829059081  8636a82  ACTIVE     X Act    X Act Generated
1634829059082  a954f37  ACTIVE     X Act    X Act Generated
1634829059083  b10bbef  ACTIVE     X Act    X Act Generated
1634829059084  8e9ce45  ACTIVE     X Act    X Act Generated
1634905420303  cbe0e75  CLEARED    POR_ACT  X Act Cleared
1634905420308  8636a82  CLEARED    POR_ACT  X Act Cleared

In this case, for every ACTIVE ActionId, a CLEARED action with the same ActionId arrives after a certain time and is written into the measurement.

How can I get the list of ACTIVE actions which have not yet been cleared?

Output :-

time           ActId    ActStatus  ActType  Act
1634829059082  a954f37  ACTIVE     X Act    X Act Generated
1634829059083  b10bbef  ACTIVE     X Act    X Act Generated
1634829059084  8e9ce45  ACTIVE     X Act    X Act Generated

And how can I get the below output for the full list?

LastUpdatedDate  ActId    ActStatus  ActType  ActUpdated  ActDetails
1634828999815    50edfda  CLEARED    X Act    POR_ACT     X Act Cleared
1634828999818    de90722  CLEARED    X Act    POR_ACT     X Act Cleared
1634828999819    ff4c11e  CLEARED    X Act    POR_ACT     X Act Cleared
1634828999822    5a4aed7  CLEARED    X Act    POR_ACT     X Act Cleared
1634828999823    6eb8f50  CLEARED    X Act    POR_ACT     X Act Cleared
1634829059082    a954f37  ACTIVE     X Act    -           X Act Generated
1634829059083    b10bbef  ACTIVE     X Act    -           X Act Generated
1634829059084    8e9ce45  ACTIVE     X Act    -           X Act Generated
1634905420303    cbe0e75  CLEARED    X Act    POR_ACT     X Act Cleared
1634905420308    8636a82  CLEARED    X Act    POR_ACT     X Act Cleared
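For the first question, one possible Flux sketch (not a definitive answer; it assumes Flux is available, that "mybucket" stands in for your real bucket name, and that the fields can be pivoted into columns): take the latest row per ActionId and keep it only if that latest status is still ACTIVE.

```flux
from(bucket: "mybucket")
  |> range(start: 0)
  |> filter(fn: (r) => r._measurement == "active_entries")
  // turn the ActionId/ActionStatus/ActionType/Actions fields into columns, one row per point
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  // keep only the most recent row per ActionId
  |> group(columns: ["ActionId"])
  |> sort(columns: ["_time"])
  |> last(column: "ActionStatus")
  // an action is still open if its latest status is ACTIVE
  |> filter(fn: (r) => r.ActionStatus == "ACTIVE")
  |> group()
```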