Query to count occurrences of distinct values

Hi,

I could not find the way to build a single query for getting the count of occurrences of distinct values.

select distinct(allocationLimit) from resources

name: resources
time distinct
---- --------
0    54636
0    13220
0    0
0    14098

select count(allocationLimit) from resources where allocationLimit=54636

name: resources
time count
---- -----
0    2

Like in the above example, how can I get the occurrences of all distinct values of "allocationLimit" in a single query, with output like:

count distinct
----- --------
2     54636
1     13220
4     0
5     14098

Any idea on how to do this?

Thanks,
Arti

1 Like

Hey,

Have you tried this query?

SELECT count(allocationLimit) FROM resources GROUP BY allocationLimit

Bugs’

Did you figure this out? GROUP BY doesn’t work on fields, and in my case I also need to GROUP BY a tag.

I’m storing data about computers. One field I have is the OS version, which I don’t want to store as a TAG since it will change often. I tag by serial number so my series will look like this:

tags { serial: x1234 }
fields { os_version: "Mac OSX 10.12.5" }

I can do this to get a list of distinct versions (I want the last known value per serial):

select distinct("last") from (SELECT last("os_version") FROM "computers" GROUP BY serial)

name: computers
time distinct
---- --------
0    Mac\ OS\ X\ 10.12.6\ (16G1114)
0    Mac\ OS\ X\ 10.12.6\ (16G1036)
0    Mac\ OS\ X\ 10.13.2\ (17C205)
0    Mac\ OS\ X\ 10.12.6\ (16G29)

But how do I count the number of each distinct version? Like OP, I want:

name: computers
count distinct
----- --------
200   Mac\ OS\ X\ 10.12.6\ (16G1114)
450   Mac\ OS\ X\ 10.12.6\ (16G1036)
100   Mac\ OS\ X\ 10.13.2\ (17C205)
600   Mac\ OS\ X\ 10.12.6\ (16G29)

No, I haven't found any solution yet, as GROUP BY supports only tags and time.

Hello @ashinde,

I was able to do this in flux with the following script:

   from(bucket: "test")
      |> range(start: dashboardTime)
      |> filter(fn: (r) => r._field == "value")
      |> group(columns: ["_value"])
      |> map(fn: (r) => ({_time: r._time, index: 1}))
      |> cumulativeSum(columns: ["index"]) 
      |> last()

Where my data in line protocol looks like:

test,tagkey=occ value=1
test,tagkey=occ value=5
test,tagkey=occ value=5
test,tagkey=occ value=1
test,tagkey=occ value=1

This feels a little hacky, but hopefully, it will carry you through while I look for a more elegant solution.

If you decide to explore writing more flux scripts, the spec and the testdata repo will be your best resources.

1 Like

Hi,

Did you find a more elegant solution?
Thanks,
Stuart

Hello @StuartH,
yes you can do:

   from(bucket: "test")
      |> range(start: dashboardTime)
      |> filter(fn: (r) => r._field == "value")
      |> group(columns: ["_value"])
      |> count()

I don’t know why I didn’t think of that previously.
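If you would rather get all the counts back in one table instead of one table per distinct value, a small variation of the script above should work. This is just a sketch, assuming the same "test" bucket and a fixed range; the "occurrences" column name is illustrative, and set() adds a throwaway column so count() does not have to aggregate the group-key column itself:

```flux
from(bucket: "test")
  |> range(start: -1h)
  |> filter(fn: (r) => r._field == "value")
  |> group(columns: ["_value"])          // one table per distinct field value
  |> set(key: "occurrences", value: "x") // throwaway column to count
  |> count(column: "occurrences")
  |> group()                             // merge the per-value counts into one table
```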

Hi. I am new to this. It is not easy. I find the documentation a bit scarce (count() function | Flux 0.x Documentation).

Having a table displayed like this:

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "suricata-alerts")
  |> filter(fn: (r) => r["_field"] == "alert_metadata_signature_severity_0")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
  |> group(columns: ["_value"])

Which gives me (some columns have been omitted):

_time                      _value  _event_type
2021-08-04 10:17:00 GMT+2  Major   alert
2021-08-04 10:14:00 GMT+2  Major   alert
2021-08-04 10:10:00 GMT+2  Minor   alert

How can I display the table like this:

_value  Count
Major   2
Minor   1

Adding |> count() to the script above only gives me a single column named _value containing the number 3.

Hello @kawada,
I agree, it’s not easy to wrap your head around at first.

If you group by “_value” then you should return two tables in your stream IF you have two different values in the “_value” column for that time range. Can you confirm that that’s true?

Can you share a screenshot or the annotated CSV output for your query above, please?

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "suricata-alerts")
  |> filter(fn: (r) => r["_field"] == "alert_metadata_signature_severity_0")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
  |> group(columns: ["_value"])

For example you can try:

import "csv"

csvData = "#group,false,false,true,true,false,false,true,true,true
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,double,string,string,string
#default,mean,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,location
,,0,2021-07-05T21:35:05.673504606Z,2021-08-04T21:35:05.673504606Z,2021-07-05T22:00:00Z,79.75,degrees,average_temperature,coyote_creek
,,0,2021-07-05T21:35:05.673504606Z,2021-08-04T21:35:05.673504606Z,2021-07-05T23:00:00Z,79.8,degrees,average_temperature,coyote_creek
,,1,2021-07-05T21:35:05.673504606Z,2021-08-04T21:35:05.673504606Z,2021-07-05T22:00:00Z,76.25,degrees,average_temperature,santa_monica
,,1,2021-07-05T21:35:05.673504606Z,2021-08-04T21:35:05.673504606Z,2021-07-05T23:00:00Z,79.2,degrees,average_temperature,santa_monica"

mydata =  csv.from(csv: csvData)
mydata
|> yield(name: "original data with two tables")

mydata
|> count() 
|> yield(name: "after count")

Where “original data with two tables” is:

And “after count” is:

Since the original data had two output tables (you achieve this by grouping by "_value"), after we apply count(), the count() function is applied to both tables in the table stream, giving us the two counts. The count is placed in the "_value" column by default. I recommend keeping int values in "_value". However, you can use the rename() function to change the output to exactly what you want.
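For example, a minimal sketch of that rename() step, applied to the counted stream from the example above (the new column name is illustrative):

```flux
mydata
  |> count()
  |> rename(columns: {_value: "count"})
```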

Thank you for your reply, @Anaisdg

Shown in the screenshot below is “Major” and “Audit”, but there are also events of “Minor”:

With count():

I would like to achieve to have a table with the total number of “Major” events, the total number of “Minor” events, etc.

Hello @kawada,
Do you mind sharing the first screenshot again but with the "raw data" view turned on?
Can you also share the output before the group() is applied?
The goal here is to transform your single table into two tables. My guess is that there’s something off with the group key which is causing your data to exist in one table instead of two.

Alternatively you could try:

// your initial query here
data = from(...) |> range() |> filter() |> filter() |> aggregateWindow()

data |> filter(fn: (r) => r["_value"] ==  "Minor")
        |> count()
        |> yield(name: "Minor")

data |> filter(fn: (r) => r["_value"] ==  "Audit")
        |> count()
        |> yield(name: "Audit")

Lastly, you could also drop the sensitive columns that you’ve blacked out, export the data as Annotated CSV, and share it with me and I can look into it on my machine.

Thank you for your reply, @Anaisdg!

I have not had the time to look into this again, but here is the CSV:

2021-08-12-08-56_influxdb_data
#group;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE
#datatype;string;long;dateTime:RFC3339;dateTime:RFC3339;dateTime:RFC3339;string;string;string;string;string;string;string;string;string;string;string;string;string
#default;_result;;;;;;;;;;;;;;;;;
;result;table;_start;_stop;_time;_value;_field;_measurement;dest_ip;dest_port;event_type;flow_id;host;in_iface;path;proto;src_ip;src_port
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T10:36:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;1437107460025591;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;111.111.111.111;36766
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T16:56:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;2166347159271520;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;222.222.222.222;34387
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T18:20:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;321927463147576;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;110.110.110.110;11751
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T23:48:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;333555236012237;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;100.100.100.100;13488
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-12T02:56:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;334429991280534;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;180.180.180.180;53958
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-12T02:56:00Z;Major;alert_metadata_signature_severity_0;suricata-alerts;192.168.0.14;80;alert;33606186887911;opnsense.home.arpa;re1_vlan742^;/var/log/suricata/eve.json;TCP;180.180.180.180;45018
;;0;2021-08-11T06:55:53.19789141Z;2021-08-12T06:55:53.19789141Z;2021-08-11T07:52:00Z;Minor;alert_metadata_signature_severity_0;suricata-alerts;123.123.123.123;80;alert;1554565580150037;opnsense.home.arpa;re0;/var/log/suricata/eve.json;TCP;150.150.150.150;511

The desired goal is to have a table which displays

_value Count
Major 6
Minor 1

Hello @kawada,
Here ya go:

import "csv"

csvData = "#group,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE,FALSE
#datatype,string,long,dateTime:RFC3339,dateTime:RFC3339,dateTime:RFC3339,string,string,string,string,string,string,string,string,string,string,string,string,string
#default,_result,,,,,,,,,,,,,,,,,
,result,table,_start,_stop,_time,_value,_field,_measurement,dest_ip,dest_port,event_type,flow_id,host,in_iface,path,proto,src_ip,src_port
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T10:36:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,1437107460025591,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,111.111.111.111,36766
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T16:56:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,2166347159271520,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,222.222.222.222,34387
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T18:20:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,321927463147576,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,110.110.110.110,11751
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T23:48:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,333555236012237,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,100.100.100.100,13488
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-12T02:56:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,334429991280534,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,180.180.180.180,53958
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-12T02:56:00Z,Major,alert_metadata_signature_severity_0,suricata-alerts,192.168.0.14,80,alert,33606186887911,opnsense.home.arpa,re1_vlan742^,/var/log/suricata/eve.json,TCP,180.180.180.180,45018
,,0,2021-08-11T06:55:53.19789141Z,2021-08-12T06:55:53.19789141Z,2021-08-11T07:52:00Z,Minor,alert_metadata_signature_severity_0,suricata-alerts,123.123.123.123,80,alert,1554565580150037,opnsense.home.arpa,re0,/var/log/suricata/eve.json,TCP,150.150.150.150,511"

csv.from(csv: csvData)
  |> group(columns: ["_value"])
  |> set(key: "Count", value: "my_Count")
  |> count(column: "Count")

Thanks!

So I figure it is not possible to have a single column, without the filter-thingie on the left hand side?

Hello @kawada,
What do you mean by a filter thingy?

This filter thingie:

There is probably a way to do this: to have 2 lines in the table on the right-hand side, where it says both "Minor" with a value and "Major" with a value, and omit the "Filter tables…" section.

I am sorry to bother you with perhaps trivial questions. I will try to read the manual :slight_smile:

Sure, you can apply a group() at the end to combine the two tables into one.

Beautiful! Thank you very much for the excellent support. This is the final script for this query:

from(bucket: v.bucket)
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "suricata-alerts")
  |> filter(fn: (r) => r["_field"] == "alert_metadata_signature_severity_0")
  |> aggregateWindow(every: v.windowPeriod, fn: last, createEmpty: false)
  |> group(columns: ["_value"])
  |> rename(columns: {_value: "Severity"})
  |> set(key: "Count", value: "my_Count")
  |> count(column: "Count")
  |> group()

When I renamed the column, I had to customize the table settings and reorder the columns.

Thanks again.

1 Like

I am currently on a phone. I will not have time to look into this before later. And I am no expert, so hopefully someone else will chime in.

A first suggestion is to go over your post and try to format it to make it more readable :slight_smile:

get number of values that only appear once in a column in influx
I am using the HTTP API call from my application to Influx. It seems like most of the SQL-like queries don't work on Influx. Can someone please help me with this?

I have a measurement called active_entries which contains ActionId, ActionStatus, ActionType, and Actions as fields:

time           ActId    ActStatus  ActType  Act
1634825189495  50edfda  ACTIVE     X Act    X Act Generated
1634825189496  de90722  ACTIVE     X Act    X Act Generated
1634825189497  ff4c11e  ACTIVE     X Act    X Act Generated
1634825189498  5a4aed7  ACTIVE     X Act    X Act Generated
1634825189499  6eb8f50  ACTIVE     X Act    X Act Generated
1634828999815  50edfda  CLEARED    POR_ACT  X Act Cleared
1634828999818  de90722  CLEARED    POR_ACT  X Act Cleared
1634828999819  ff4c11e  CLEARED    POR_ACT  X Act Cleared
1634828999822  5a4aed7  CLEARED    POR_ACT  X Act Cleared
1634828999823  6eb8f50  CLEARED    POR_ACT  X Act Cleared
1634829059080  cbe0e75  ACTIVE     X Act    X Act Generated
1634829059081  8636a82  ACTIVE     X Act    X Act Generated
1634829059082  a954f37  ACTIVE     X Act    X Act Generated
1634829059083  b10bbef  ACTIVE     X Act    X Act Generated
1634829059084  8e9ce45  ACTIVE     X Act    X Act Generated
1634905420303  cbe0e75  CLEARED    POR_ACT  X Act Cleared
1634905420308  8636a82  CLEARED    POR_ACT  X Act Cleared

In this case, for every ACTIVE ActionId, a CLEARED action with the same ActionId arrives after a certain time and is written into the measurement.

How can I get the list of ACTIVE actions which have not yet been cleared?

Output :-

time           ActId    ActStatus  ActType  Act
1634829059082  a954f37  ACTIVE     X Act    X Act Generated
1634829059083  b10bbef  ACTIVE     X Act    X Act Generated
1634829059084  8e9ce45  ACTIVE     X Act    X Act Generated

And how can I get the below output for the full list?

LastUpdatedDate  ActId    ActStatus  ActType  ActUpdated  ActDetails
1634828999815    50edfda  CLEARED    X Act    POR_ACT     X Act Cleared
1634828999818    de90722  CLEARED    X Act    POR_ACT     X Act Cleared
1634828999819    ff4c11e  CLEARED    X Act    POR_ACT     X Act Cleared
1634828999822    5a4aed7  CLEARED    X Act    POR_ACT     X Act Cleared
1634828999823    6eb8f50  CLEARED    X Act    POR_ACT     X Act Cleared
1634829059082    a954f37  ACTIVE     X Act    -           X Act Generated
1634829059083    b10bbef  ACTIVE     X Act    -           X Act Generated
1634829059084    8e9ce45  ACTIVE     X Act    -           X Act Generated
1634905420303    cbe0e75  CLEARED    X Act    POR_ACT     X Act Cleared
1634905420308    8636a82  CLEARED    X Act    POR_ACT     X Act Cleared
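For the first question, one possible Flux sketch (not a definitive answer; it assumes Flux is available, that "mybucket" stands in for your real bucket name, and that the fields can be pivoted into columns): take the latest row per ActionId and keep it only if that latest status is still ACTIVE.

```flux
from(bucket: "mybucket")
  |> range(start: 0)
  |> filter(fn: (r) => r._measurement == "active_entries")
  // turn the ActionId/ActionStatus/ActionType/Actions fields into columns, one row per point
  |> pivot(rowKey: ["_time"], columnKey: ["_field"], valueColumn: "_value")
  // keep only the most recent row per ActionId
  |> group(columns: ["ActionId"])
  |> sort(columns: ["_time"])
  |> last(column: "ActionStatus")
  // an action is still open if its latest status is ACTIVE
  |> filter(fn: (r) => r.ActionStatus == "ACTIVE")
  |> group()
```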