Count unique records

I have the following records:

event,email=a@a.a,event_type=1,project=12 value=1
event,email=b@b.b,event_type=2,project=12 value=1

Events can be duplicated.

I need to calculate “how many unique events per user happen for project 12”, so the (event_type, email) pair should be distinct, grouped by event_type.

So the output should be something like:

event_type | count
1          | 2
2          | 2

How can I achieve this?
I see the distinct() function, but it accepts only one column.
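To make the intended semantics concrete, here is the same logic sketched outside InfluxDB in Python; the rows (including the duplicates) are made up for illustration:

```python
# Count distinct (email, event_type) pairs, grouped by event_type.
# Rows are illustrative sample data, including duplicated events.
rows = [
    # (email, event_type, project)
    ("a@a.a", 1, 12),
    ("a@a.a", 1, 12),  # duplicate event
    ("b@b.b", 1, 12),
    ("a@a.a", 2, 12),
    ("b@b.b", 2, 12),
    ("b@b.b", 2, 12),  # duplicate event
]

# Deduplicate: a set keeps each (email, event_type) pair only once.
pairs = {(email, etype) for (email, etype, project) in rows if project == 12}

# Group by event_type and count the surviving pairs.
counts = {}
for _email, etype in pairs:
    counts[etype] = counts.get(etype, 0) + 1

print(sorted(counts.items()))  # [(1, 2), (2, 2)]
```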

Hello @mogadanez,
Try taking a look at “Group query results by more than one tag”


I would do something like select count("value") as "count" group by "email","event_type","project" (filling in the from/where clauses for your measurement).
Or you can enable Flux and use distinct(), which, unlike in InfluxQL, works on more than just fields.
Let me know if that helps / how you get along.
Thanks.

distinct() works better, but I'm not sure how it works.
I ran distinct() on different columns and got the same result.

So my test data:

event,user=A,project=1,event_type=5 value=1i 1585692000000
event,user=A,project=1,event_type=11 value=1i 1585695600000
event,user=B,project=1,event_type=5 value=1i 1585699200000
event,user=B,project=1,event_type=11 value=1i 1585702800000
event,user=A,project=2,event_type=5 value=1i 1585706400000
event,user=A,project=2,event_type=11 value=1i 1585710000000
event,user=A,project=1,event_type=5 value=1i 1585713600000
event,user=A,project=1,event_type=11 value=1i 1585717200000
event,user=B,project=1,event_type=11 value=1i 1585720800000
event,user=B,project=2,event_type=11 value=1i 1585724400000
event,user=B,project=3,event_type=11 value=1i 1585728000000
event,user=A,project=3,event_type=5 value=1i 1585731600000
event,user=A,project=3,event_type=11 value=1i 1585735200000
event,user=A,project=3,event_type=11 value=1i 1585738800000
event,user=A,project=2,event_type=11 value=1i 1585742400000
event,user=A,project=2,event_type=11 value=1i 1585746000000

If I run the query with a project filter:

from(bucket: "testBucket")
  |> range(start: -10y)
  |> filter(fn: (r) => r["project"] == "1")
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> distinct(column: "event_type")
  |> group(columns: [ "user"])
  |> count(column:"_value")

But if I omit the project filter:

from(bucket: "testBucket")
  |> range(start: -10y)
  //|> filter(fn: (r) => r["project"] == "1")
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> distinct(column: "event_type")
  |> group(columns: [ "user"])
  |> count(column:"_value")

it counts as:

A: 6
B: 4
and if I have more tags, it gets multiplied even further.

Question: with the data above, can I count unique user-event_type pairs?

@mogadanez distinct() will only return distinct values for a single column (as you noted). It won’t give you what you’re looking for in this case.

What is considered a unique event? I see multiple event_type=11 entries per user per project. Is each of these a unique event, or should they only be counted as one?

If they should only be counted as one, I would:

  1. Group by user and project.
  2. Use unique() to return unique records in each table without modifying the group keys.
  3. Count the remaining records in each table.
from(bucket: "testBucket")
  |> range(start: -10y)
  |> filter(fn: (r) => r["project"] == "1")
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> group(columns: [ "user","project"])
  |> unique(column: "event_type")
  |> count(column:"_value")
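For illustration, the three steps can be modelled outside Flux with a quick Python sketch over the test data above (keeping the project filter):

```python
# Model the pipeline in plain Python: filter project == "1",
# group by (user, project), keep unique event_type values, then count.
rows = [
    # (user, project, event_type), in the order written above
    ("A", "1", "5"), ("A", "1", "11"),
    ("B", "1", "5"), ("B", "1", "11"),
    ("A", "2", "5"), ("A", "2", "11"),
    ("A", "1", "5"), ("A", "1", "11"),  # duplicates of earlier rows
    ("B", "1", "11"),
    ("B", "2", "11"), ("B", "3", "11"),
    ("A", "3", "5"), ("A", "3", "11"), ("A", "3", "11"),
    ("A", "2", "11"), ("A", "2", "11"),
]

groups = {}  # (user, project) -> set of event_type values
for user, project, etype in rows:
    if project != "1":        # the project filter
        continue
    groups.setdefault((user, project), set()).add(etype)  # unique()

counts = {key: len(types) for key, types in groups.items()}  # count()
print(counts)  # {('A', '1'): 2, ('B', '1'): 2}
```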

Yes, they can be duplicated, since it's raw “log” data and users can do the same things multiple times.
They should be counted as one.

Actually, it seems I oversimplified the real data =). As I briefly mentioned above, I have some more tags.

So I'll try to describe the problem again.
Source data, sorted:

event,user=A,project=1,target=1,event_type=1 value=1i 1585692000000
event,user=A,project=1,target=1,event_type=2 value=1i 1585695600000
event,user=A,project=1,target=2,event_type=1 value=1i 1585699200000
event,user=A,project=1,target=2,event_type=1 value=1i 1585702800000
event,user=A,project=1,target=2,event_type=2 value=1i 1585706400000
event,user=A,project=2,target=2,event_type=1 value=1i 1585710000000
event,user=A,project=2,target=2,event_type=2 value=1i 1585713600000
event,user=A,project=2,target=2,event_type=2 value=1i 1585717200000
event,user=A,project=2,target=3,event_type=1 value=1i 1585720800000
event,user=A,project=2,target=3,event_type=2 value=1i 1585724400000
event,user=B,project=1,target=1,event_type=1 value=1i 1585728000000
event,user=B,project=1,target=2,event_type=1 value=1i 1585731600000
event,user=B,project=1,target=2,event_type=1 value=1i 1585735200000
event,user=B,project=2,target=2,event_type=1 value=1i 1585738800000
event,user=B,project=2,target=3,event_type=1 value=1i 1585742400000
event,user=C,project=1,target=1,event_type=1 value=1i 1585746000000
event,user=C,project=1,target=2,event_type=1 value=1i 1585749600000
event,user=C,project=2,target=2,event_type=2 value=1i 1585753200000
event,user=C,project=2,target=2,event_type=2 value=1i 1585756800000
event,user=C,project=2,target=3,event_type=2 value=1i 1585760400000
event,user=D,project=1,target=1,event_type=1 value=1i 1585764000000
event,user=D,project=2,target=2,event_type=2 value=1i 1585767600000

Here we have projects [1, 2]; project 1 has targets [1, 2] and project 2 has targets [2, 3]
(so target 2 overlaps).

I need to count, for each event_type, the unique user-target pairs. project is not relevant in this case, but could potentially be filtered on.
So if user A has event 1 for target 2 in both projects 1 and 2, it counts as 1.
In MySQL I did the same with COUNT(DISTINCT user, target).

I tried with unique():

from(bucket: "testBucket")
  |> range(start: -10y)
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> group(columns: [ "event_type"])
  |> unique(column: "user")
  |> count(column:"_value")

but it doesn't return what I want:

event_type | count
1          | 4
2          | 3

The expected result for the data above is:

event_type | count
1          | 9
2          | 6

(according to the following table of distinct user-target pairs per event_type):

  1     2
 -----------
 A-1   A-1
 A-2   A-2
 A-3   A-3  
 B-1
 B-2
 B-3
 C-1       
 C-2   C-2
       C-3
 D-1
       D-2

Maybe I can use map() somehow to produce a “calculated” user-target value?
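As a sanity check outside Flux, the expected counts can be reproduced in Python by deduplicating (event_type, user, target) triples over the data above:

```python
# Count distinct (user, target) pairs per event_type; project is ignored.
rows = [
    # (user, project, target, event_type), as in the source data above
    ("A", 1, 1, 1), ("A", 1, 1, 2),
    ("A", 1, 2, 1), ("A", 1, 2, 1), ("A", 1, 2, 2),
    ("A", 2, 2, 1), ("A", 2, 2, 2), ("A", 2, 2, 2),
    ("A", 2, 3, 1), ("A", 2, 3, 2),
    ("B", 1, 1, 1), ("B", 1, 2, 1), ("B", 1, 2, 1),
    ("B", 2, 2, 1), ("B", 2, 3, 1),
    ("C", 1, 1, 1), ("C", 1, 2, 1),
    ("C", 2, 2, 2), ("C", 2, 2, 2), ("C", 2, 3, 2),
    ("D", 1, 1, 1), ("D", 2, 2, 2),
]

# Deduplicate on (event_type, user, target), dropping project.
pairs = {(etype, user, target) for user, _project, target, etype in rows}

counts = {}
for etype, _user, _target in pairs:
    counts[etype] = counts.get(etype, 0) + 1

print(sorted(counts.items()))  # [(1, 9), (2, 6)]
```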

It seems map() did the trick:

from(bucket: "testBucket")
  |> range(start: -10y)
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> map(fn: (r) => ({ r with u_t: r["user"]+"_"+r["target"]}))
  |> group(columns: [ "event_type"])
  |> unique(column: "u_t")
  |> count(column:"_value")

Not sure how it will affect performance; it needs some tests.


This topic was automatically closed 60 minutes after the last reply. New replies are no longer allowed.