Count unique records

I have the following records:

event,email=a@a.a,event_type=1,project=12 value=1
event,email=b@b.b,event_type=2,project=12 value=1

Events can be duplicated.

I need to calculate "how many unique events per user happen for project 12", so the pair (event_type, email) should be distinct, grouped by event_type.

So the output should be something like:

event_type | count
1          |     2
2          |     2

How can I achieve this?
I see the distinct function, but it accepts only one column.

Hello @mogadanez,
Try taking a look at “Group query results by more than one tag”

I would do something like: select count("value") as "count" ... group by "email","event_type","project"
Or you can enable Flux and use distinct(), which works on more than just fields, unlike InfluxQL.
Let me know if that helps / how you get along.
Thanks.
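For the Flux route, a minimal sketch of counting distinct (event_type, email) pairs might look like this. It assumes the bucket name "testBucket" and the schema from the question; the range is illustrative:

```flux
from(bucket: "testBucket")
  |> range(start: -30d)
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> filter(fn: (r) => r["project"] == "12")
  |> group(columns: ["event_type", "email"])  // one table per (event_type, email) pair
  |> distinct(column: "email")                // collapse duplicate rows within each pair
  |> group(columns: ["event_type"])           // regroup so count() runs per event_type
  |> count(column: "_value")
```

This is untested against your data; the key idea is that grouping narrows each table to one pair before distinct() and count() run.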

distinct works better, but I'm not sure how it works.
I ran distinct on different columns and got the same result.

So, my test data:

event,user=A,project=1,event_type=5 value=1i 1585692000000
event,user=A,project=1,event_type=11 value=1i 1585695600000
event,user=B,project=1,event_type=5 value=1i 1585699200000
event,user=B,project=1,event_type=11 value=1i 1585702800000
event,user=A,project=2,event_type=5 value=1i 1585706400000
event,user=A,project=2,event_type=11 value=1i 1585710000000
event,user=A,project=1,event_type=5 value=1i 1585713600000
event,user=A,project=1,event_type=11 value=1i 1585717200000
event,user=B,project=1,event_type=11 value=1i 1585720800000
event,user=B,project=2,event_type=11 value=1i 1585724400000
event,user=B,project=3,event_type=11 value=1i 1585728000000
event,user=A,project=3,event_type=5 value=1i 1585731600000
event,user=A,project=3,event_type=11 value=1i 1585735200000
event,user=A,project=3,event_type=11 value=1i 1585738800000
event,user=A,project=2,event_type=11 value=1i 1585742400000
event,user=A,project=2,event_type=11 value=1i 1585746000000

If I run the query with a project filter:

from(bucket: "testBucket")
  |> range(start: -10y)
  |> filter(fn: (r) => r["project"] == "1")
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> distinct(column: "event_type")
  |> group(columns: [ "user"])
  |> count(column:"_value")

but if I omit the project filter:

from(bucket: "testBucket")
  |> range(start: -10y)
  //|> filter(fn: (r) => r["project"] == "1")
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> distinct(column: "event_type")
  |> group(columns: [ "user"])
  |> count(column:"_value")

it counts as:

A: 6
B: 4
and if I have more tags, it gets multiplied even further.

Question: with the data above, can I count unique user-event_type pairs?

@mogadanez distinct() will only return distinct values for a single column (as you noted). It won’t give you what you’re looking for in this case.

What is considered a unique event? I see multiple event_type=11 per user per project. Are each of these a unique event or should they only be counted as one?

If they should only be counted as one, I would:

  1. Group by user and project.
  2. Use unique() to return unique records in each table without modifying the group keys.
  3. Count the remaining records in each table.
from(bucket: "testBucket")
  |> range(start: -10y)
  |> filter(fn: (r) => r["project"] == "1")
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> group(columns: [ "user","project"])
  |> unique(column: "event_type")
  |> count(column:"_value")

Yes, they can be duplicated, since it's raw "log" data and users can do the same things multiple times.
They should be counted as one.

Actually, it seems I oversimplified the real data =). As I briefly mentioned above, I have some more tags.

So I'll try to describe the problem again.
Source data (sorted):

event,user=A,project=1,target=1,event_type=1 value=1i 1585692000000
event,user=A,project=1,target=1,event_type=2 value=1i 1585695600000
event,user=A,project=1,target=2,event_type=1 value=1i 1585699200000
event,user=A,project=1,target=2,event_type=1 value=1i 1585702800000
event,user=A,project=1,target=2,event_type=2 value=1i 1585706400000
event,user=A,project=2,target=2,event_type=1 value=1i 1585710000000
event,user=A,project=2,target=2,event_type=2 value=1i 1585713600000
event,user=A,project=2,target=2,event_type=2 value=1i 1585717200000
event,user=A,project=2,target=3,event_type=1 value=1i 1585720800000
event,user=A,project=2,target=3,event_type=2 value=1i 1585724400000
event,user=B,project=1,target=1,event_type=1 value=1i 1585728000000
event,user=B,project=1,target=2,event_type=1 value=1i 1585731600000
event,user=B,project=1,target=2,event_type=1 value=1i 1585735200000
event,user=B,project=2,target=2,event_type=1 value=1i 1585738800000
event,user=B,project=2,target=3,event_type=1 value=1i 1585742400000
event,user=C,project=1,target=1,event_type=1 value=1i 1585746000000
event,user=C,project=1,target=2,event_type=1 value=1i 1585749600000
event,user=C,project=2,target=2,event_type=2 value=1i 1585753200000
event,user=C,project=2,target=2,event_type=2 value=1i 1585756800000
event,user=C,project=2,target=3,event_type=2 value=1i 1585760400000
event,user=D,project=1,target=1,event_type=1 value=1i 1585764000000
event,user=D,project=2,target=2,event_type=2 value=1i 1585767600000

Here we have projects [1,2]; project 1 has targets [1,2] and project 2 has targets [2,3]
(so target 2 overlaps).

I need to count, for each event_type, the unique user-target pairs. project is not relevant in this case, but could potentially be filtered on.
So if user A has event 1 for target 2 in both projects 1 and 2, it counts as 1.
In MySQL I did the same with COUNT(DISTINCT user, target).

I tried with unique():

from(bucket: "testBucket")
  |> range(start: -10y)
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> group(columns: [ "event_type"])
  |> unique(column: "user")
  |> count(column:"_value")

but it doesn't return what I want:

event_type | count
1          |     4
2          |     3

The expected result for the data above is:

event_type | count
1          |     9
2          |     6

(according to the following table):

  1     2
 -----------
 A-1   A-1
 A-2   A-2
 A-3   A-3  
 B-1
 B-2
 B-3
 C-1       
 C-2   C-2
       C-3
 D-1
       D-2

Maybe I can use map() somehow to build a "calculated" value like user-target?

Seems map() did the trick:

from(bucket: "testBucket")
  |> range(start: -10y)
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> map(fn: (r) => ({ r with u_t: r["user"]+"_"+r["target"]}))
  |> group(columns: [ "event_type"])
  |> unique(column: "u_t")
  |> count(column:"_value")

Not sure how it will affect performance; it needs some testing.
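As a rough alternative to benchmark against (an untested sketch over the same testBucket), grouping by all three tags first avoids building the concatenated u_t string in map(), at the cost of producing more small tables:

```flux
from(bucket: "testBucket")
  |> range(start: -10y)
  |> filter(fn: (r) => r["_measurement"] == "event")
  |> filter(fn: (r) => r["_field"] == "value")
  |> group(columns: ["event_type", "user", "target"])  // one table per (event_type, user, target)
  |> first()                                           // keep a single record per table, dropping duplicates
  |> group(columns: ["event_type"])                    // merge back to one table per event_type
  |> count(column: "_value")
```

After first(), each table holds exactly one row, so counting rows per event_type should give the number of unique user-target pairs. Whether this beats the map() version likely depends on series cardinality.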