How to get the most frequent record in influxdb

influxql
kapacitor
influxdb
#1

hi all,

In my case, there are many users and every one will upload their actions to influxdb.
and the influxdb data model like

event province city sex uid

Login provinceA cityA male userA
Buy provinceA cityA male userA
PageView provinceA cityA male userA
Login provinceA cityB female userB
Logout provinceB cityC male userC

the event, province, city, sex is tag key ,and uid is field key.
The uid will be uniq, so I put it in the field key to avoid series candidate issue.

Now I want to get the users who are frequent to upload their action data. and how to realize it in influxdb or kapacitor

in this case ,The userA is the most frequent users

thanks

#2

The count function will give you the list of the users (frequently visited users) and you can use the where clause to find the users in certain time period. The max function if clubbed with count will give you the top user.

#3

hi,I have try to use count to get the users (sql: select count(“uid”) from measurement and time > now() - 1h group by time(1h) ), but failed.
With this example ,I want to get the result like

user count
userA 3
userB 1
userC 1

but the count function just get the result
time count
1502242645775000000 5

It can not get the users list

Could you show me the sql with this example ? Or the grammar of my sql is wrong ?

thanks

#4

hi

The document of influxdb in https://docs.influxdata.com/influxdb/v1.3/query_language/functions/#count

has explained that count " Returns the number of non-null field values."

So I doubt whether the count could get the list of the users
Or is there some other way to get it?

thanks

#5

I found a similar topic in Multiple count and where in query

but it just has three value 0,1,3, so that can use tag with “group by” operation.
but in my case,the value will be millions,so it can not use tags .

thanks

#6

Try a nested query with count like below:

select top(count,,1) from (select count(“uid”) from measurement where time > now() - 1h group by time(1h))

Just replace the with a tag which is present in all the results.

#7

hi
I am sorry that it can not get the right results.

The count function can not get user list,it can just count all record.

If I have 4 data belong to the same hour, like below

time event uid
12:01 PageView userA
12:02 PageView userB
12:03 PageView userA
12:04 PageView userC

the count function will get

time count
12:00 4

But i want to get the result like

time event uid
12:00 userA 2
12:00 userB 1
12:00 userC 1

And the key is that uid is a field.

thanks

#8

Can you paste the result using the below query:

select top(count,event,1) from (select count(“uid”) from measurement where time > now() - 1h group by time(1h))

#9

hi
It will panic when execute the sql ,error info below

[I] 2017-08-11T01:24:40Z SELECT top(count, source, 10) FROM (SELECT count(uid) FROM spider.autogen.app WHERE time > now() - 1h GROUP BY time(1h)) service=query
panic: runtime error: index out of range

goroutine 717 [running]:
panic(0xa05e00, 0xc4200120f0)
/usr/local/go/src/runtime/panic.go:500 +0x1a1
github.com/influxdata/influxdb/influxql.filterIntegerByUniqueTags(0xc4213b0240, 0x2, 0x2, 0xc4222e5be8, 0x1, 0x1, 0xbaf1b8, 0x4139ce, 0x7fb3672d6690, 0x1100000038)
/root/go/src/github.com/influxdata/influxdb/influxql/call_iterator.go:1050 +0x2b6
github.com/influxdata/influxdb/influxql.NewIntegerTopReduceSliceFunc.func1(0xc4213b0240, 0x2, 0x2, 0xc421d03ba8, 0xc421d03c98, 0xc421d03c98)
/root/go/src/github.com/influxdata/influxdb/influxql/call_iterator.go:860 +0x45b
github.com/influxdata/influxdb/influxql.(*IntegerSliceFuncReducer).Emit(0xc420acbba0, 0xc421d03ba8, 0xc4207ca810, 0x3)
/root/go/src/github.com/influxdata/influxdb/influxql/functions.gen.go:595 +0x44
github.com/influxdata/influxdb/influxql.(*integerReduceIntegerIterator).reduce(0xc420b5adc0, 0x18, 0xc422862790, 0x7716b7, 0xc4228e70c0, 0xc42214f201)
/root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:4131 +0x807
github.com/influxdata/influxdb/influxql.(*integerReduceIntegerIterator).Next(0xc420b5adc0, 0xc422862780, 0x0, 0xc42214f2c0)
/root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:4044 +0xa1
github.com/influxdata/influxdb/influxql.(*integerInterruptIterator).Next(0xc420d66d80, 0xffffffffffffffff, 0x0, 0xc42214f260)
/root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:3432 +0x52
github.com/influxdata/influxdb/influxql.(*bufIntegerIterator).Next(0xc420d66da0, 0x67ebf1297c25d8, 0x0, 0x0)
/root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:2750 +0x3c
github.com/influxdata/influxdb/influxql.(*integerAuxIterator).stream(0xc420d66de0)
/root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:3529 +0x32
created by github.com/influxdata/influxdb/influxql.(*integerAuxIterator).Start
/root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:3515 +0x3f

And I try execute sql select count(“uid”) from app where time > now() - 1h group by time(1h)
the result like below
name: app
time count


1502409600000000000 7
1502413200000000000 27

there has no any tag in result ,how can the top function work for tag?
thanks

#10

Hi

The info of panic is from version 1.2.I have upgrade to 1.3.2. Now the sql work normally.But the result may be wrong.

From the start,I insert 5 points into measurement called t, every point has tag p and s, and field uid, like

select * from t
name: t
time                p    s   uid
----                -    -   ---
1502419072967540554 pv   yzl usera
1502419075188749394 pv   yzl userb
1502419078856887159 pv   yzl usera
1502419080484044059 pv   yzl userc
1502419190701236366 open yzl userc

Second,I use the sql you give me

select top(count, p, 1) from (select count("uid") from t where time > now() - 1h group by time(1h))
name: t
time                top p
----                --- -
1502416800000000000 4  

or

select top(count, s, 1) from (select count("uid") from t where time > now() - 1h group by time(1h))
name: t
time                top s
----                --- -
1502416800000000000 5  

But I just want the users list who visit frequently ,like

time                 uid     count
----                 -----   -----
1502416800000000000  usera   2
1502416800000000000  userb   1
1502416800000000000  userc   2

thanks

#11

I don’t this it can be achieved with this schema. You need to move this userid field as a “tag” value. Since the grouping option is only available via that