Hi all,
I am inserting radius accounting data into InfluxDB 2.0 beta 14
The server is a VM with 16vCPU and 24GB RAM
The tags are NASIPAddress, TunnelClientEndpoint, UserName all other data are values to lower the cardinality.
The propose of the query is to find the top UserName based on OutputOctets over the last 24 hours (or more )
First I have to calculate the difference for OutputOctets (a counter that can start long time ago) group by AcctUniqueSessionId and sum it
Next I have to group by username sum it again
And lately I have to group by “” and calculate the top 20 users
As some values are used as tags in the query I am using pivot to convert them to rows
It works well but the query take around 180 seconds for 6 million rows (2 million after the first filter ) and I would like to optimize it.
Based on the query bellow and the example dataset I can do something better ?
If I will move AcctUniqueSessionId (about 400,000 unique per day) to be a tag and remove the pivot it will enhance the performance or lower it ?
Can I have a task that will calculate difference for OutputOctets periodically and store it in a new measurement ?
from(bucket: “radius”)
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r[“_measurement”] == “traffic”)
|> filter(fn: (r) => r[“_field”] == “AcctUniqueSessionId” or r[“_field”] == “OutputOctets” or r[“_field”] == “AcctStatusType”)
|> pivot(
rowKey:[“_time”, “UserName”, “NASIPAddress”],
columnKey: [“_field”],
valueColumn: “_value”
)
|> filter(fn: (r) => r[“AcctStatusType”] == “Stop” or r[“AcctStatusType”] == “Interim-Update”)
|> group(columns: [“AcctUniqueSessionId” , “UserName”])
|> rename(columns: {OutputOctets: “_value”})
|> difference(nonNegative: true)
|> sum()
|> filter(fn: (r) => r[“_value”] > 1)
|> group(columns: [“UserName”])
|> sum()
|> group(columns: [“”])
|> top(n:20)
|> map(fn: (r) => ({ r with _value: int(v: r._value) / 1000000000 }))
> #group FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
>
> #datatype string long dateTime:RFC3339 dateTime:RFC3339 dateTime:RFC3339 long string string string string string
> #default _result
> result table _start _stop _time _value NASIPAddress TunnelClientEndpoint UserName _field _measurement
> 0 2020-07-12T18:25:18.040788232Z 2020-07-12T19:25:18.040788232Z 2020-07-12T19:17:03.501303808Z 41017762 10.43.32.237 10.33.23.236 user123 InputOctets traffic
> 1 2020-07-12T18:25:18.040788232Z 2020-07-12T19:25:18.040788232Z 2020-07-12T19:17:03.501303808Z 80504421 10.43.32.237 10.33.23.236 user123 OutputOctets traffic
> 2 2020-07-12T18:25:18.040788232Z 2020-07-12T19:25:18.040788232Z 2020-07-12T19:17:03.501303808Z 0 10.43.32.237 10.33.23.236 user123 AcctDelayTime traffic
>
> #group FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
> #datatype string long dateTime:RFC3339 dateTime:RFC3339 dateTime:RFC3339 string string string string string string
> #default _result
> result table _start _stop _time _value NASIPAddress TunnelClientEndpoint UserName _field _measurement
> 3 2020-07-12T18:25:18.040788232Z 2020-07-12T19:25:18.040788232Z 2020-07-12T19:17:03.501303808Z 1b5e614983d07c183a99eecb6dc3e4df 10.43.32.237 10.33.23.236 user123 AcctUniqueSessionId traffic
> 4 2020-07-12T18:25:18.040788232Z 2020-07-12T19:25:18.040788232Z 2020-07-12T19:17:03.501303808Z 10.43.35.239 10.43.32.237 10.33.23.236 user123 TunnelServerEndpoint traffic
> 5 2020-07-12T18:25:18.040788232Z 2020-07-12T19:25:18.040788232Z 2020-07-12T19:17:03.501303808Z 10.22.118.32 10.43.32.237 10.33.23.236 user123 FramedIPAddress traffic
> 6 2020-07-12T18:25:18.040788232Z 2020-07-12T19:25:18.040788232Z 2020-07-12T19:17:03.501303808Z Interim-Update 10.43.32.237 10.33.23.236 user123 AcctStatusType traffic
>
> #group FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
> #datatype string long dateTime:RFC3339 dateTime:RFC3339 dateTime:RFC3339 long string string string string string
> #default _result
> result table _start _stop _time _value NASIPAddress TunnelClientEndpoint UserName _field _measurement
> 7 2020-07-12T18:25:18.040788232Z 2020-07-12T19:25:18.040788232Z 2020-07-12T19:17:03.501303808Z 86400 10.43.32.237 10.33.23.236 user123 AcctSessionTime traffic
>
> #group FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
> #datatype string long dateTime:RFC3339 dateTime:RFC3339 dateTime:RFC3339 string string string string string string
> #default _result
> result table _start _stop _time _value NASIPAddress TunnelClientEndpoint UserName _field _measurement
> 8 2020-07-12T18:25:18.040788232Z 2020-07-12T19:25:18.040788232Z 2020-07-12T19:17:03.501303808Z DSLAM1 10.43.32.237 10.33.23.236 user123 CallingStationId traffic
Thanks
Nitzan