Optimize query with pivot

Hi all,

I am inserting RADIUS accounting data into InfluxDB 2.0 beta 14.
The server is a VM with 16 vCPUs and 24 GB RAM.
The tags are NASIPAddress, TunnelClientEndpoint, and UserName; everything else is stored as fields to keep the cardinality low.
The purpose of the query is to find the top UserNames by OutputOctets over the last 24 hours (or more).
First I have to calculate the difference for OutputOctets (a counter that can have started a long time ago), group by AcctUniqueSessionId, and sum it.
Next I have to group by UserName and sum again.
And lastly I have to group by "" and take the top 20 users.
Because some of the values used for filtering and grouping in the query are stored as fields rather than tags, I am using pivot to turn them into columns.
It works well, but the query takes around 180 seconds for 6 million rows (2 million after the first filter), and I would like to optimize it.
Based on the query below and the example dataset, can I do something better?
If I move AcctUniqueSessionId (about 400,000 unique values per day) to be a tag and remove the pivot, will that improve performance or make it worse?
Can I have a task that periodically calculates the difference for OutputOctets and stores it in a new measurement?

from(bucket: "radius")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "traffic")
|> filter(fn: (r) => r["_field"] == "AcctUniqueSessionId" or r["_field"] == "OutputOctets" or r["_field"] == "AcctStatusType")
|> pivot(
    rowKey: ["_time", "UserName", "NASIPAddress"],
    columnKey: ["_field"],
    valueColumn: "_value"
)
|> filter(fn: (r) => r["AcctStatusType"] == "Stop" or r["AcctStatusType"] == "Interim-Update")
|> group(columns: ["AcctUniqueSessionId", "UserName"])
|> rename(columns: {OutputOctets: "_value"})
|> difference(nonNegative: true)
|> sum()
|> filter(fn: (r) => r["_value"] > 1)
|> group(columns: ["UserName"])
|> sum()
|> group(columns: [""])
|> top(n: 20)
|> map(fn: (r) => ({ r with _value: int(v: r._value) / 1000000000 }))

> #group	FALSE	FALSE	TRUE	TRUE	FALSE	FALSE	TRUE	TRUE	TRUE	TRUE	TRUE
> 
> #datatype	string	long	dateTime:RFC3339	dateTime:RFC3339	dateTime:RFC3339	long	string	string	string	string	string
> #default	_result										
> 	result	table	_start	_stop	_time	_value	NASIPAddress	TunnelClientEndpoint	UserName	_field	_measurement
> 		0	2020-07-12T18:25:18.040788232Z	2020-07-12T19:25:18.040788232Z	2020-07-12T19:17:03.501303808Z	41017762	10.43.32.237	10.33.23.236	user123	InputOctets	traffic
> 		1	2020-07-12T18:25:18.040788232Z	2020-07-12T19:25:18.040788232Z	2020-07-12T19:17:03.501303808Z	80504421	10.43.32.237	10.33.23.236	user123	OutputOctets	traffic
> 		2	2020-07-12T18:25:18.040788232Z	2020-07-12T19:25:18.040788232Z	2020-07-12T19:17:03.501303808Z	0	10.43.32.237	10.33.23.236	user123	AcctDelayTime	traffic
> 											
> #group	FALSE	FALSE	TRUE	TRUE	FALSE	FALSE	TRUE	TRUE	TRUE	TRUE	TRUE
> #datatype	string	long	dateTime:RFC3339	dateTime:RFC3339	dateTime:RFC3339	string	string	string	string	string	string
> #default	_result										
> 	result	table	_start	_stop	_time	_value	NASIPAddress	TunnelClientEndpoint	UserName	_field	_measurement
> 		3	2020-07-12T18:25:18.040788232Z	2020-07-12T19:25:18.040788232Z	2020-07-12T19:17:03.501303808Z	1b5e614983d07c183a99eecb6dc3e4df	10.43.32.237	10.33.23.236	user123	AcctUniqueSessionId	traffic
> 		4	2020-07-12T18:25:18.040788232Z	2020-07-12T19:25:18.040788232Z	2020-07-12T19:17:03.501303808Z	10.43.35.239	10.43.32.237	10.33.23.236	user123	TunnelServerEndpoint	traffic
> 		5	2020-07-12T18:25:18.040788232Z	2020-07-12T19:25:18.040788232Z	2020-07-12T19:17:03.501303808Z	10.22.118.32	10.43.32.237	10.33.23.236	user123	FramedIPAddress	traffic
> 		6	2020-07-12T18:25:18.040788232Z	2020-07-12T19:25:18.040788232Z	2020-07-12T19:17:03.501303808Z	Interim-Update	10.43.32.237	10.33.23.236	user123	AcctStatusType	traffic
> 											
> #group	FALSE	FALSE	TRUE	TRUE	FALSE	FALSE	TRUE	TRUE	TRUE	TRUE	TRUE
> #datatype	string	long	dateTime:RFC3339	dateTime:RFC3339	dateTime:RFC3339	long	string	string	string	string	string
> #default	_result										
> 	result	table	_start	_stop	_time	_value	NASIPAddress	TunnelClientEndpoint	UserName	_field	_measurement
> 		7	2020-07-12T18:25:18.040788232Z	2020-07-12T19:25:18.040788232Z	2020-07-12T19:17:03.501303808Z	86400	10.43.32.237	10.33.23.236	user123	AcctSessionTime	traffic
> 											
> #group	FALSE	FALSE	TRUE	TRUE	FALSE	FALSE	TRUE	TRUE	TRUE	TRUE	TRUE
> #datatype	string	long	dateTime:RFC3339	dateTime:RFC3339	dateTime:RFC3339	string	string	string	string	string	string
> #default	_result										
> 	result	table	_start	_stop	_time	_value	NASIPAddress	TunnelClientEndpoint	UserName	_field	_measurement
> 		8	2020-07-12T18:25:18.040788232Z	2020-07-12T19:25:18.040788232Z	2020-07-12T19:17:03.501303808Z	DSLAM1	10.43.32.237	10.33.23.236	user123	CallingStationId	traffic

Thanks

Nitzan

> Based on the query below and the example dataset, can I do something better?
> If I move AcctUniqueSessionId (about 400,000 unique values per day) to be a tag and remove the pivot, will that improve performance or make it worse?

I think the rest of my answer depends on this. When you say 400,000 unique per day, do you mean 400,000 new unique values each day, or 400,000 total, since a session id exists over the course of multiple days? If they are new each day, the cardinality will become very large and the system might begin to struggle. If it is 400,000 total, such as 400,000 user accounts, servers, or something else that is unique, then this should be fine. My concern is the case where you are adding 400,000 new series each day.

> First I have to calculate the difference for OutputOctets (a counter that can have started a long time ago), group by AcctUniqueSessionId, and sum it.

Can you give me some details on this? In general, a counter is reported from a single host or system, and that system must have a unique series associated with it. The most common example is a server, or a pod in Kubernetes: something that indicates it is unique, with no other part of the system writing to the same series. I'm concerned that you may be clobbering data with your current setup. If AcctUniqueSessionId is the thing that identifies a counter as unique and it is a field, then InfluxDB is likely clobbering some of the data. If you have more information about your schema and where the tags come from, that would help in coming up with an answer.

I don't think pivot is what you want. It seems to me that you have a counter and then need to perform some math on different groups, and pivot can end up making that harder. With the caveat of the dangers I mentioned above, the session id should probably be a tag. You could then do the following:

from(bucket: "radius")
|> range(start: v.timeRangeStart, stop: v.timeRangeStop)
|> filter(fn: (r) => r["_measurement"] == "traffic")
|> filter(fn: (r) => r["_field"] == "OutputOctets")
|> difference(nonNegative: true)

This will convert the counter into a real value for how much it has changed during that time period. You perform this per series, without grouping or pivoting, because what you really want is to convert each counter into a value, and each counter should be uniquely identified by its series. If you mix counters from different series, you're unlikely to get back the real values.

|> group(columns: ["UserName"])
|> sum()

This will group each of the counters by the username so they are aggregated together. You will get the output octets per user.

|> group()
|> top(n: 20)

This will get you the top 20 users overall. You do not have to use [""]; if you don't pass in any columns, it will not group by anything.

|> map(fn: (r) => ({ r with _value: int(v: r._value) / 1000000000 }))

This looks good. You are converting the value into a more manageable unit. If you want this to be a decimal value instead, you can use 1000000000.0 instead of converting the value to an integer.
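
As a sketch of that float version (assuming _value is still an integer at this point, so it needs an explicit float() conversion before the division):

|> map(fn: (r) => ({ r with _value: float(v: r._value) / 1000000000.0 }))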

This will compute a sum for the entire interval. If you want to group each of these sections into windowed intervals, you can replace the sum() aggregate with aggregateWindow(every: X, fn: sum).
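
For example, something like this should give per-hour totals per user instead of a single total for the whole range (the 1h window is just an illustration, pick whatever interval you need):

|> group(columns: ["UserName"])
|> aggregateWindow(every: 1h, fn: sum)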

I hope this helps. If the unique session id really is 400,000 new values per day, we may need to know more about your schema to keep this performant; in that case I wouldn't advise making it a tag, but I also don't know how else to identify each counter. We might have to try using Telegraf as a preprocessor before writing the data.
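
On your earlier question about a task: yes, something roughly like this could run periodically and write the per-series deltas into a separate measurement and bucket. This is only a sketch; the task name, the 1h schedule, the traffic_delta measurement name, and the radius_downsampled destination bucket are all placeholders you would adjust to your setup. Also note that, because each run only sees its own window, the first point of each series in a window has nothing to diff against, so you may want to widen the range a little (re-writing overlapping points is harmless, since identical series and timestamps overwrite each other).

option task = {name: "output-octets-delta", every: 1h}

from(bucket: "radius")
|> range(start: -task.every)
|> filter(fn: (r) => r["_measurement"] == "traffic")
|> filter(fn: (r) => r["_field"] == "OutputOctets")
|> difference(nonNegative: true)
|> set(key: "_measurement", value: "traffic_delta")
|> to(bucket: "radius_downsampled")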