I have a high cardinality measurement with traffic data stored every 5 minutes:
fieldKeys: bytes,packets (both integer) and tagKeys: device, asnum.
Tag “asnum” (AS number) is the main contributor to series cardinality.
It looks ok so far to keep all series for short term retention policy.
For longer periods I’d like to create another “longterm” measurement with the same tag “device” values, top(100) _asnum_s with original asnum tag values and tag the sum over all other asnum tags on device with asnum=‘others’.
By top(100) here I mean asnums with highest bytes per time slot on device.
The goal is to reduce longterm series cardinality and speed up queries without loosing longtail data traffic completely. Another goal is to get more compact dataset for grafana piechartorheatmap panel plugins.
Can this goal be achieved with influxql query or a sequence of queries?
My guess is to use “select into” queries for this.
Later on I can wrap them into continuous queries or just run batch job periodically.
Query like
select top(bytes,“asnum”,100) as ‘topbytes’, top(packets,“asnum”,100) from “measurementName” where <some_time_range> group by time(5m), “device”, “asnum”
should do the first part.
The questions I have:
- How to do the second part - get sum(bytes) for remaining (not top100) asnums and assign tag asnum=‘others’ for this sum?
- Are there other better approaches to the problem?