Does anyone have strategies for dealing with high-cardinality data like this?
I see that TSI looks promising, but I was wondering what patterns folks use to manage this?
I have a network of 5-10 million devices, each messaging multiple times per second. I aggregate this millisecond data into minutes, hours, and days with corresponding retention policies. I am collecting various attributes about these devices that I need to report on over time. Each attribute can have dozens of values, and I will eventually need to support several scenarios like this.
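To make the rollup step concrete, here is a rough sketch in plain Python of bucketing millisecond samples into per-minute means (in InfluxDB this would normally be done with a continuous query or task rather than application code; the function name and sample values are made up):

```python
from collections import defaultdict

def rollup_to_minutes(points):
    """Aggregate (timestamp_ms, value) samples into per-minute means.

    A sketch of the millisecond -> minute downsampling step only;
    hourly and daily rollups would bucket the same way with larger
    divisors, each feeding a longer retention policy.
    """
    buckets = defaultdict(list)
    for ts_ms, value in points:
        minute = ts_ms // 60_000  # 60,000 ms per minute
        buckets[minute].append(value)
    return {m * 60_000: sum(v) / len(v) for m, v in buckets.items()}

samples = [(0, 88.0), (500, 88.5), (61_000, 90.0)]
print(rollup_to_minutes(samples))  # → {0: 88.25, 60000: 90.0}
```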
So, data could look something like this:
deviceId  attr1  attr2  fieldA
1         abc    89     88.1
2         abc    77     20.56
3         efg    77     45
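For reference, with deviceId, attr1, and attr2 all as tags, those rows would serialize to line protocol roughly like this (a sketch; the measurement name `devices` is made up):

```python
def to_line_protocol(measurement, row):
    """Encode one row as InfluxDB line protocol, with deviceId, attr1,
    and attr2 as tags and fieldA as the only field."""
    device_id, attr1, attr2, field_a = row
    tags = f"deviceId={device_id},attr1={attr1},attr2={attr2}"
    return f"{measurement},{tags} fieldA={field_a}"

rows = [(1, "abc", 89, 88.1), (2, "abc", 77, 20.56), (3, "efg", 77, 45)]
for r in rows:
    print(to_line_protocol("devices", r))
# devices,deviceId=1,attr1=abc,attr2=89 fieldA=88.1
# devices,deviceId=2,attr1=abc,attr2=77 fieldA=20.56
# devices,deviceId=3,attr1=efg,attr2=77 fieldA=45
```

Every distinct (deviceId, attr1, attr2) combination here becomes its own series, which is exactly where the cardinality blows up.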
It seems that I can only ‘group by’ tags and ‘order by’ time, but my users mostly want to query data grouped by deviceId, attr1, and/or attr2.
Ideally I would make deviceId, attr1, and attr2 tags and keep the values as fields, so I could return the desired grouping, but the number of series would grow into the tens of millions. With several scenarios like this, it could multiply further.
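A rough back-of-envelope on the series growth (the per-attribute value counts are assumed from the "dozens of values" figure above):

```python
devices = 10_000_000   # upper end of the 5-10 million device range
attr1_values = 24      # "dozens" of values per attribute (assumed)
attr2_values = 24

# If each device keeps one stable (attr1, attr2) pair,
# series count is roughly the device count.
stable = devices

# If a device can move through attribute values over time, each new
# (deviceId, attr1, attr2) combination creates another series.
worst_case = devices * attr1_values * attr2_values
print(stable, worst_case)  # → 10000000 5760000000
```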
If I make only the deviceId a tag and the attributes fields, I don’t see a way to have the database group by (or even sort by) the attributes to efficiently return the data to my UI organized in groups for display.
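One common workaround with attributes stored as fields is to pull the rows back and do the attribute grouping in the application; a minimal sketch with stdlib tools (the row dicts mimic query results, shapes assumed):

```python
from itertools import groupby

def group_rows(rows):
    """Group query results by (attr1, attr2) in application code,
    since the database can only GROUP BY tags. Rows must be sorted
    by the grouping key before itertools.groupby will batch them."""
    key = lambda r: (r["attr1"], r["attr2"])
    return {k: list(g) for k, g in groupby(sorted(rows, key=key), key=key)}

rows = [
    {"deviceId": 1, "attr1": "abc", "attr2": 89, "fieldA": 88.1},
    {"deviceId": 2, "attr1": "abc", "attr2": 77, "fieldA": 20.56},
    {"deviceId": 3, "attr1": "efg", "attr2": 77, "fieldA": 45},
]
grouped = group_rows(rows)
print(sorted(grouped))  # → [('abc', 77), ('abc', 89), ('efg', 77)]
```

The obvious drawback is that this ships every matching row to the client before grouping, which may not scale to the volumes described above.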