High cardinality data strategies

kquinn · November 2, 2018, 1:14pm

Does anyone have strategies for dealing with high-cardinality data like the case described below?
I have seen TSI (the Time Series Index), which looks promising, but I am wondering what patterns people use to manage this.

I have a network of 5 to 10 million devices, each sending messages up to several times a second. I aggregate this millisecond data into minutes, hours and days, with corresponding retention policies. I am collecting various attributes about these devices that I need to report on over time. Each attribute can have dozens of values. Eventually I need to support several scenarios like this one.
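To make the rollup step concrete, here is a minimal Python sketch of the minute-level aggregation described above. It assumes the aggregate is a mean and that points arrive as (deviceId, timestamp in milliseconds, value) tuples; both are assumptions, since the post does not say which aggregate or format is used. The function name `downsample_mean` is made up for illustration.

```python
from collections import defaultdict

def downsample_mean(points, bucket_ms):
    """Average values per (device, time bucket); bucket_ms is the bucket width."""
    sums = defaultdict(lambda: [0.0, 0])  # (device, bucket_start) -> [total, count]
    for device_id, ts_ms, value in points:
        bucket_start = ts_ms - ts_ms % bucket_ms  # floor to the bucket boundary
        acc = sums[(device_id, bucket_start)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Hypothetical raw points: (deviceId, timestamp in ms, value)
raw = [("1", 1_000, 10.0), ("1", 59_999, 20.0), ("1", 60_000, 30.0), ("2", 500, 5.0)]
per_minute = downsample_mean(raw, 60_000)
```

The same function with a larger `bucket_ms` would give the hourly and daily rollups.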

So, data could look something like this:

deviceId  attr1  attr2  fieldA
1         abc    89     88.1
2         abc    77     20.56
3         efg    77     45
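As a rough illustration, the first sample row could be written in InfluxDB line protocol in two ways, depending on whether the attributes are tags or fields. This is only a sketch: the measurement name `device_metrics` and the timestamp are made up, and the helper does no escaping of special characters.

```python
def format_field(key, value):
    """Render one field: strings are quoted, integers get an 'i' suffix."""
    if isinstance(value, str):
        return f'{key}="{value}"'
    if isinstance(value, int):
        return f"{key}={value}i"
    return f"{key}={value}"

def to_line(measurement, tags, fields, ts_ns):
    """Build one line-protocol line (sketch only, no escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(format_field(k, v) for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {ts_ns}"

ts = 1541164440000000000  # hypothetical timestamp, in nanoseconds

# Layout A: deviceId and both attributes are tags (groupable, more series).
layout_a = to_line("device_metrics",
                   {"deviceId": "1", "attr1": "abc", "attr2": "89"},
                   {"fieldA": 88.1}, ts)

# Layout B: only deviceId is a tag; the attributes are stored as fields.
layout_b = to_line("device_metrics",
                   {"deviceId": "1"},
                   {"attr1": "abc", "attr2": 89, "fieldA": 88.1}, ts)

print(layout_a)
print(layout_b)
```

Layout A is the one that allows grouping by the attributes; layout B keeps one series per device.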

It seems that I can only ‘group by’ tags and ‘order by’ time. My users mostly want to query data grouped by deviceId, attr1 and/or attr2.

Ideally I would make deviceId, attr1 and attr2 tags and the values fields, so that queries can return the desired grouping, but the number of series would grow into the tens of millions. With several scenarios like this, it could multiply further.
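A back-of-the-envelope way to size this, assuming series are counted as distinct tag combinations within one measurement: the floor is one series per device (each device keeps a single attr1/attr2 pair), and the ceiling is the full product of tag value counts. The figure of 24 values per attribute below is a stand-in for "dozens", not a number from this post.

```python
def worst_case_series(devices, values_per_tag):
    """Upper bound: every device appears with every combination of tag values."""
    total = devices
    for n in values_per_tag:
        total *= n
    return total

def observed_series(points):
    """Series actually created: distinct (deviceId, attr1, attr2) combinations."""
    return len({(p["deviceId"], p["attr1"], p["attr2"]) for p in points})

sample = [
    {"deviceId": "1", "attr1": "abc", "attr2": "89"},
    {"deviceId": "2", "attr1": "abc", "attr2": "77"},
    {"deviceId": "3", "attr1": "efg", "attr2": "77"},
]

upper = worst_case_series(10_000_000, [24, 24])  # 24 is an assumed value count
seen = observed_series(sample)
# A device whose attr2 changes creates an additional series.
seen_after_change = observed_series(
    sample + [{"deviceId": "1", "attr1": "abc", "attr2": "77"}]
)
```

The real count lands somewhere between the two bounds, depending on how often a device's attribute values change over the retention window.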

If I make only the id a tag and store the attributes as fields, I don’t see a way to have the database group by, or even sort by, the attributes so that they come back to my UI already organized into groups for display.
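For comparison, this is roughly what the grouping would look like if it had to happen in the application instead, with the attributes coming back as plain field values. It is only a sketch of that fallback, not a recommendation: the rows are the sample data from above, `group_rows` is a made-up helper, and at millions of devices this moves the cost to the client.

```python
from collections import defaultdict

def group_rows(rows, keys):
    """Group result rows by the given attribute names, preserving row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)
    return dict(groups)

# Rows as they might come back when only deviceId is a tag.
rows = [
    {"deviceId": "1", "attr1": "abc", "attr2": 89, "fieldA": 88.1},
    {"deviceId": "2", "attr1": "abc", "attr2": 77, "fieldA": 20.56},
    {"deviceId": "3", "attr1": "efg", "attr2": 77, "fieldA": 45.0},
]

by_attr1 = group_rows(rows, ["attr1"])
by_both = group_rows(rows, ["attr1", "attr2"])

for key, members in by_attr1.items():
    print(key, [r["deviceId"] for r in members])
```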
