Scaling with high cardinality or multiple databases

maholi · May 10, 2022, 11:51am

Hi

I am trying to redesign the schema for my data, but no matter how I split the data I end up with high cardinality. In the worst case I end up in the billions, and in the best case at around 50,000. Would 50,000 be considered high and problematic performance-wise? It is hard to say how many data points there will be, but at the moment it is around 100 million per year, with predicted growth to 250-500 million per year. Retention will be around 2 years, so it should not exceed 1 billion data points, 1.2 billion at most.

I am not sure if I am splitting the data wrong, but I only consider 2, or at most 3, tags; the rest comes from fields, of which there will be around 20-30 unique ones. One of the tags will have around 500 unique values, while the other has only 5. The one with 500 is needed because I use it to look up related data in a relational DB.
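To make the numbers above concrete, here is a rough sketch of how such an estimate could be computed. It assumes that series cardinality is bounded by the number of measurements times the product of unique tag values times the number of field keys, and that every tag combination actually occurs, so it is a worst-case figure. The tag names `lookup_id` and `category` and the function name are made up for illustration.

```python
from math import prod


def estimate_series_cardinality(tag_value_counts, field_count=1, measurement_count=1):
    """Worst-case series count, assuming every tag value combination occurs.

    Assumption: a series is identified by measurement + tag set + field key.
    """
    return measurement_count * prod(tag_value_counts.values()) * field_count


# Hypothetical tag names; the counts are the ones from the post.
tags = {"lookup_id": 500, "category": 5}

low = estimate_series_cardinality(tags, field_count=20)
high = estimate_series_cardinality(tags, field_count=30)
print(low, high)  # 50000 75000
```

With 20 fields this gives the 50,000 figure mentioned above. The real cardinality may well be lower if many tag combinations never appear in the data.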

In one scenario, where I split unrelated data per database and use new identifiers for measurements, I could end up with a cardinality of around 6,000, around 100 measurements, and multiple databases. Would that scenario be more desirable? The data split across databases is unrelated and would not be queried together.

Anaisdg · May 10, 2022, 5:30pm

Hello @maholi,
It depends largely on your hardware and ingest rate.
But you can follow this guide:

It’s always preferable to reduce your cardinality if you can, as long as doing so doesn’t significantly hurt your user experience or add too much complexity.
