Scaling with high cardinality or multiple databases

Hi

I am trying to redesign the schema for my data, but no matter how I split it I end up with high cardinality. My worst layout ends up in the billions and my best one at around 50,000. Would 50,000 be considered high and problematic performance-wise? It is hard to say how many data points there will be, but at the moment it is around 100 million per year, with predicted growth to 250-500 million per year. Retention will be around 2 years, so the total should not exceed 1 billion data points, 1.2 billion at most.

I am not sure if I am splitting the data wrong, but I only use 2, or at most 3, tags; the rest goes into fields, of which there will be around 20-30 unique field keys. One of the tags will have around 500 unique values while the other has only 5. The one with 500 is needed because I use it to look up related data in a relational DB.
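For rough sizing: series cardinality is driven by the product of unique tag values per measurement, while fields do not create new series. A minimal sketch of that arithmetic, using the numbers above (the tag names are just illustrative placeholders):

```python
# Rough upper bound on series cardinality for one measurement.
# Only tags contribute; the ~20-30 fields do not add series.
from math import prod

tag_value_counts = {
    "lookup_id": 500,  # tag used for the relational-DB lookup (illustrative name)
    "category": 5,     # the second, low-cardinality tag (illustrative name)
}

series_per_measurement = prod(tag_value_counts.values())
print(series_per_measurement)  # 500 * 5 = 2500
```

In practice not every tag combination will actually occur, so this is an upper bound rather than the real series count.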

In one scenario I split unrelated data per database and use new identifiers for measurements; there I could end up with a cardinality of around 6,000, around 100 measurements, and multiple databases. Would that scenario be more desirable? The data split across databases is not related and would not be queried together.

Hello @maholi,
It depends largely on your HW and ingest rate.
But you can follow this guide:

It’s always more desirable to reduce your cardinality if you can, as long as it doesn’t come at a great detriment to the user experience or increase complexity too badly.
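Once some data is loaded, you can also measure what a candidate schema actually produces rather than estimating. A minimal sketch, assuming InfluxDB 1.x with InfluxQL and the `influxdb` Python client; the connection details and database name are placeholders:

```python
# Inspect how many series a database actually contains (InfluxDB 1.x, InfluxQL).
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")  # placeholder connection

# Estimated total series cardinality for the database.
print(list(client.query("SHOW SERIES CARDINALITY").get_points()))

# Exact count (heavier query, but useful while testing schema designs).
print(list(client.query("SHOW SERIES EXACT CARDINALITY").get_points()))
```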
