Thank you, Giovanni, absolutely helpful!
In my real-world scenario I will in fact have more tags (six, to be precise: one "common" tag, T1, and five "point of view" tags, T2-T6), and thus a much higher theoretical cardinality.
As the actual cardinality will in the end depend on use cases outside of my control, I have to consider the worst case. The different "points of view", as you accurately described them, may very well end up with fewer than 10 values each, or turn out to be dependent tags (information I "lose" by doing the separation). But since I don't know that yet, I plan for the worst case. The tags won't "run away": there will be at most 10-20 values per view tag, and more (~100) for the "common" tag. Even on the low end, the math works out to 100 * 10 * 10 * 10 * 10 * 10 = 10,000,000 series, which is high cardinality (with relatively few points per series). On the high end it's 320,000,000, ridiculously high…
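Just to make that worst-case arithmetic explicit, here is a tiny Python sketch (the tag value counts are the assumptions above, not measured data):

```python
# Back-of-the-envelope worst-case series cardinality; the tag value counts
# are assumptions from this thread, not measured data.
common_values = 100          # T1, the "common" tag (~100 values)
views_low = [10] * 5         # T2-T6, low-end estimate (10 values each)
views_high = [20] * 5        # T2-T6, high-end estimate (20 values each)

def single_measurement_cardinality(common: int, views: list[int]) -> int:
    """Series count if one measurement carries T1 plus all five view tags."""
    total = common
    for v in views:
        total *= v
    return total

print(single_measurement_cardinality(common_values, views_low))   # 10_000_000
print(single_measurement_cardinality(common_values, views_high))  # 320_000_000
```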
Splitting the views into five separate measurements, "by_T2" … "by_T6", and getting a total cardinality between 5,000 and 10,000 instead is definitely worth it, I would say.
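For what it's worth, this is roughly how I picture the fan-out on write, a minimal sketch assuming the InfluxDB 2.x Python client (influxdb-client); the URL, token, org, bucket, field name and tag values are placeholders:

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One incoming event is fanned out into five points, one per "point of view"
# measurement; each measurement carries only T1 plus its own view tag, so the
# per-measurement cardinality stays around 100 * (10..20) = 1,000..2,000 series,
# i.e. 5,000..10,000 in total across the five measurements.
event = {"T1": "common-value", "T2": "a", "T3": "b", "T4": "c", "T5": "d", "T6": "e", "value": 1.0}

points = [
    Point(f"by_{view}")
    .tag("T1", event["T1"])
    .tag(view, event[view])
    .field("value", event["value"])
    for view in ("T2", "T3", "T4", "T5", "T6")
]
write_api.write(bucket="my-bucket", record=points)
```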
I hadn't really thought about the incoherent-data problem, good point! I think it's acceptable, though; some loss of points is to be expected. And since the "five views" should all have the same total, the most accurate total will always be the highest number in my case (a sum aggregate over some time frame). I can always compare/max/sum the different series, should I have the need.
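A rough sketch of that "take the highest per-view total" idea, again assuming the 2.x Python client and Flux; the bucket, field and window are placeholders:

```python
from influxdb_client import InfluxDBClient

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
query_api = client.query_api()

def view_total(measurement: str, window: str = "-1h") -> float:
    """Sum of the 'value' field over the window for one per-view measurement."""
    flux = f'''
        from(bucket: "my-bucket")
          |> range(start: {window})
          |> filter(fn: (r) => r._measurement == "{measurement}" and r._field == "value")
          |> group()
          |> sum()
    '''
    tables = query_api.query(flux)
    return tables[0].records[0].get_value() if tables and tables[0].records else 0.0

# Each view should report the same grand total; the highest one is the most
# complete if some events were only partially written.
totals = {m: view_total(m) for m in ("by_T2", "by_T3", "by_T4", "by_T5", "by_T6")}
print(totals, max(totals.values()))
```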
I think I'll go for option 2; I agree it's the cleanest and most straightforward option.