TLDR: Is schema proposal #1 or or #2 better for keeping cardinality low?
I have what I assume to be a pretty standard IoT Sensor dataset. Sensors provide values for ONE specific thing, like volume, temperature, pressure, etc. Sensors are all at a particular Site, of which we typically have 100-1000 sensors, and we have several customers with multiple (1-10) sites. I mostly query my data for a single sensor at a time, but occasionally I will need to grab the data from several sensors at one site and add them together.
I’ve read most of the articles discussing the importance of cardinality concerns, and I’m using InfluxDB Cloud 2.0. So how can I decrease my long-term cardinality?
I have a couple schema design ideas:
- Site1Readings,sensor_id=5ea347955a97da27f73823e7 value=14.515744 1615236659000000000 (unix time)
- Site1Readings,sensor_id=5ea347955a97da27f73823e9 value=14.515745 1615236659000000000
- Site2Readings,sensor_id=5ea347955a97da27f73823e8 value=14.515746 1615236659000000000
- 5ea347955a97da27f73823e7,site_name=Site1 value=14.515744 1615236659000000000
- 5ea347955a97da27f73823e9,site_name=Site1 value=14.515745 1615236659000000000
- 5ea347955a97da27f73823e8,site_name=Site2 value=14.515746 1615236659000000000
#1 seems to fit my readings pattern better (I can filter with OR on sensor_id occasionally), but I’m unsure if it will have long-term cardinality problems. #2 just means I occasionally have to use JOIN to get data from multiple sensors in one result set (are joins frowned upon?).