Cardinality and Data Series

c-eshalev · November 10, 2019, 2:13pm

Hi, I am a total noob to InfluxDB, but I have a good understanding of sharding, indexing, and consistency from other databases.

I am building a large system that is supposed to collect data from hundreds of millions of devices. I am trying to understand how to design my system to avoid high cardinality and allow for good query times.

Writing: My devices will report data from different sensors in real time:
<1:12:01, device1, temperature = 24C>
<1:12:03, device1, speed = 1m/s>
<1:12:05, device1, noise = 1dB>
<1:12:02, device2, temperature = 12C>
<1:12:04, device2, speed = 3m/s>
<1:12:04, device2, noise = 5dB>

My select queries will be something along the lines of:
Give me all of the measurements from “device1” for the last minute.
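As a rough sketch of what that read could look like in InfluxQL, the Python snippet below builds the query string. The measurement name `sensors` and the helper `last_minute_query` are hypothetical, and the snippet assumes device ids contain only letters, digits, underscores, and hyphens so that no escaping is needed.

```python
import re

# Hypothetical helper: builds an InfluxQL query for one device's last minute of data.
# Assumes a single measurement (here called "sensors") with a "device_id" tag.
_SAFE_ID = re.compile(r"^[A-Za-z0-9_-]+$")


def last_minute_query(device_id, measurement="sensors"):
    # Reject ids that would need escaping instead of guessing at escape rules.
    if not _SAFE_ID.match(device_id):
        raise ValueError("unsupported characters in device id: %r" % device_id)
    return (
        'SELECT * FROM "%s" ' % measurement
        + "WHERE \"device_id\" = '%s' AND time > now() - 1m" % device_id
    )


print(last_minute_query("device1"))
```

Because the filter is on a tag, the lookup can use the index rather than scanning field values, which is one reason device_id is a natural tag candidate.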

I am going to have 10s of millions of devices.
Each device can have one of 10s of sensors (noise/speed/temperature).
The list of sensors needs to be expandable in the future (although the number will remain in the 10s).

So what kind of data schema should I use?
I should probably create a single database.
device_id should probably be a “tag”, right?
For sensors: I am torn between separate time series and a sensor_type tag.
(My writes tend to point me towards separate time series, but I am not sure about my reads; perhaps a sensor_type tag?)

katy · November 11, 2019, 5:24pm

I agree that device_id should be a tag. You’re going to end up with relatively high cardinality because of the ids, but that’s just the nature of the dataset. Make sure you have TSI (the Time Series Index) enabled on InfluxDB; it helps with performance at high cardinality. I would only add a sensor_type tag if you need it for the kinds of analysis/queries you’ll be doing. The more separate series you have, the higher your cardinality will be. The best path is usually to decide what attributes you will want to group by and make those tags.
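To make the trade-off concrete, here is a back-of-the-envelope sketch in Python. It treats a series as one measurement plus one unique combination of tag values, which is a simplification, and the counts (10 million devices, 30 sensor types) are assumed illustrative numbers, not figures from this thread. The helper name `max_series` is hypothetical.

```python
from math import prod


def max_series(tag_value_counts):
    """Upper bound on series in one measurement: the product of the
    number of distinct values of each tag (every combination occurring)."""
    return prod(tag_value_counts.values())


# Assumed, illustrative counts only.
device_only = max_series({"device_id": 10_000_000})
with_sensor_tag = max_series({"device_id": 10_000_000, "sensor_type": 30})

print(device_only)       # bound with device_id as the only tag
print(with_sensor_tag)   # bound when a sensor_type tag is added
```

Under these assumptions the extra tag multiplies the bound by the number of sensor types, so it is worth adding only if queries actually group or filter by it.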

c-eshalev · November 12, 2019, 4:49pm

Thanks @katy,
So I should ignore the Series part, and just have a general “measurement” series with a bunch of tags (as few as possible with respect to my queries). Right?
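One possible reading of that layout, sketched in Python as InfluxDB line protocol: a single measurement, device_id as the only tag, and each sensor reading as a field. The measurement name `sensors`, the helper `to_line`, and the timestamp are hypothetical; the sketch assumes keys and values contain no spaces, commas, or equals signs (so no escaping) and writes every field as a float.

```python
def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as line protocol:
    measurement,tag=value field=value timestamp
    """
    # Tags are emitted sorted by key; fields are written as floats.
    tag_part = "".join(",%s=%s" % (k, v) for k, v in sorted(tags.items()))
    field_part = ",".join(
        "%s=%s" % (k, float(v)) for k, v in sorted(fields.items())
    )
    return "%s%s %s %d" % (measurement, tag_part, field_part, timestamp_ns)


# Arbitrary example timestamp in nanoseconds.
line = to_line("sensors", {"device_id": "device1"}, {"temperature": 24}, 1573395121000000000)
print(line)
```

A new sensor type then becomes a new field key rather than a new tag value, so it does not multiply the number of tag combinations.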

katy · November 12, 2019, 9:23pm

Yeah, I think that’s a great start
