Hi, I am a total noob to InfluxDB; however, I have a good understanding of the concepts of sharding, indexing, and consistency from other databases.
I am building a large system that is supposed to collect data from hundreds of millions of devices. I am trying to understand how to design my system to avoid high cardinality and allow for good query times.
Writing: My devices will report data from different sensors in real time:
<1:12:01, device1, temperature = 24C>
<1:12:03, device1, speed = 1m/s>
<1:12:05, device1, noise = 1dB>
<1:12:02, device2, temperature = 12C>
<1:12:04, device2, speed = 3m/s>
<1:12:04, device2, noise = 5dB>
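To make the write side concrete, here is a minimal Python sketch that renders a few of the readings above as InfluxDB line protocol (measurement,tag_set field_set timestamp) under both options I am considering. The measurement name "readings", the field name "value", the helper names, and the date 2024-01-01 are all made up for illustration, and I am reading "separate time series" as "one measurement per sensor type", which is an assumption.

```python
from datetime import datetime, timezone


def escape(value: str, chars: str) -> str:
    """Backslash-escape each of the given special characters in value."""
    for ch in chars:
        value = value.replace(ch, "\\" + ch)
    return value


def to_ns(clock: str) -> int:
    """Turn 'H:MM:SS' into a nanosecond epoch timestamp.

    The readings only give a time of day, so the date is a placeholder.
    """
    h, m, s = (int(part) for part in clock.split(":"))
    dt = datetime(2024, 1, 1, h, m, s, tzinfo=timezone.utc)
    return int(dt.timestamp()) * 1_000_000_000


def line_with_sensor_tag(device_id, sensor, value, ts_ns, measurement="readings"):
    """Option A: one measurement, with device_id and sensor_type as tags."""
    tags = (
        f"device_id={escape(device_id, ',= ')},"
        f"sensor_type={escape(sensor, ',= ')}"
    )
    return f"{measurement},{tags} value={float(value)} {ts_ns}"


def line_with_sensor_measurement(device_id, sensor, value, ts_ns):
    """Option B: one measurement per sensor type, with device_id as the only tag."""
    measurement = escape(sensor, ", ")  # measurement names escape commas and spaces
    return f"{measurement},device_id={escape(device_id, ',= ')} value={float(value)} {ts_ns}"


sample = [
    ("1:12:01", "device1", "temperature", 24),
    ("1:12:03", "device1", "speed", 1),
    ("1:12:02", "device2", "temperature", 12),
]

for clock, device_id, sensor, value in sample:
    print(line_with_sensor_tag(device_id, sensor, value, to_ns(clock)))
    print(line_with_sensor_measurement(device_id, sensor, value, to_ns(clock)))
```

Units (C, m/s, dB) are left out of the lines on purpose; in this sketch they are implied by the sensor name rather than stored per point.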
My select queries will be something along the lines of:
Give me all of the measurements from “device1” for the last minute.
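As a rough sketch of that read, the following Python helper builds the InfluxQL text for "everything from one device in the last minute". It assumes a single measurement named "readings" with a device_id tag; the measurement name and the helper itself are hypothetical, and the ID check is just a conservative guard I added so the string can be embedded safely.

```python
import re

# Assumption: device IDs are simple tokens such as "device1".
_DEVICE_ID = re.compile(r"[A-Za-z0-9_-]+")


def last_minute_query(device_id: str, measurement: str = "readings") -> str:
    """Build an InfluxQL query for one device's points from the last minute."""
    if not _DEVICE_ID.fullmatch(device_id):
        raise ValueError(f"unexpected device id: {device_id!r}")
    # Identifiers go in double quotes, tag values in single quotes.
    return (
        f'SELECT * FROM "{measurement}" '
        f"WHERE \"device_id\" = '{device_id}' AND time > now() - 1m"
    )


print(last_minute_query("device1"))
# SELECT * FROM "readings" WHERE "device_id" = 'device1' AND time > now() - 1m
```

With one measurement per sensor type instead, the same read would have to name every sensor measurement (or use a regular expression) in the FROM clause, which is part of why I am unsure about my reads.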
I am going to have 10s of millions of devices.
Each device can have one of 10s of sensors (noise/speed/temperature).
The list of sensors needs to be expandable in the future (although the number will remain in the 10s).
So what kind of data schema should I use?
I should probably create a single database.
device_id should probably be a “tag”, right?
For sensors: I am debating between separate time series and a sensor_type tag.
(My writes point me towards separate time series, but I am not sure about my reads; perhaps a sensor_type tag would suit them better?)
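To put rough numbers on the cardinality side of this, here is a back-of-the-envelope Python sketch. It counts distinct (measurement, tag set) pairs only, assumes every device reports every sensor type (so it is an upper bound), and uses 10,000,000 devices and 30 sensor types as placeholder figures for "10s of millions" and "10s". The function name and layout labels are made up, and "measurement_per_sensor" is my reading of "separate time series".

```python
def series_upper_bound(devices: int, sensor_types: int, layout: str) -> int:
    """Upper bound on distinct (measurement, tag set) pairs for a layout."""
    if layout == "sensor_type_tag":
        # One measurement; tags are device_id and sensor_type.
        measurements = 1
        tag_sets_per_measurement = devices * sensor_types
    elif layout == "measurement_per_sensor":
        # One measurement per sensor type; device_id is the only tag.
        measurements = sensor_types
        tag_sets_per_measurement = devices
    else:
        raise ValueError(f"unknown layout: {layout}")
    return measurements * tag_sets_per_measurement


DEVICES = 10_000_000  # placeholder for "10s of millions"
SENSOR_TYPES = 30     # placeholder for "10s"

for layout in ("sensor_type_tag", "measurement_per_sensor"):
    print(layout, series_upper_bound(DEVICES, SENSOR_TYPES, layout))
```

Counted this way, both layouts come out to the same upper bound (devices times sensor types), so the device_id tag is what drives the number rather than the choice between the two. How InfluxDB itself accounts for field keys in its cardinality figures depends on the version and is not modeled here.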