Schema Design for IoT Metrics

resteeee · August 24, 2017, 8:08am

Hi guys,
We are planning to use InfluxDB as a storage for IoT data (sensors and different measurements).
We have that situation:

3000 Devices
Each “typical” device has 8 sensors, (4 temperature sensors, 4 Power consumption sensors). The number of sensors per device can vary and in future maybe there will be other different types of devices with different sensors (not only temperature or power consumption).
Each sensor sends 4 messages per minute

The typical queries will be per sensor, but i don’t exclude that we’ll need some global statistics. For that i’m planning to use a single measurement, instead multiple measurements (divided by device? Or by measure type).

I’ve some questions:

Is good to use a single measurement to store that data? Do you see any implication? There is a difference in using single or multiple measurement in terms of memory and cpu usage?
What is the ideal point structure in this case? We have to choose between these two structures:

TIMESTAMP | VALUE (FIELD) | SENSOR_ID (TAG) | DEVICE_ID (TAG) -> 1 row for each sensor
TIMESTAMP | SENSOR_A_VALUE (FIELD) | SENSOR_B_VALUE (FIELD) | SENSOR_C_VALUE (FIELD) | … | DEVICE_ID(TAG) -> 1 row containing the data of all the sensors of the device at time T

Keep in mind that the number of sensors, for some devices, can vary so, for the solution number 2, some rows will have less fields, some rows will require to add more fields (due to more than 8 sensors present). For the solution number 1 the value field will be a float, what we can do if we need to store another type? We must add a new field?

Thank you in advance,
Matteo

davidgs · August 24, 2017, 1:05pm

Hi Mateo,

I’ve recently done a similar thing, though not with as many sensors, and I asked much the same question. My data was also multi-sensor data, etc. Here’s the suggestion I got:

Typically you’ll have the sensor id/name and other metadata as a tag then the fields will be the various sensors depending on how much high level grouping you would like to have, the sensors could also be grouped into their own measurements so your line protocol will probably end up looking something like: thingamabob_sensor,id=1234,city=sf temp=22.2,humidity=0.80 1502747933000000000

So in my case, I have influx data_sensors as my ‘thingamabob_sensor’ and the two tags id and city, with the sensor readings following. Since my sensors all grab a reading for each value every second, I send the whole thing as one line-protocol write to the InfluxDB instance:

String::format("influxdata_sensors,id=%s,location=%s temp_c=%f,temp_f=%f,humidity=%f,pressure=%f,altitude=%f,broadband=%d,infrared=%d,lux=%f", myID.c_str(), myLoc.c_str(), temperature, fTemp, humidity, pressure, altitude, broadband, infrared, lux);

Making a single write to the database cuts down on the amount of time spent in setting up and tearing down the http connection within my sensor app – time better spent taking environmental readings and doing other important tasks.

I hope this helps!

resteeee · August 24, 2017, 1:18pm

Thank you david
Ok i understood your solution (and i like it :)), also the “writing optimization” part makes me think that we can go in that direction (in alternative the solution to avoid HTTP Connection setup was to use batch inserts).
But what if some (not all) of your sensors will send more than 8 measures (for example because you have built and released on market a new device version and that version sends 9 measures)?
For example, instead having that:

String::format(“influxdata_sensors,id=%s,location=%s temp_c=%f,temp_f=%f,humidity=%f,pressure=%f,altitude=%f,broadband=%d,infrared=%d,lux=%f”, myID.c_str(), myLoc.c_str(), temperature, fTemp, humidity, pressure, altitude, broadband, infrared, lux);

you have 1 more measure related to Power Consumption:

String::format(“influxdata_sensors,id=%s,location=%s temp_c=%f,temp_f=%f,humidity=%f,pressure=%f,altitude=%f,broadband=%d,infrared=%d,lux=%f,powerConsumption=%f”, myID.c_str(), myLoc.c_str(), temperature, fTemp, humidity, pressure, altitude, broadband, infrared, lux, powerConsumption);

Will that scheme support that kind of variation? Keep in mind that we do not have control on what and how devices send data. We’re just a “collector” of that data and we’ll need to plot on dashboards the fields

thank you!

resteeee · August 24, 2017, 1:31pm

I’m thinking about also at the continuous queries.

Suppose i’ve that measurement:

TIMESTAMP | SENSOR_1_VALUE (FIELD) | SENSOR_2_VALUE (FIELD) | DEVICE_ID (TAG)

Suppose that on that measurement i compute the daily mean of sensors with a continuous query and i store it inside another measurement.
At some time i have newer devices with one more sensor.

TIMESTAMP | SENSOR_1_VALUE (FIELD) | SENSOR_2_VALUE (FIELD) | SENSOR_3_VALUE (FIELD) | DEVICE_ID (TAG)

What i have to do with the continuous query? I’ve to fix it? Or maybe i can create a dynamic continuous query that uses regular expressions and consider every field in the measurement?

resteeee · August 24, 2017, 2:56pm

As an addition to the “continuous queries problem” see that Feature Request:

github.com/influxdata/influxdb

Wildcard support for fields in continuous queries.

opened 03:40PM - 19 Feb 16 UTC

closed 11:13PM - 28 Jul 16 UTC

steverweber

area/continuous queries support kind/feature-request

when the only value was "value", there was a simple way to create a dynamic CQ w…hen backreferencing `:measurement` however the new format telegraf uses seems to break this. https://docs.influxdata.com/influxdb/v0.10/query_language/continuous_queries/#cqs-with-backreferencing I'm wondering if there is a better way to do this so sub measurements are included. ``` CREATE DATABASE telegraf WITH DURATION 100d CREATE RETENTION POLICY "7y" ON telegraf DURATION 2555d REPLICATION 1 CREATE CONTINUOUS QUERY "30m_cpu" ON telegraf BEGIN SELECT mean(usage_idle) as usage_idle INTO telegraf."7y".:measurement FROM telegraf."default".cpu GROUP BY time(30m), host END CREATE CONTINUOUS QUERY "30m_mem" ON telegraf BEGIN SELECT mean(used_percent) as used_percent INTO telegraf."7y".:measurement FROM telegraf."default".mem GROUP BY time(30m), host END CREATE CONTINUOUS QUERY "30m_disk" ON telegraf BEGIN SELECT mean(used_percent) as used_percent INTO telegraf."7y".:measurement FROM telegraf."default".disk GROUP BY time(30m), host END ```

davidgs · August 24, 2017, 4:19pm

Adding a new sensor reading at a later date shouldn’t be a problem. Adding a new field should (and in my experience does) add the new field to the schema on-the-fly and things just work.

resteeee · August 24, 2017, 4:44pm

Hi David,
yes you’re right, it works. Also the possibility to use mean(*) in continuous query works with sparse data! That’s cool!
What we need to verify with the customer is if the single values coming from sensors have the same timestamp or not (so it is feasible to group them in a single row).
After that, i think we’ll try the solution:

Single measurement
Multiple fields per row

And then we’ll see with customer if we can split in more measurements based on devices or some other stuff (we want to know if they have to perform some statistical queries across devices / plants or not ;))
Thank you!

davidgs · August 24, 2017, 9:54pm

Glad to help, and let us know how it goes!

zoraida · February 12, 2024, 4:53pm

Hi! Would be possible for you to share with us your experience after those years with this schema design? Thank you!
I am asking you because we have thousand of measurements with single sensor each one and we are experimenting problems when querying then into a single ql select.

Anaisdg · February 12, 2024, 5:07pm

Hello @zoraida
Welcome! What version of InfluxDB are you using? Schema requirements have changed a lot in recent versions of InfluxDB and this thread isn’t applicable to them.

Topic		Replies	Views
Input on schema for large IOT sensor network Store influxdb , schema	0	700	November 3, 2019
Schema for Plants, Devices and Signals Store influxdb , iot , schema	1	1420	April 24, 2018
Multiple measurements with no tags or one single measurement? Store influxdb , iot , schema	1	1695	January 26, 2018
InfluxDB understanding basics and IoT Schema design InfluxDB 1 influxdb , time-series , influxdata , schema , query	3	2120	November 3, 2021
Ask for help about schema design InfluxDB 1 schema	4	1137	July 25, 2021

Schema Design for IoT Metrics

Related topics