Measurements vs fields

matias · October 21, 2022, 1:05pm

I have a question about measurements and fields when it comes to planning the data layout of the database.

My use case is that I have a device that outputs thousands of time series, and I am not quite sure whether it makes more sense to store the device as a measurement with all the time series as separate field keys, or to store each time series as a separate measurement. So basically the question is: should I store it according to option A:

    measurement: device1
        field_key: TS1
        field_key: TS2
        field_key: TS3
    ...

or according to option B:

    measurement: TS1
        field_key: TS1
    measurement: TS2
        field_key: TS2
    measurement: TS3
        field_key: TS3
     ...
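To make the two layouts concrete, here is a minimal Python sketch that renders the same readings as line protocol under each option. The sample values and the timestamp are hypothetical, and the helper names `option_a` and `option_b` are made up for illustration.

```python
# Hypothetical sample: series name -> latest value for one device.
readings = {"TS1": 1.5, "TS2": 2.0, "TS3": 0.25}
timestamp_ns = 1666357500000000000  # hypothetical timestamp in nanoseconds


def option_a(device, values, ts):
    """One measurement per device; every time series is a field key."""
    fields = ",".join(f"{key}={value}" for key, value in values.items())
    return [f"{device} {fields} {ts}"]


def option_b(values, ts):
    """One measurement per time series, each with a single field key."""
    return [f"{key} {key}={value} {ts}" for key, value in values.items()]


lines_a = option_a("device1", readings, timestamp_ns)
lines_b = option_b(readings, timestamp_ns)
print("\n".join(lines_a))
print("\n".join(lines_b))
```

Option A yields one line per device and timestamp, while option B yields one line per time series.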

I have found a few resources that discuss this distinction, but I am still not sure I have understood all the implications of the different options. In this post: Query regarding cardinality. Empty field vs multiple measurements, the conclusion basically seems to be that, when it comes to series cardinality, it really does not make any difference whether option A or B is used, but “if you plan on performing queries or calculations that combine different devices and sensors, it’ll be best to keep those in the same measurement”.

Over here: database schema - InfluxDB : single or multiple measurement - Stack Overflow, the same question is discussed with partly the same conclusions. In addition, the accepted answer states that if one measurement contains several fields, values have to be written to every field whenever one field changes. If this is true, it certainly is a drawback in my case. When I did some testing, however, this did not seem to be the case. Could someone confirm that writing a value to one field key under a certain measurement will not force me to write values to all the other field keys under that measurement? Maybe this was different in earlier versions?

The Data Layout and Schema Design Best Practices for InfluxDB blog post (Data Layout and Schema Design Best Practices for InfluxDB | InfluxData) makes the same point: keeping time series that logically belong together in the same measurement is a good idea because it makes queries less resource intensive. Additionally, at the very end of the article, a few “common schema design mistakes that lead to runaway cardinality” are listed. One of the listed mistakes seems to suggest that splitting up the time series into too many measurements is a bad idea:

Unfortunately, I am not able to make sense of this example. Could someone, maybe @Anaisdg, elaborate a little more on this potential mistake?

In addition to the questions I have already asked, there is one question that I have not really been able to find a good answer to: what would be the drawbacks of design option A as opposed to design option B, as presented above? To explain TS1, TS2, TS3, … in a little more detail: some of them may be related, but the relationships are not necessarily known when inserting them into the database, so the criterion of keeping time series that may be combined under the same measurement is not so easy to apply in this case. In addition, that criterion only says when data is good to keep under the same measurement; it does not seem to say the opposite (i.e. if the data don’t belong together, will I get some advantage by splitting it across several measurements?).

Anaisdg · October 23, 2022, 6:51pm

Hello @matias,
I recommend option A.

Essentially, you just don’t want to encode a bunch of information in the measurement name.
So instead of

weather.texas.austin temp=77

you want to encode that data in tags where applicable:

weather,city=austin,state=texas temp=77
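As a rough illustration of that rewrite, here is a small Python sketch that turns a dotted measurement name into a measurement plus tags. The helper `dotted_to_tagged` and the tag key names are hypothetical, and it assumes there are no escaped spaces, commas, or dots in the names. In line protocol the tag set is attached to the measurement name with commas.

```python
def dotted_to_tagged(line, tag_keys):
    """Rewrite 'name.v1.v2 fields' as 'name,k1=v1,k2=v2 fields'."""
    measurement, rest = line.split(" ", 1)
    name, *values = measurement.split(".")
    if len(values) != len(tag_keys):
        raise ValueError(f"expected {len(tag_keys)} tag values, got {len(values)}")
    # Sort tags by key so the output is deterministic.
    tags = sorted(zip(tag_keys, values))
    tag_set = "".join(f",{key}={value}" for key, value in tags)
    return f"{name}{tag_set} {rest}"


converted = dotted_to_tagged("weather.texas.austin temp=77", ["state", "city"])
print(converted)  # weather,city=austin,state=texas temp=77
```

With the location in tags, you can filter or group by city or state instead of pattern matching on measurement names.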

The drawbacks to A would be minimal… I can think of the following, though:
- The different fields are completely unrelated and you’re shoving too many different fields into the same measurement. Let’s say you have 1000 fields (group 1) that you frequently need to query together and another 500 fields (group 2) that you also need to query together. Rather than filtering through 1500 fields to return data from group 1 or group 2, you should store group 1 and group 2 in different measurements.
- Maybe you’re building an app on top of InfluxDB and you’re serving client data. Perhaps you want to keep the data more isolated by storing it in different measurements, although there aren’t any security benefits to storing data in different measurements (only in different buckets).
- You never intend to perform calculations across fields or to query multiple fields simultaneously. In that case you could shorten your Flux query by one filter function (one line of code) if you keep schema B. But I think this is unrealistic.
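For the first point, a possible way to split fields across measurements by query group could look like the Python sketch below. The mapping `FIELD_GROUPS`, the group names, and the helper `to_lines` are all hypothetical. It also only emits the fields that actually changed, which line protocol allows as long as each point carries at least one field.

```python
from collections import defaultdict

# Hypothetical mapping from field key to the group it is usually queried with.
FIELD_GROUPS = {"TS1": "group1", "TS2": "group1", "TS3": "group2"}


def to_lines(changed, ts, groups=FIELD_GROUPS, default="ungrouped"):
    """Build one line-protocol line per measurement from the changed fields only."""
    by_measurement = defaultdict(dict)
    for key, value in changed.items():
        by_measurement[groups.get(key, default)][key] = value
    lines = []
    for measurement in sorted(by_measurement):
        fields = ",".join(
            f"{k}={v}" for k, v in sorted(by_measurement[measurement].items())
        )
        lines.append(f"{measurement} {fields} {ts}")
    return lines


# Only TS1 and TS3 changed; TS2 is simply left out of this write.
lines = to_lines({"TS3": 4.0, "TS1": 1.5}, 1666357500000000000)
print("\n".join(lines))
```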

I hope this helps! Thanks for your question.
