Data Storage Schema

vinh_lee · August 21, 2024, 1:39am

Hi all!
I have been working with InfluxDB for a few months, and I still sometimes struggle to decide which values should be tags and which should be fields. I have read many articles trying to understand it.
Let's say I have a JSON object with ID, Room, equipment1, Temperature1, equipment2, Temperature2, equipment3, and Temperature3.
First approach: ID and Room are tags, and the rest are fields.
Second approach: ID, equipment1, Temperature1, equipment2, Temperature2, equipment3, and Temperature3 are tags, and Room is a field.
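To make the two approaches concrete, here is a minimal Python sketch that renders the same record as InfluxDB line protocol (`measurement,tag_set field_set timestamp`) under each approach. The measurement name `room_temps`, the sample values, and the timestamp are hypothetical; the escaping covers only what this example needs (commas, equals signs, and spaces in keys and tag values, plus quoting for string fields).

```python
# Hypothetical sample record; key names follow the post, values are made up.
record = {
    "ID": "sensor-01",
    "Room": "R101",
    "equipment1": "ahu", "Temperature1": 21.5,
    "equipment2": "chiller", "Temperature2": 7.2,
    "equipment3": "boiler", "Temperature3": 65.0,
}

MEASUREMENT = "room_temps"       # hypothetical measurement name
TS_NS = 1724200000000000000      # hypothetical timestamp in nanoseconds


def escape_key(s: str) -> str:
    """Escape commas, equals signs, and spaces in tag keys, tag values, and field keys."""
    return s.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def field_value(v) -> str:
    """Write floats as-is and double-quote strings (this sketch handles only these two types)."""
    if isinstance(v, float):
        return repr(v)
    return '"' + str(v).replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_line(rec, tag_keys, field_keys, ts_ns=TS_NS) -> str:
    """Build one line-protocol line: measurement,tags fields timestamp."""
    tags = ",".join(f"{escape_key(k)}={escape_key(str(rec[k]))}" for k in tag_keys)
    fields = ",".join(f"{escape_key(k)}={field_value(rec[k])}" for k in field_keys)
    return f"{MEASUREMENT},{tags} {fields} {ts_ns}"


# equipment1, Temperature1, equipment2, Temperature2, equipment3, Temperature3
pairs = [k for pair in zip(["equipment1", "equipment2", "equipment3"],
                           ["Temperature1", "Temperature2", "Temperature3"]) for k in pair]

# First approach: ID and Room are tags; equipment names and temperatures are fields.
approach1 = to_line(record, ["ID", "Room"], pairs)
# Second approach: everything except Room is a tag; Room becomes a string field.
approach2 = to_line(record, ["ID"] + pairs, ["Room"])

print(approach1)
print(approach2)
```

In the second line the temperatures sit in the tag set, so every new temperature reading produces a new tag set, and therefore a new series, while the first line's tag set stays the same from one reading to the next.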
My question is: in terms of data storage, which approach do you think takes more space? Or are they the same?
Thank you all for reading. I appreciate your support.

scott · August 21, 2024, 2:34pm

@vinh_lee A few questions for you:

What version of InfluxDB are you using?
How many unique values do you expect for each of these?
When considering a single, distinct source of data, what values stay the same over time and what values change over time?

vinh_lee · August 22, 2024, 12:05am

Hi @scott,
Thank you so much for replying.
1/ I am using InfluxDB v2 OSS, the latest version.
2/ ID, Room, and equipment (name) are unique identifiers; all the temperatures change over time.

Additionally, after posting my question here, I arrived at my own answer: I will make ID, Room, and Equipment (name) tags, because their values are unique, or at least they do not change over time. The temperatures will be fields.
I believe this will reduce series cardinality, which in turn reduces memory usage. Please correct me if I'm wrong.
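To make the cardinality argument concrete, here is a minimal Python sketch that counts series for the two original approaches and for one possible reading of the chosen schema (one point per piece of equipment, with tags ID, Room, and equipment and a single temperature field). Everything in it is assumed for illustration: the measurement name `room_temps`, the equipment names, the `per_equipment` layout, and the 100 synthetic readings whose temperatures cycle through 50 values. It counts a series as a unique measurement, tag set, and field key combination, which is how InfluxDB 2.x defines series cardinality.

```python
MEASUREMENT = "room_temps"                 # hypothetical measurement name
EQUIPMENT = ["ahu", "chiller", "boiler"]   # hypothetical equipment names


def readings(n=100):
    """Yield hypothetical records shaped like the JSON in the original post."""
    for i in range(n):
        rec = {"ID": "sensor-01", "Room": "R101"}
        for j, name in enumerate(EQUIPMENT, start=1):
            rec[f"equipment{j}"] = name
            # Synthetic temperatures: 50 distinct values per equipment, then repeat.
            rec[f"Temperature{j}"] = round(20 + j + (i % 50) * 0.1, 1)
        yield rec


def points(rec, layout):
    """Split one record into a list of (tags, field_keys) points for a layout."""
    if layout == "approach1":        # ID and Room are tags; everything else is a field
        tags = {"ID": rec["ID"], "Room": rec["Room"]}
        return [(tags, [k for k in rec if k not in tags])]
    if layout == "approach2":        # everything except Room is a tag
        tags = {k: str(v) for k, v in rec.items() if k != "Room"}
        return [(tags, ["Room"])]
    if layout == "per_equipment":    # assumed reading: one point per equipment
        return [
            ({"ID": rec["ID"], "Room": rec["Room"], "equipment": rec[f"equipment{j}"]},
             ["temperature"])
            for j in range(1, len(EQUIPMENT) + 1)
        ]
    raise ValueError(f"unknown layout: {layout}")


def series_cardinality(layout, n=100):
    """Count distinct (measurement, tag set, field key) combinations."""
    series = set()
    for rec in readings(n):
        for tags, field_keys in points(rec, layout):
            tag_set = tuple(sorted(tags.items()))
            for fk in field_keys:
                series.add((MEASUREMENT, tag_set, fk))
    return len(series)


for layout in ("approach1", "approach2", "per_equipment"):
    print(layout, series_cardinality(layout))
```

With this synthetic data it reports 6, 50, and 3 series. The first and per-equipment layouts stay flat however many readings arrive, while the second gains a series for every new combination of temperature values, so with real sensor data it can keep growing with each reading.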

Also, what are the pros and cons of having more fields, and of having more tags? Let's say I don't care about query performance.

scott · August 22, 2024, 5:18pm

@vinh_lee The schema you decided on is exactly what I was going to recommend.

With InfluxDB 2.x, there aren't really any pros or cons to the number of fields and tags you use. If you're future-proofing, though, it does matter in InfluxDB v3: the more fields and tags a measurement has, the worse some queries perform (for example, SELECT * ..., which selects all the columns in a measurement).

What really matters is the number of unique values for a specific tag (cardinality), which you already know about. Tags are indexed, and the larger your index, the more system resources (especially memory) InfluxDB requires.
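A rough way to see how per-tag values add up: InfluxDB 2.x counts a series as a unique measurement, tag set, and field key combination. Assuming every point in a measurement m carries all of its tag keys T_m and field keys F_m, and writing V_t for the set of distinct values of tag t, the series in that measurement are bounded by:

```latex
\mathrm{series}(m) \;\le\; |F_m| \cdot \prod_{t \in T_m} |V_t|
```

For the first approach with a single sensor (one ID, one Room, six fields) the bound is 6 · 1 · 1 = 6. In the second approach each temperature tag contributes its own |V_t| factor, so the bound multiplies with every distinct temperature value; the real count is usually lower because not every combination occurs, but it still grows as new readings arrive.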

vinh_lee · August 26, 2024, 11:10pm

Thank you so much, @scott!
