Hi all!
I have been working with InfluxDB for a few months, and I still sometimes struggle to decide which values should be tags and which should be fields. I have read many articles trying to understand it.
Let's say I have a JSON object with ID, Room, equipment1, Temperature1, equipment2, Temperature2, equipment3, and Temperature3.
First approach: ID and Room are tags, and the rest are fields.
Second approach: ID, equipment1, Temperature1, equipment2, Temperature2, equipment3, and Temperature3 are tags, and Room is a field.
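To make the two approaches concrete, here is a minimal Python sketch that turns one JSON record into InfluxDB line protocol under each approach. The measurement name `sensors` and all the values are hypothetical, and the escaping is simplified (commas, equals signs, and spaces in keys and tag values; quotes and backslashes in string fields).

```python
def escape_key(s: str) -> str:
    # Line protocol requires commas, equals signs, and spaces in tag keys,
    # tag values, and field keys to be backslash-escaped (simplified here).
    return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def format_field(value) -> str:
    # String fields are double-quoted; numeric values are written as floats.
    if isinstance(value, str):
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(float(value))


def to_line_protocol(measurement: str, record: dict, tag_keys: set) -> str:
    # Tag values are always strings in InfluxDB; tags are sorted by key.
    tags = ",".join(
        f"{escape_key(k)}={escape_key(str(record[k]))}" for k in sorted(tag_keys)
    )
    # Every key that is not a tag becomes a field.
    fields = ",".join(
        f"{escape_key(k)}={format_field(v)}"
        for k, v in record.items()
        if k not in tag_keys
    )
    return f"{measurement},{tags} {fields}"


# Hypothetical record shaped like the JSON described above.
record = {
    "ID": "dev01", "Room": "R101",
    "equipment1": "pump", "Temperature1": 21.5,
    "equipment2": "fan", "Temperature2": 19.0,
    "equipment3": "heater", "Temperature3": 35.2,
}

# First approach: ID and Room are tags; everything else is a field.
line1 = to_line_protocol("sensors", record, {"ID", "Room"})
# Second approach: everything except Room is a tag.
line2 = to_line_protocol(
    "sensors", record,
    {"ID", "equipment1", "Temperature1", "equipment2",
     "Temperature2", "equipment3", "Temperature3"},
)
print(line1)
print(line2)
```

In the second line the temperatures are part of the tag set, so every new temperature reading produces a new tag set, and therefore a new series.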
My question is: in terms of data storage, which approach do you think takes more space? Or are they the same?
Thank you all for reading. Appreciate all your support.
@vinh_lee A few questions for you:
- What version of InfluxDB are you using?
- How many unique values do you expect for each of these?
- When considering a single, distinct source of data, what values stay the same over time and what values change over time?
Hi @scott,
Thank you so much for the reply.
1/ I am using InfluxDB v2 OSS, the latest version.
2/ ID, Room, and equipment (name) are the unique values, and all the temperatures change over time.
Additionally, after posting my question here, I finally found my own answer: I will assign ID, Room, and equipment (name) as tags, because their values are unique, or at least they will not change over time. Temperatures, on the other hand, will be fields.
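One way to apply that schema is sketched below in Python. It assumes that each equipmentN/TemperatureN pair becomes its own point with a single `equipment` tag and a single `temperature` field. The measurement name `sensors`, the key names `equipment` and `temperature`, and the sample values are all hypothetical, and escaping is limited to commas, equals signs, and spaces.

```python
def escape(s: str) -> str:
    # Backslash-escape characters that line protocol treats specially in tags/keys.
    return s.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def wide_to_points(record: dict, count: int = 3) -> list[dict]:
    # Split equipment1/Temperature1 ... equipmentN/TemperatureN into one point each.
    return [
        {
            "tags": {
                "ID": record["ID"],
                "Room": record["Room"],
                "equipment": record[f"equipment{n}"],
            },
            "fields": {"temperature": float(record[f"Temperature{n}"])},
        }
        for n in range(1, count + 1)
    ]


def to_line(measurement: str, point: dict) -> str:
    # Tags sorted by key, then a space, then the field set.
    tags = ",".join(
        f"{escape(k)}={escape(str(v))}" for k, v in sorted(point["tags"].items())
    )
    fields = ",".join(f"{escape(k)}={v}" for k, v in point["fields"].items())
    return f"{measurement},{tags} {fields}"


record = {
    "ID": "dev01", "Room": "R101",
    "equipment1": "pump", "Temperature1": 21.5,
    "equipment2": "fan", "Temperature2": 19.0,
    "equipment3": "heater", "Temperature3": 35.2,
}
lines = [to_line("sensors", p) for p in wide_to_points(record)]
for line in lines:
    print(line)
```

However many readings you write for this device, the tag sets stay the same three; only the temperature values change.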
I believe that doing this will reduce series cardinality, which reduces memory usage. Please correct me if I'm wrong.
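That intuition can be checked with a small Python sketch. It uses a simplified model in which a series is identified by measurement + tag set + field key, and counts series for 100 hypothetical readings from one device under the first approach, the second approach, and the schema chosen above (one point per equipment). The data and the `sensors` measurement name are made up; only Temperature1 varies between readings.

```python
def series_keys(measurement: str, points: list[dict], tag_keys: set) -> set:
    # Simplified model: a series = measurement + tag set + field key.
    keys = set()
    for point in points:
        tag_set = tuple(sorted((k, str(point[k])) for k in tag_keys))
        for field_key in point:
            if field_key not in tag_keys:
                keys.add((measurement, tag_set, field_key))
    return keys


# 100 hypothetical readings from one device; only Temperature1 changes.
wide = [
    {
        "ID": "dev01", "Room": "R101",
        "equipment1": "pump", "Temperature1": round(20 + i / 10, 1),
        "equipment2": "fan", "Temperature2": 19.0,
        "equipment3": "heater", "Temperature3": 35.2,
    }
    for i in range(100)
]

# Chosen schema: one point per equipment with a single temperature field.
per_equipment = [
    {
        "ID": r["ID"], "Room": r["Room"],
        "equipment": r[f"equipment{n}"],
        "temperature": r[f"Temperature{n}"],
    }
    for r in wide
    for n in (1, 2, 3)
]

first = series_keys("sensors", wide, {"ID", "Room"})
second = series_keys(
    "sensors", wide,
    {"ID", "equipment1", "Temperature1", "equipment2",
     "Temperature2", "equipment3", "Temperature3"},
)
chosen = series_keys("sensors", per_equipment, {"ID", "Room", "equipment"})
print(len(first), len(second), len(chosen))  # 6 100 3
```

The first approach and the chosen schema stay constant as readings accumulate, while the second approach gains a new series for every distinct temperature combination.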
Also, what are the pros and cons of having more fields? And what are the pros and cons of having more tags? Let's say I don't care about query performance.
@vinh_lee The schema you decided on is exactly what I was going to recommend.
With InfluxDB 2.x, there aren't really any pros or cons to the number of fields and tags you use. If you're future-proofing, though, it does matter in InfluxDB v3: the more fields and tags you have in a measurement, the worse some queries will perform (like SELECT * ..., which selects all the columns in a measurement).
What really matters is the number of unique values for each tag, because that drives series cardinality (the number of unique combinations of measurement, tag set, and field key), which you already know about. Tags are indexed, and the larger your index, the more system resources (especially memory) InfluxDB requires.
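As a rough way to reason about this up front, a worst-case bound for one measurement is the product of the number of distinct values of each tag, times the number of field keys. The Python sketch below computes that bound; the function name and the counts (50 IDs, 10 rooms, 3 equipment names, 1 temperature field) are hypothetical, not figures from this thread.

```python
import math


def max_series_cardinality(distinct_values_per_tag: dict[str, int], field_keys: int) -> int:
    # Worst case for one measurement: every combination of tag values occurs,
    # and each combination is stored once per field key.
    return math.prod(distinct_values_per_tag.values()) * field_keys


# Hypothetical counts for the ID / Room / equipment schema with one temperature field.
estimate = max_series_cardinality({"ID": 50, "Room": 10, "equipment": 3}, field_keys=1)
print(estimate)  # 1500
```

Real cardinality is usually lower because tags are correlated (an ID normally belongs to a single room). Turning something like temperature into a tag multiplies this bound by the number of distinct readings, which is why frequently changing values belong in fields.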
Thank you so much @scott,