@jackzampolin
My question is pretty similar to this: Schema for Plants, Devices and Signals (so I tried to do my homework and check whether anyone has already commented on that topic)
We have to set up a time series database and migrate data from pre-existing (vertical E-A-V) structure. It is all about multiple machines with different parameters being read from these machines. Imagine we have somewhat 1000-2000 tags/attributes being read in total, but one machine uses 20, another uses 100, another uses 25 - but no tag is common (there is no intersection in these tags)
Then we have to decide between:
- A structure with thousands of fields within a single measurement:
TagKey = Machine, TagValue=MachineID, FieldKey1=Tag1, FieldKey2=Tag2, …FieldKey1000=Tag1000, etc…
- A structure with a single FieldValue and extra TagKey:
TagKey1=Machine, TagValue1=MachineID, TagKey2=TagID, TagValue2=TagName (Temperature, Pressure, PartCounter, … - 2000 possible tag values), FieldKey1=TagValueInt, FieldKey2=TagValueFloat, FieldKey3=TagValueString with the values being the actual measured values
So speaking in RDBMS terms:
option1 has 2000 columns and fewer rows (1 column is indexed)
option 2 has 5 columns and MANY millions of rows (2 columns indexed)
@Ivo_Andreev Can you share some more information about what kind of data is being collected? Is there any way to migrate to a newer collector that writes more standardized schema?
Also is there any way to normalize the data coming from the machines? Are there some common groupings between the different attributes being written (i.e. can you combine all of the pressure
values)?
The second option you mention is a possible way to handle this case. Just so I understand you, the points in that case would look as follows?
legacy_data,machine=81423819190,tagID=temperature tagValueInt=1i,tagValueFloat=1,tagValueString="1" TS
legacy_data,machine=81423873104,tagID=pressure tagValueInt=1i,tagValueFloat=1,tagValueString="1" TS
legacy_data,machine=81423819190,tagID=partCounter tagValueInt=1i,tagValueFloat=1,tagValueString="1" TS
legacy_data,machine=81423873104,tagID=foo tagValueInt=1i,tagValueFloat=1,tagValueString="1" TS
Hey @jackzampolin, guess you have been busy these days. As far as no one else has commented that topic (and the other one I referred to). Is it a stupid question, or a too complex one? Do you think you could share a design guideline in this respect?
@Ivo_Andreev Sorry for not getting back sooner:
I think the schema you mention in your post and writes I show above should work great for you! I understand that making changes to clients is not an option. Only populating one field on each write will work perfectly. That will avoid type collisions.
That is your decision on measurement name vs tag value. I personally would store the customer ID as a tag to make for fewer measurements. I find this easier to reason about.
Thanks @jackzampolin
Once we manage to make the migration and have the solution operational I will write here to share some findings.
The drawback of the proposed solution is that we sometimes need to have correlated field values (i.e. temperature measured together with pressure, together with some user specific measurement. Storing in a point with multiple values would have made live somewhat easier, but still - it does not seem to be an alternative of the triple field schema.
1 Like