Schema design for a dataset - newbie help

Hi all, I am a complete newbie and trying to figure out the best way to design a schema for a dataset that looks as follows:

Product ID: A character string with 8 characters
Product Type: L, M or H
Air temperature [K]:
Process temperature [K]:
Rotational speed [rpm]:
Torque [Nm]: torque values are normally distributed around 40 Nm with a σ = 10 Nm and no negative values.
Tool wear [min]: in minutes
Machine failure: label that indicates whether the machine has failed in this particular datapoint, i.e. whether any of the following failure modes is true.
Tool wear failure (TWF): time in minutes
Heat dissipation failure (HDF): Boolean 0 or 1
Power failure (PWF): Boolean 0 or 1
Overstrain failure (OSF): Boolean 0 or 1
Random failures (RNF): Boolean 0 or 1

So would the schema look something like this in line protocol?

airtemperature,pid=343434,ptype=L airtemp=298.1 1577836800000000000
processtemperature,pid=343434,ptype=L ptemp=9298.2 1577836800000000000
.
.
.
overstrainfailure,pid=343434,ptype=L osf=0
randomfailure,pid=343434,ptype=L rnf=1

I found this webinar from InfluxDB about Schema Design for IoT to be extremely helpful. Watch it all the way through, then think about your own data. Write down your field names and tag names, then go back and watch the video again to make sure your choices still make sense.

Just as an aside, your data is apparently related to machinery. Do you have several types of machines, or just one? If you have more than one, and you are monitoring each, then you could assign each an Equipment ID. Depending on your setup, you may have something like this:

Tags          Examples
EquipID       404, 6A
Product_ID    is there a finite number of product IDs?
Product_Type  L, M, H

and the possible fields:

Fields            Type
air_temp          integer or float?
process_temp      integer or float?
rotational_speed  integer or float?
torque            integer or float?
tool_wear         integer
TWF               integer
HDF               boolean
PWF               boolean
OSF               boolean
RNF               boolean
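
To make that concrete, here is a rough sketch of how one row could be written as a single point that carries all of its tags and fields, instead of one measurement per column. It builds the line protocol string by hand in Python. The measurement name `machine`, the helper `to_line`, and the sample values are all made up for illustration, and in practice a client library would normally do this formatting for you.

```python
def to_line(measurement, tags, fields, timestamp_ns=None):
    """Format one point as a line protocol string (hypothetical helper)."""
    if not fields:
        raise ValueError("a point needs at least one field")

    def esc(text, chars):
        # Backslash-escape the given special characters.
        text = str(text)
        for ch in chars:
            text = text.replace(ch, "\\" + ch)
        return text

    def fmt(value):
        # bool is checked first because bool is a subclass of int in Python.
        if isinstance(value, bool):
            return "true" if value else "false"
        if isinstance(value, int):
            return f"{value}i"  # integer fields carry a trailing "i"
        if isinstance(value, float):
            return repr(value)
        # Anything else is written as a quoted string field.
        return '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'

    key_chars = ", ="  # comma, space and equals sign are special in keys and tag values
    head = esc(measurement, ", ")
    # Tags follow the measurement, separated by commas and sorted by key.
    tag_part = "".join(
        f",{esc(k, key_chars)}={esc(v, key_chars)}" for k, v in sorted(tags.items())
    )
    # One space separates the tag set from the field set.
    field_part = ",".join(f"{esc(k, key_chars)}={fmt(v)}" for k, v in fields.items())
    line = f"{head}{tag_part} {field_part}"
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"  # another space, then the timestamp
    return line


# Illustrative values only; tag and field names follow the tables above.
line = to_line(
    "machine",
    tags={"product_type": "L", "equip_id": "404"},
    fields={"air_temp": 298.1, "torque": 42.8, "tool_wear": 0, "hdf": False},
    timestamp_ns=1577836800000000000,
)
print(line)
```

Note the layout: commas separate the measurement from the tags and the tags from each other, a single space separates the tag set from the field set, and another space precedes the timestamp. Whether Product_ID belongs among the tags depends on how many distinct values it has, so it is left out of this sketch.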

Hello @grant1, thank you, this is really helpful! This is actually a synthetic dataset, the AI4I 2020 Predictive Maintenance Dataset from UCI - https://archive.ics.uci.edu/ml/dataset/AI4I+2020+Predictive+Maintenance+Dataset
But it’s supposed to be one kind of machine, and "product id" is a misnomer in my opinion. I’m essentially using the product id as a unique key, but maybe that’s not the right thing to do and I should generate an equipment id, as you point out.
Thanks for the link to the webinar, will watch it.
Best,
Chait