Schema design for a dataset - newbie help

chaitd · November 6, 2021, 10:00pm

Hi all, I'm a complete newbie trying to figure out the best way to design a schema for a dataset that looks like this:

Product ID: A character string with 8 characters
Product Type: L, M or H
Air temperature [K]:
Process temperature [K]:
Rotational speed [rpm]:
Torque [Nm]: torque values are normally distributed around 40 Nm with σ = 10 Nm and no negative values.
Tool wear [min]: in minutes
Machine failure: label that indicates whether the machine has failed in this particular data point, i.e. whether any of the following failure modes is true.
Tool wear failure (TWF): time in minutes
Heat dissipation failure (HDF): Boolean 0 or 1
Power failure (PWF): Boolean 0 or 1
Overstrain failure (OSF): Boolean 0 or 1
Random failures (RNF): Boolean 0 or 1
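As a rough illustration of the columns above, here is a minimal Python sketch of one row as a typed record. The class and attribute names are hypothetical, the checks only mirror the descriptions in the list (8-character ID, type L/M/H, non-negative torque), and treating any non-zero TWF value as a tool wear failure is an assumption, since TWF is described as a time rather than a 0/1 flag.

```python
from dataclasses import dataclass

VALID_PRODUCT_TYPES = {"L", "M", "H"}


@dataclass(frozen=True)
class MachineReading:
    """One row of the dataset; attribute names are hypothetical."""

    product_id: str              # 8-character string
    product_type: str            # "L", "M" or "H"
    air_temp_k: float            # air temperature [K]
    process_temp_k: float        # process temperature [K]
    rotational_speed_rpm: float  # rotational speed [rpm]
    torque_nm: float             # torque [Nm], described as non-negative
    tool_wear_min: int           # tool wear [min]
    twf: int                     # tool wear failure, described as a time in minutes
    hdf: bool                    # heat dissipation failure
    pwf: bool                    # power failure
    osf: bool                    # overstrain failure
    rnf: bool                    # random failure

    def __post_init__(self) -> None:
        # Reject rows that contradict the column descriptions.
        if len(self.product_id) != 8:
            raise ValueError(f"product_id must be 8 characters: {self.product_id!r}")
        if self.product_type not in VALID_PRODUCT_TYPES:
            raise ValueError(f"product_type must be L, M or H: {self.product_type!r}")
        if self.torque_nm < 0:
            raise ValueError(f"torque must not be negative: {self.torque_nm}")

    @property
    def machine_failure(self) -> bool:
        # The machine has failed if any failure mode is true.
        # Assumption: a non-zero TWF value counts as a tool wear failure.
        return any([self.twf != 0, self.hdf, self.pwf, self.osf, self.rnf])


# Illustrative values only (not taken from the dataset).
row = MachineReading("AB123456", "L", 298.1, 308.6, 1500.0, 40.0, 0, 0,
                     False, False, False, True)
print(row.machine_failure)  # True
```

Deriving Machine failure from the individual flags, rather than storing it independently, keeps the two from disagreeing.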

So would the schema look something like this in line protocol?

airtemperature,pid=343434,ptype=L airtemp=298.1 1577836800000000000
processtemperature,pid=343434,ptype=L ptemp=9298.2 1577836800000000000
.
.
.
overstrainfailure,pid=343434,ptype=L osf=0
randomfailure,pid=343434,ptype=L rnf=1
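For reference, line protocol joins the measurement and the tag set with commas (no spaces after them), then uses one space before the field set and another space before the optional timestamp. A bare number such as `rnf=1` is stored as a float; integers take an `i` suffix, booleans are written `true`/`false`, and string field values are double-quoted. Below is a minimal Python sketch that builds one line from dicts. The function names are hypothetical, and it assumes measurement names, keys and tag values contain no spaces, commas or equals signs, because it does not escape them.

```python
def format_field_value(value):
    """Render a Python value as a line protocol field value."""
    # Check bool before int: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers need the "i" suffix
    if isinstance(value, float):
        return repr(value)  # bare numbers are floats
    if isinstance(value, str):
        # String field values are double-quoted; escape backslashes and quotes.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    raise TypeError(f"unsupported field type: {type(value).__name__}")


def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build: measurement[,tag=value...] field=value[,field=value...] [timestamp]."""
    if not fields:
        raise ValueError("a point needs at least one field")
    # Tags follow the measurement with commas and no spaces; sorted for stable output.
    head = measurement + "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    # One space separates the tag set from the field set.
    line = head + " " + ",".join(f"{k}={format_field_value(v)}" for k, v in fields.items())
    # The timestamp is optional and follows another single space.
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


print(to_line_protocol("airtemperature", {"pid": "343434", "ptype": "L"},
                       {"airtemp": 298.1}, 1577836800000000000))
# airtemperature,pid=343434,ptype=L airtemp=298.1 1577836800000000000
print(to_line_protocol("randomfailure", {"pid": "343434", "ptype": "L"}, {"rnf": True}))
# randomfailure,pid=343434,ptype=L rnf=true
```

Because `fields` is a dict, the same function can also write several fields on one line, so a single measurement could carry all of the readings for a data point instead of one measurement per sensor.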

grant1 · November 7, 2021, 12:13am

I found this webinar from InfluxDB on Schema Design for IoT extremely helpful. Watch it all the way through, then think about your own data. Write down your field names and tag names, then go back and watch the video again to make sure they still make sense.

Just as an aside, your data is apparently related to machinery. Do you have several types of machines, or just one? If you have more than one and you are monitoring each, you could assign each an Equipment ID. Depending on your setup, you might have something like this:

Tags	Examples
EquipID	404, 6A
Product_ID	is there a finite number of product IDs?
Product_Type	L, M, H

and the possible fields:

Fields	Type
air_temp	integer or float?
process_temp	integer or float?
rotational_speed	integer or float?
torque	integer or float?
tool_wear	integer
TWF	integer
HDF	boolean
PWF	boolean
OSF	boolean
RNF	boolean
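On the question of whether there is a finite number of product IDs: InfluxDB indexes data by measurement plus tag set, so every distinct combination of tag values adds a new series. A tag whose value is unique per row makes the series count grow with the number of rows. The small Python sketch below just counts distinct tag combinations; the rows and ID values are hypothetical and only reuse the tag names from the table above.

```python
def series_count(rows, tag_keys):
    """Count distinct series in one measurement: one per unique tag-value combination."""
    return len({tuple(row[key] for key in tag_keys) for row in rows})


# Hypothetical rows using the tag names from the table above.
rows = [
    {"EquipID": "404", "Product_Type": "L", "Product_ID": "P0000001"},
    {"EquipID": "404", "Product_Type": "L", "Product_ID": "P0000002"},
    {"EquipID": "404", "Product_Type": "M", "Product_ID": "P0000003"},
    {"EquipID": "6A", "Product_Type": "L", "Product_ID": "P0000004"},
]

print(series_count(rows, ["EquipID", "Product_Type"]))                # 3
print(series_count(rows, ["EquipID", "Product_Type", "Product_ID"]))  # 4, one per row
```

If Product_ID never repeats, storing it as a field instead of a tag avoids the extra series; the trade-off is that fields are not indexed, so filtering on it is slower.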

chaitd · November 7, 2021, 5:10pm

Hello @grant1, thank you, this is really helpful! This is actually a synthetic dataset, the AI4I 2020 predictive maintenance dataset from UCI: https://archive.ics.uci.edu/ml/dataset/AI4I+2020+Predictive+Maintenance+Dataset
But it’s supposed to be one kind of machine, and in my opinion "product ID" is a misnomer. I’m essentially using the product ID as a unique key, but maybe that’s not the right thing to do and I should generate an equipment ID, as you point out.
Thanks for the link to the webinar, I will watch it.
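Whichever key you pick, note that InfluxDB has no separate primary key: a point is identified by its measurement, tag set and timestamp. If a second point has the same three, it does not add a new row; the field sets are merged and the newer values win. This matters for a synthetic dataset where many rows may share a timestamp, as in the example lines in the question, which all use 1577836800000000000. The Python sketch below only simulates that merge rule with a dict (it does not talk to a database); the `machine` measurement name and the values are hypothetical.

```python
def write_points(store, points):
    """Simulate InfluxDB's duplicate-point rule with an in-memory dict.

    Each point is (measurement, tags, fields, timestamp). A point's identity is
    measurement + tag set + timestamp; writing the same identity again merges
    the field sets, and the newer value wins for any field present in both.
    """
    for measurement, tags, fields, timestamp in points:
        key = (measurement, frozenset(tags.items()), timestamp)
        store.setdefault(key, {}).update(fields)
    return store


ts = 1577836800000000000
points = [
    ("machine", {"Product_Type": "L"}, {"air_temp": 298.1, "rnf": False}, ts),
    # Same measurement, tags and timestamp: air_temp is overwritten, rnf is kept.
    ("machine", {"Product_Type": "L"}, {"air_temp": 298.2}, ts),
    # Extra EquipID tag: a different series, so this is a separate point.
    ("machine", {"Product_Type": "L", "EquipID": "404"}, {"air_temp": 298.2}, ts),
]

store = write_points({}, points)
print(len(store))  # 2
print(store[("machine", frozenset({"Product_Type": "L"}.items()), ts)])
# {'air_temp': 298.2, 'rnf': False}
```

So each row needs either its own timestamp or a tag that tells it apart from rows sharing that timestamp; a per-row tag brings back the series cardinality concern, so distinct timestamps are often the simpler fix.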
Best,
Chait
