I would like to start using InfluxDB, and would appreciate some help understanding how to organize the data. It will be for a web application which will allow users to archive and retrieve building environmental data such as floats Room 123 temperature, Room 321 temperature, Room 123 humidity, AHU-1 supply temperature, etc, and Booleans Room 123 door status, AHU-1 run status, etc.
There will be multiple users, and each of their data is independent, and will never be queried together.
Each user will solely be responsible to come up with the description of their data, and Room 123 temperature for one user is completely unrelated to Room 123 temperature of another user.
The data is received real time, and every piece of data will likely have a different timestamp.
Typical queries will be like the following.
Get the average temperature of Room 123, maximum humidity of Room 123, and average temperature of Room 321 between 1 year ago and 2 years ago for User1.
Get a trend of 100 average values for Room 123 temperature, Room 123 humidity, and Room 321 temperature between 1 year ago and 2 years ago for User1.
Note that the data will be inserted and queried via a script application and not directly by a user, and thus I don’t need intuitive names and can make them “value_1”, “value_2”, “value_3”, etc where 1, 2, 3 come from a traditional SQL database.
Questions…
Should I use a separate database for each user or include a tag for user in each measurement?
Should I use a separate field for temperature, pressure, humidity, etc, or just a common field for “analog values” and a second field for “Boolean values”?
What other tags should I consider? I contemplated making one for location like “User1_Room123”, but don’t think users will typically have more than one measurement for a given location and when they do, will not want the complexity of having to associate the measurement with the location.
Assuming exact timestamps are not needed, would it be beneficial from a performance standpoint to force the inserts to coincide on some common periods like ever whole second?
Or am I looking at this all wrong? If so, how would you recommend organizing the data?
I hope I am not asking too much. If so, are there any consultants available who I can pay to help out? I don’t have a huge budget, but always believe getting the database right up front is one of the most important things I can do.
Depends on how you are restricting access. If you have a front end, then segregating user data by tags is very easy and allows for cross user comparisons on the admin side. On the other hand if you are using grafana to expose that data to users then separating it into DB/customer allows you to use InfluxDB users to gate access.
In our data model using separate fields is generally the right way to go. My advice would be to have a separate measurement for each sensor, or ‘thing’, and have separate fields under that for each piece of data it is collecting.
That format risks encoding more than one piece of data in a tag, making it difficult to query. I would advise using any tags as you need to properly differentiate the data. We use those tags to define series.
YES! This is encouraged design.
You are mostly on the right track! Hopefully I’m doing a good job at answering your questions!
In our data model using separate fields is generally the right way to go. My advice would be to have a separate measurement for each sensor, or ‘thing’, and have separate fields under that for each piece of data it is collecting.
Let’s say user1 and user2 have the following sensors:
All the temperature, humidity, and pressure sensors are floats and the doorswitch is boolean.
Each of these sensors only collecting one piece of data which is the value. User1’s temperaturesensor1 is totally unrelated to user2’s temperaturesensor1, might have a different range, units, etc, will be read at different times, and user1 might one day delete their temperaturesensor1 while user2 might not.
So, assuming a single database, then I will definitely want to include a tag for users. After that, I am still lost.
Would I use 8 measurements with one field each?
Or maybe 2 measurements “float_values” with 7 fields and “boolean_values” with 1 field?
While easy to use a single database with a users tag, since the data is unrelated, and user1 and user2 don’t even know the other exists, I don’t see the value of doing so.
Given my example data, do you really mean 8 measurements or just 5 measurements?
user1
temperaturesensor1
temperaturesensor2
humiditysensor1
doorswitch1
user2
temperaturesensor1
humiditysensor1
humiditysensor2
pressuresensor1
8 measurements would mean user1’s temperaturesensor1, user1’s temperaturesensor2, and user2’s temperaturesensor1 are three separate measurements, each individual humidity sensors is another, etc.
5 measurements would mean user1’s temperaturesensor1, user1’s temperaturesensor2, and user2’s temperaturesensor1 are one single measurement, all humidity sensors are another, etc.
What is the benefit of using multiple measurements versus just one? Is it only for human readability?
A common (pseudo) query will be something like the following:
SELECT MEAN(“user2.temperaturesensor1”), MAX(“user2.temperaturesensor1”), MAX(“user2.humiditysensor1”), MEAN(“user2.humiditysensor2”), MEAN(“user2.pressuresensor1”)
FROM …
WHERE …
If it was 8 measurements, would this mean the above query would actually need to be performed in 4 separate queries?
If it was 5 measurements, would this mean the above query would actually need to be performed in 3 separate queries?
Why wouldn’t I just want one measurement and do it in one query: anysensor { userId, sensorId } value
I don’t know why I am having such a hard time. Based on the implemented schema, looks like these are four (or more?) options.
Why would one be better than the other?
One measurement with many fields and no tag for sensorId. Can do math within functions using something like: SELECT MEAN(“value”) FROM (SELECT “hum1”/“hum2” AS “value” FROM “measurements”). One query only unless math in functions is required
SELECT MEAN(“temp1”), MAX(“temp1”), MAX(“hum1”), MEAN(“hum2”), MEAN(“pres1”)
FROM "measurements"
WHERE time >= ‘2015-08-18T00:00:00Z’ AND time <= '2015-08-18T00:30:00Z’
AND “user” = 2
One measurement with one field and uses a tag for sensorId. Cannot do math within functions. Must do query 4 times to do what the first one did
SELECT MEAN(“value”)
FROM "measurements"
WHERE time >= ‘2015-08-18T00:00:00Z’ AND time <= '2015-08-18T00:30:00Z’
AND “user” = 2 AND sensorId=“temp1”
Multiple measurements with one field and uses a tag for sensorId. Cannot do math within functions. Must do query 4 times to do what the first one did
SELECT MEAN(“value”)
FROM "temp"
WHERE time >= ‘2015-08-18T00:00:00Z’ AND time <= '2015-08-18T00:30:00Z’
AND “user” = 2 AND sensorId=1
Even more multiple measurements with one field and no tag for sensorId. Cannot do math within functions. Must do query 4 times to do what the first one did
SELECT MEAN(“value”)
FROM "temp1"
WHERE time >= ‘2015-08-18T00:00:00Z’ AND time <= '2015-08-18T00:30:00Z’
AND “user” = 2