We have developed an application that stores time series data for heating controllers.
Controllers push readout data (around 35 data points) on a schedule, and each controller can have a slightly different set of data points from the others.
Originally we stored all the readouts for all of a given customer's controllers in a single measurement per customer, inside a single database. That is, one database for all of our customers, where each customer has its own measurement into which all of its controllers write their data. Each readout is tagged with the controller id.
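For illustration, a single readout under that schema would be written as a line-protocol point roughly like this (measurement, tag, and field names are invented for the example; the real payload has ~35 fields):

```
customer_acme,controller=ctrl-0042 supply_temp=71.3,return_temp=54.2,pump_on=true 1700000000000000000
```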
We were not performing any down-sampling, and the default retention policy was infinite.
As the project progresses, we are about to start down-sampling the data for query-performance reasons, and we also want to pre-calculate some useful aggregations.
If we continued with the “current” schema, we would create new measurements per customer to hold the various aggregations we want to pre-calculate, and we would create multiple retention policies for the down-sampled data, along the lines of the sketch below.
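Concretely, something like this in InfluxQL (database, retention-policy, and measurement names are placeholders, and `90d`/`260w` are arbitrary durations for the example):

```sql
-- Keep raw readouts for 90 days, down-sampled data much longer
CREATE RETENTION POLICY "raw_90d" ON "telemetry" DURATION 90d REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "downsampled_5y" ON "telemetry" DURATION 260w REPLICATION 1

-- One continuous query per customer measurement, e.g. for customer "acme":
CREATE CONTINUOUS QUERY "cq_acme_hourly" ON "telemetry" BEGIN
  SELECT mean(*) INTO "telemetry"."downsampled_5y"."acme_hourly"
  FROM "telemetry"."raw_90d"."acme"
  GROUP BY time(1h), "controller"
END
```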
Is this the way forward? Or do you foresee cardinality problems as more customers are added in the future?
An alternative we are also considering is a database per customer, with a single measurement with multiple retention policies for the incoming readouts, plus extra measurements for the aggregations.
It all boils down to this: is it better for us to have a single database with more measurements, or multiple databases with fewer measurements each?
I think that seems like a good approach if all of your data is identical. However, I would recommend checking out InfluxDB v2 so that you can take advantage of its newer features.
Thanks for the response. I read through the references you provided (I had already done so for their 1.x counterparts), but I am afraid that the general rules do not give a definite answer to my question.
As far as I can tell, we are “just” moving concepts around: from a single database to multiple databases, and from multiple measurements to a single measurement (until we start down-sampling, that is). But the schema of the data remains the same: we have the same tag set and field set.
One thing does change, though: security scopes. Before, we had a single user that could read from all measurements. If we move to multiple databases, we cannot have a single user that can read from all databases unless that user is an admin (I would prefer it weren't, but I have no problem using an admin user if the alternative is dynamically maintaining users as databases are created).
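In 1.x InfluxQL terms, the trade-off looks like this (user and database names are made up):

```sql
-- Non-admin privileges are granted per database, so a shared reader
-- needs a new grant every time a tenant database is created:
CREATE USER "backend" WITH PASSWORD 'changeme'
GRANT READ ON "customer_acme" TO "backend"
GRANT READ ON "customer_globex" TO "backend"

-- The alternative: one admin user that can read everything
CREATE USER "backend_admin" WITH PASSWORD 'changeme' WITH ALL PRIVILEGES
```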
I can see the downside of more complex permissions. Where is the upside of moving to this multi-database paradigm?
Hello @danielgonnet,
I’m not sure I completely understand your question, but I’ll try my best.
You can still have a single user that can read from all measurements. I would only suggest moving to multiple databases/buckets per user in the special case where you don’t need to query all the users’ data at once or cross-compare user data.
The upside is if you need to expire the data frequently, as buckets have retention policies.
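In 1.x terms, the equivalent is attaching the retention policy at database-creation time, so each tenant's data expires on its own schedule; a minimal sketch with placeholder names:

```sql
-- Create a tenant database whose default retention policy
-- automatically expires raw data after 30 days
CREATE DATABASE "customer_acme" WITH DURATION 30d REPLICATION 1 NAME "raw_30d"
```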
We were in no case cross-querying data beyond a tenant's measurement, and we have no plans to do so with a database per tenant, so that should be neither a show-stopper nor a driver of the decision.
We have already decided to move to a database per tenant, to create databases and retention policies on demand, and to use a database admin user in both the application that writes the data and the application that reads it.
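For anyone landing on this thread later, the on-demand provisioning would look roughly like this (all names and durations are illustrative, assuming a fixed measurement name such as "readouts" in every tenant database):

```sql
-- Run once per new tenant, from the provisioning code
CREATE DATABASE "customer_acme" WITH DURATION 90d REPLICATION 1 NAME "raw_90d"
CREATE RETENTION POLICY "downsampled_5y" ON "customer_acme" DURATION 260w REPLICATION 1

-- Down-sample into a fixed measurement name; the schema is now
-- identical across tenant databases
CREATE CONTINUOUS QUERY "cq_hourly" ON "customer_acme" BEGIN
  SELECT mean(*) INTO "customer_acme"."downsampled_5y"."readouts_hourly"
  FROM "customer_acme"."raw_90d"."readouts"
  GROUP BY time(1h), "controller"
END
```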