We have developed an application that stores time series data for heating controllers.
Controllers push readout data (around 35 data points) on a schedule, and each controller can have a slightly different set of data points from the others.
Originally we stored all the readouts for all of a given customer's controllers in a single measurement per customer, inside a single database. That is, one database for all of our customers, where each customer has its own measurement into which all of its controllers write their data. Each readout is tagged with the controller id.
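For illustration, a single readout under that schema would be written as a line-protocol point roughly like this (measurement, tag, and field names are invented for the example; the real payload has ~35 fields):

```
customer_acme,controller=ctrl-0042 supply_temp=71.3,return_temp=54.2,pump_on=true 1700000000000000000
```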
We were not performing any down-sampling, and the default retention policy was infinite.
As the project progresses, we are about to start down-sampling the data for query-performance reasons, and we also want to pre-calculate some useful aggregations.
If we continued with the “current” schema, we would create new measurements per customer to hold the various aggregations we want to pre-calculate, and we would create multiple retention policies for the down-sampled data, along the lines of the sketch below.
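Concretely, something like this in InfluxQL (database, retention-policy, and measurement names are placeholders, and `90d`/`260w` are arbitrary durations for the example):

```sql
-- Keep raw readouts for 90 days, down-sampled data much longer
CREATE RETENTION POLICY "raw_90d" ON "telemetry" DURATION 90d REPLICATION 1 DEFAULT
CREATE RETENTION POLICY "downsampled_5y" ON "telemetry" DURATION 260w REPLICATION 1

-- One continuous query per customer measurement, e.g. for customer "acme":
CREATE CONTINUOUS QUERY "cq_acme_hourly" ON "telemetry" BEGIN
  SELECT mean(*) INTO "telemetry"."downsampled_5y"."acme_hourly"
  FROM "telemetry"."raw_90d"."acme"
  GROUP BY time(1h), "controller"
END
```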
Is this the way forward? Or do you foresee cardinality problems as more customers are added in the future?
An alternative we are also considering is a database per customer, with a single measurement with multiple retention policies for the incoming readouts, plus extra measurements for the aggregations.
It all boils down to this: is it better for us to have a single database with more measurements, or multiple databases with fewer measurements each?
I think that seems like a good approach if all of your data is identical. However, I would recommend checking out InfluxDB v2 so that you can take advantage of its newer features.
Thanks for the response. I read through the references you provided (I had already done so for their 1.x counterparts), but I am afraid that the general rules do not give a definite answer to my question.
As far as I can tell, we are “just” moving concepts around: from a single database to multiple databases, and from multiple measurements to a single measurement (until we start down-sampling, that is). But the schema of the data remains the same: we have the same tag set and field set.
One thing does change, though: security scopes. Before, we had a single user that could read from all measurements. If we move to multiple databases, we cannot have a single user that can read from all databases unless that user is an admin (I would prefer it weren't, but I have no problem using an admin user if the alternative is dynamically maintaining users as databases are created).
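In 1.x InfluxQL terms, the trade-off looks like this (user and database names are made up):

```sql
-- Non-admin privileges are granted per database, so a shared reader
-- needs a new grant every time a tenant database is created:
CREATE USER "backend" WITH PASSWORD 'changeme'
GRANT READ ON "customer_acme" TO "backend"
GRANT READ ON "customer_globex" TO "backend"

-- The alternative: one admin user that can read everything
CREATE USER "backend_admin" WITH PASSWORD 'changeme' WITH ALL PRIVILEGES
```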
I can see the downside of more complex permissions. Where is the upside of moving to this multi-database paradigm?
Hello @danielgonnet,
I’m not sure I completely understand your question, but I’ll try my best.
You can still have a single user that can read from all measurements. I would only suggest moving to multiple databases/buckets per user in the special case where you don’t need to query all the users’ data at once or cross-compare user data.
The upside is if you need to expire the data frequently, as buckets have retention policies.
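In 1.x terms, the equivalent is attaching the retention policy at database-creation time, so each tenant's data expires on its own schedule; a minimal sketch with placeholder names:

```sql
-- Create a tenant database whose default retention policy
-- automatically expires raw data after 30 days
CREATE DATABASE "customer_acme" WITH DURATION 30d REPLICATION 1 NAME "raw_30d"
```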
We were in no case cross-querying data beyond a tenant's measurement, and we have no plans to do so with a database per tenant, so that should be neither a show-stopper nor a driver of the decision.
We have already decided to move to a database per tenant, to create databases and retention policies on demand, and to use a database admin user in both the application that writes the data and the application that reads it.
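For anyone landing on this thread later, the on-demand provisioning would look roughly like this (all names and durations are illustrative, assuming a fixed measurement name such as "readouts" in every tenant database):

```sql
-- Run once per new tenant, from the provisioning code
CREATE DATABASE "customer_acme" WITH DURATION 90d REPLICATION 1 NAME "raw_90d"
CREATE RETENTION POLICY "downsampled_5y" ON "customer_acme" DURATION 260w REPLICATION 1

-- Down-sample into a fixed measurement name; the schema is now
-- identical across tenant databases
CREATE CONTINUOUS QUERY "cq_hourly" ON "customer_acme" BEGIN
  SELECT mean(*) INTO "customer_acme"."downsampled_5y"."readouts_hourly"
  FROM "customer_acme"."raw_90d"."readouts"
  GROUP BY time(1h), "controller"
END
```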