Schema design - Multiple field values (metrics) vs one tag + one value

Hello, everyone.

We are designing an InfluxDB database to hold the metrics collected by sensors at 50 different locations in an industrial plant. Each location captures around 50 different metrics, all at the same frequency within that location, though two distinct locations do not have to share a frequency; each one has its own. So we have records like this:

Location1 timestamp1 metric1 metric2 … metricN

We were wondering whether it would be worthwhile to change that schema to

Location1 timestamp1 tag(metric1_name) value
Location1 timestamp1 tag(metric2_name) value
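For reference, in InfluxDB line protocol the two schemas might look something like this (the measurement, tag, and field names here are illustrative, not taken from our actual data):

```
# Wide schema: one point per location per timestamp, metrics as fields
sensors,location=Location1 metric1=0.42,metric2=17.3 1609459200000000000

# Tall schema: one point per metric, with the metric name as a tag
sensors,location=Location1,metric=metric1 value=0.42 1609459200000000000
sensors,location=Location1,metric=metric2 value=17.3 1609459200000000000
```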

Without that change we expect to ingest 150 million records/year, and we need to store at least the last 10 years of historical data (hopefully downsampled). We are concerned about the schema change because it would multiply the number of records to be stored.

Which is the best approach?

Thanks in advance.
Rubén.

Hello @rudoce,
Welcome!
Out of curiosity what version of InfluxDB are you considering and why?
Tags are indexed and fields are not, so it is generally recommended that sensor values be stored as fields rather than tags, to keep the cardinality of your data low. I would expect a tag to identify each location that you're monitoring.

However, this confused me a little bit:

Each location captures around 50 different metrics, all at the same frequency within that location, though two distinct locations do not have to share a frequency; each one has its own

Hello Anaisdg and thanks for replying.

One of the main reasons I was considering normalizing the metrics is that later on, when building a dashboard in Grafana, I want the user to choose which metric she wants to plot, using a prompt built with a variable. That is easily done with the "measure" tag.

But normalizing means ingesting 4,000 million records (retention policies included), though users would only query 2,400 million on a regular basis. If we keep it denormalized (in some cases maintaining more than 180 value fields), the ingestion drops to 200 million. And we are concerned about performance. How will InfluxDB respond to such a large amount of ingested data?

What do you mean by

normalize the metrics

?

Why would

normalizing means ingesting 4,000 million records (retention policies included), though users would only query 2,400 million on a regular basis

?

Are you using OSS or Cloud? How quickly are you ingesting this data?

Sorry, I didn’t express myself correctly. By “normalizing” I mean pivoting columns (all the fields) into rows, so instead of having 1 row with 50 fields (each one holding the value of a metric) I have 50 rows, each with a tag holding the metric name and a field holding the value of the corresponding metric.

And why? Because later on I want the user to be able to choose which metric she wants to plot in Grafana, and that is easily done just by creating a variable with the distinct values of the metric tag (if there is another way to do that, it would be great to know!).
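(As a sketch of one such other way, assuming the wide schema and a Flux data source in Grafana: the variable's query can list the field keys directly with Flux's `schema.fieldKeys()` function, so the metric names don't need to live in a tag. The bucket name below is illustrative.)

```
import "influxdata/influxdb/schema"

// Returns one row per field key (i.e. per metric name) in the bucket;
// Grafana can use the result to populate a dashboard variable.
schema.fieldKeys(bucket: "plant")
```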

Thanks!

Hello @rudoce,
Have you tried using the fieldsAsCols() function?
https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/influxdb-schema/fieldsascols/
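A minimal sketch of how it is used, assuming the tall (tag-per-metric) schema and an illustrative bucket name — it pivots each field into its own column, giving you wide, one-row-per-timestamp output at query time even if the data is stored tall:

```
import "influxdata/influxdb/schema"

from(bucket: "plant")
  |> range(start: -1h)
  |> filter(fn: (r) => r.location == "Location1")
  // Pivot fields into columns: one output row per timestamp,
  // one column per field key.
  |> schema.fieldsAsCols()
```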