Schema design - Multiple field values (metrics) vs one tag + one value

Hello, everyone.

We are designing an InfluxDB database to hold the metrics collected by sensors at 50 different locations in an industrial plant. Each location captures around 50 different metrics, all at the same frequency within that location, though two distinct locations do not have to share a frequency; each one has its own. So we have records like this:

Location1 timestamp1 metric1 metric2 … metricN

We were wondering whether it would be worthwhile to change that schema to

Location1 timestamp1 tag(metric1_name) value
Location1 timestamp1 tag(metric2_name) value
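For reference, in InfluxDB line protocol the two schemas might look something like this (the measurement, tag, and field names here are illustrative, not taken from our actual data):

```
# Wide schema: one point per location per timestamp, metrics as fields
sensors,location=Location1 metric1=0.42,metric2=17.3 1609459200000000000

# Tall schema: one point per metric, with the metric name as a tag
sensors,location=Location1,metric=metric1 value=0.42 1609459200000000000
sensors,location=Location1,metric=metric2 value=17.3 1609459200000000000
```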

Without that change we expect to ingest 150 million records/year, and we need to store at least the last 10 years of historical data (hopefully downsampled). We are concerned about the schema change because it would multiply the number of records to be stored.

Which is the best approach?

Thanks in advance.
Rubén.

Hello @rudoce,
Welcome!
Out of curiosity what version of InfluxDB are you considering and why?
Tags are indexed and fields are not, so it is generally recommended that sensor values be stored as fields rather than tags, to keep the cardinality of your data low. I would expect a tag to identify each location that you're monitoring.

However, this confused me a little bit:

Each location captures around 50 different metrics, all at the same frequency within that location, though two distinct locations do not have to share a frequency; each one has its own

Hello Anaisdg and thanks for replying.

One of the main reasons I was considering normalizing the metrics is that later on, when building a dashboard in Grafana, I want the user to choose which metric she wants to plot, using a prompt built with a variable. That is easily done with the "measure" tag.

But normalizing means ingesting 4,000 million records (retention policies included), though users would only query 2,400 million on a regular basis. If we keep it denormalized (in some cases maintaining more than 180 value fields), the ingestion drops to 200 million. And we are concerned about performance. How will InfluxDB respond to such a large amount of ingested data?

What do you mean by

normalize the metrics

?

Why would

normalizing means ingesting 4,000 million records (retention policies included), though users would only query 2,400 million on a regular basis

?

Are you using OSS or Cloud? How quickly are you ingesting this data?

Sorry, I didn’t express myself correctly. By “normalizing” I mean pivoting columns (all the fields) into rows, so instead of having 1 row with 50 fields (each one holding the value of a metric) I have 50 rows, each with a tag holding the metric name and a field holding the value of the corresponding metric.

And why? Because later on I want the user to be able to choose which metric she wants to plot in Grafana, and that is easily done just by creating a variable with the distinct values of the metric tag (if there is another way to do that, it would be great to know!).
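(As a sketch of one such other way, assuming the wide schema and a Flux data source in Grafana: the variable's query can list the field keys directly with Flux's `schema.fieldKeys()` function, so the metric names don't need to live in a tag. The bucket name below is illustrative.)

```
import "influxdata/influxdb/schema"

// Returns one row per field key (i.e. per metric name) in the bucket;
// Grafana can use the result to populate a dashboard variable.
schema.fieldKeys(bucket: "plant")
```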

Thanks!

Hello @rudoce,
Have you tried using the fieldsAsCols() function?
https://docs.influxdata.com/influxdb/v2.0/reference/flux/stdlib/influxdb-schema/fieldsascols/
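A minimal sketch of how it is used, assuming the tall (tag-per-metric) schema and an illustrative bucket name — it pivots each field into its own column, giving you wide, one-row-per-timestamp output at query time even if the data is stored tall:

```
import "influxdata/influxdb/schema"

from(bucket: "plant")
  |> range(start: -1h)
  |> filter(fn: (r) => r.location == "Location1")
  // Pivot fields into columns: one output row per timestamp,
  // one column per field key.
  |> schema.fieldsAsCols()
```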