Schema design: how many tags

eloparco · February 22, 2021, 4:38pm

Hello,
in my system, clients send multiple metrics (cpu, memory, etc.). Each client has some associated meta-information (e.g. account, platform, browser, network conditions, application used) and, after looking at the documentation, I was thinking of storing it as tags.
Is that a good idea? How many tags am I allowed to have, and what are the best practices? I’m afraid I may end up with 20-30 tags, causing excessive storage overhead.

On the other hand, the alternative would be to use a client id and store the id->metadata association in a separate database, but that would complicate the queries since I would have to retrieve the ids in advance.

I looked at the schema design suggestions in the docs, but I’m not very experienced with time-series databases and I want to make the right choice. Thanks!

scott · February 23, 2021, 4:08pm

@eloparco The thing to consider here is series cardinality – the total number of unique tag value combinations across all data. 20-30 tags isn’t necessarily something to worry about, but 20-30 tags with 1000s of unique values each can quickly become a problem. How many unique values do your tags have?
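
To make the cardinality point concrete, here is a rough Python sketch, not InfluxDB code. It counts the distinct measurement and tag-set combinations in a handful of made-up points, then computes the worst-case bound you get when every tag value can combine with every other. The point layout, the function names, and the tag counts are all assumptions for illustration, and field keys are ignored for simplicity.

```python
from math import prod

def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations actually present."""
    return len({(p["measurement"], frozenset(p["tags"].items())) for p in points})

def worst_case_cardinality(unique_values_per_tag):
    """Upper bound: every value of each tag combined with every value of the others."""
    return prod(unique_values_per_tag.values())

# Hypothetical sample points; only measurement and tags matter for cardinality.
points = [
    {"measurement": "client_metrics", "tags": {"platform": "linux", "browser": "firefox"}},
    {"measurement": "client_metrics", "tags": {"platform": "linux", "browser": "chrome"}},
    {"measurement": "client_metrics", "tags": {"platform": "linux", "browser": "firefox"}},
    {"measurement": "client_metrics", "tags": {"platform": "macos", "browser": "chrome"}},
]

observed = series_cardinality(points)
# Hypothetical counts of unique values per tag.
bound = worst_case_cardinality({"platform": 3, "browser": 4, "account": 1000})

print(observed)  # 3 distinct series in the sample
print(bound)     # 3 * 4 * 1000 = 12000 possible series
```

The number of tags matters less than the product of their unique value counts: one tag with 1000 values multiplies the bound by 1000, while a tag with 2 values only doubles it.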

If you’re using InfluxDB Cloud or InfluxDB OSS 2.0, storing metadata in a separate database such as Postgres or MySQL wouldn’t be a bad approach. You can use join() to join data in InfluxDB with data in one of these external DBs at query time. Here’s an example: Join data with Flux.
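
Conceptually, a query-time join attaches the metadata row to each time-series row that shares the same key. The Python sketch below only illustrates that idea with in-memory dictionaries. It is not Flux and does not use any InfluxDB or SQL API, and the `client_id` key, the row layout, and the `join_on` helper are hypothetical.

```python
def join_on(left_rows, right_rows, key):
    """Inner join two lists of dicts on a shared key column."""
    index = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left in left_rows:
        # Rows with no match on the right side are dropped (inner join).
        for right in index.get(left[key], []):
            joined.append({**right, **left})
    return joined

# Hypothetical time-series rows, tagged only with a client id.
metrics = [
    {"client_id": "c1", "time": 1, "cpu": 12.5},
    {"client_id": "c2", "time": 1, "cpu": 40.0},
    {"client_id": "c3", "time": 1, "cpu": 7.0},
]
# Hypothetical metadata rows, as they might come from an external SQL table.
metadata = [
    {"client_id": "c1", "platform": "linux", "browser": "firefox"},
    {"client_id": "c2", "platform": "macos", "browser": "chrome"},
]

rows = join_on(metrics, metadata, "client_id")
for row in rows:
    print(row)
```

With this layout the time-series side carries a single `client_id` tag, and the other attributes are looked up only when a query needs them.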

eloparco · February 23, 2021, 4:18pm

Thanks, I didn’t know about that possibility, it makes InfluxDB even more interesting!
This link actually contains an example that seems to be what I was looking for: Query SQL data sources with InfluxDB | InfluxDB OSS 2.0 Documentation.

eloparco · February 23, 2021, 4:22pm

Still talking about the schema, I was reading that it is better to use one field per measurement instead of multiple ones. Can you give me a hint on that?

For example, if I have cpu, ram and disk usage collected at the same rate (every second), is it better to put them together in the same measurement (increasing the storage space used)?

scott · February 23, 2021, 4:31pm

…I was reading that it is better to use one field per measurement instead of multiple ones.

Can you point me to where you read that? Because that definitely isn’t true. Really, on disk, measurements act as another tag, one that associates related points, so the more measurements you have, the more cardinality you have.

If you want to store all of those associated metrics in a single measurement, I think that’s totally fine.
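
As an illustration, a point that keeps cpu, ram and disk together in one measurement could be written as a single line of line protocol with three fields. The Python sketch below just builds that string. The measurement name, tag names, and the `to_line` helper are made up for the example, and it skips the escaping of spaces and commas that real line protocol requires.

```python
def to_line(measurement, tags, fields, ts_ns):
    """Format one point as: measurement,tag=... field=...,field=... timestamp."""
    # Tags sorted by key; no escaping of special characters (simplification).
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # All field values are written as floats in this sketch.
    field_part = ",".join(f"{k}={float(v)}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {ts_ns}"

line = to_line(
    "client_metrics",                             # hypothetical measurement name
    {"platform": "linux", "account": "a1"},       # hypothetical tags
    {"cpu": 12.5, "ram": 48.0, "disk": 71.25},    # three fields, one point
    1614097860000000000,                          # timestamp in nanoseconds
)
print(line)
# client_metrics,account=a1,platform=linux cpu=12.5,ram=48.0,disk=71.25 1614097860000000000
```

All three metrics share one measurement, one tag set and one timestamp, so they belong to the same group of related points instead of being spread over three measurements.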

eloparco · February 23, 2021, 4:39pm

It isn’t stated explicitly, but these posts (and others) seem to suggest that.

I’m glad you clarified.
