Stop writing to one tag and create a new one

I was thinking about using InfluxDB with the following workflow:

  1. Use tagA as a tag for some time
  2. tagA is not needed anymore, so I stop writing points with values for tagA
  3. I now need tagB, so I start storing points with tagB=value

This process could be repeated an arbitrary number of times. Let’s suppose that each tag has cardinality K by itself. I also have some other tags that are always being used.

My question is whether this would increase cardinality unexpectedly (on the order of O(K^n)). The discussion “Remove tags post-hoc - #3 by ezquat” seems to point to this being an issue, but I don’t understand why.
As tagA and tagB (and tagC and so on…) are never used at the same time, cardinality should be closer to the sum of the cardinalities (nK) rather than the product (K^n), am I right?
Is there some hidden (or not so hidden) performance issue that I’m not seeing?

If only one of those tags exists at a time, then the cardinality won’t multiply: a tag that is not provided simply does not exist for that point, and therefore does not define the series. Each new tag adds at most K series on top of the ones you already have.
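To make the sum-versus-product point concrete, here is a small sketch that counts distinct series keys in both scenarios. It does not talk to InfluxDB at all; it only models a series as "measurement plus tag set". The measurement name `events`, the always-present tag `host`, and the values of K and n are made up for illustration.

```python
from itertools import product

K = 4  # cardinality of each rotating tag (assumed for illustration)
tag_names = ["tagA", "tagB", "tagC"]  # n = 3 rotating tags
n = len(tag_names)
values = [f"v{i}" for i in range(K)]


def series_key(measurement, tags):
    # Model of a series identity: measurement plus the sorted tag set.
    return (measurement, tuple(sorted(tags.items())))


# Case 1: only one rotating tag is written at any given time.
sequential = set()
for name in tag_names:
    for v in values:
        sequential.add(series_key("events", {"host": "h1", name: v}))

# Case 2: every rotating tag is written on every point.
simultaneous = set()
for combo in product(values, repeat=n):
    tags = {"host": "h1", **dict(zip(tag_names, combo))}
    simultaneous.add(series_key("events", tags))

sequential_count = len(sequential)      # grows like n * K
simultaneous_count = len(simultaneous)  # grows like K ** n
print(sequential_count, simultaneous_count)
```

With these numbers the first case gives n·K series and the second K^n, which is the difference the question is about.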

Just out of curiosity, what do you want to store in those tags?
Strictly speaking about DB design, I find it an odd choice compared to having N values in the same tag, because you will need to keep changing the query itself from time to time.

About the post you mentioned: they had a set of data with too many tags, and thus more series than needed (as tags define series). In order to reduce it, they removed 2 tags… which generated even more series (and a higher DB cardinality), because points written without those tags belong to new series.
If you compare the 2 “sets” of data, before and after removing the tags, I expect the new structure to have fewer series and a lighter in-memory index. But since both will co-exist for some time (up to the retention policy), during that period they will have a higher cardinality than before and the in-memory index will be bigger.
Only once all the old data has aged out of the RP will they achieve the desired outcome (or you could delete/manipulate the data, as suggested by Anais).


Thanks for the answer Giovanni! More than clear

As for why I want to use tags this way: I use InfluxDB to provide customer-facing analytics, and I want to let users pick a custom-defined data source to aggregate their data by, but not more than one at a time, so I don’t run into cardinality issues. This custom-defined data source could change an arbitrary number of times. Say the customer-defined category is A at t0 and they change it to B at t1. If I stored the values from both A and B in the same tag, grouping by that tag over a period that includes both t0 and t1 would mix values from different data sources, which wouldn’t make sense. So I prefer to create a new tag each time and only aggregate the data for the period in which the customer used that data source as the aggregation category, which matches the period in which that tag is present in the stored points.
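Roughly, the write side could look like the sketch below. Everything in it is hypothetical: the `history` list, the tag keys `category_a` and `category_b`, the `customer` tag and the `orders` measurement are placeholders, and the helper just builds a line protocol string by hand without escaping special characters.

```python
from bisect import bisect_right

# Hypothetical history of which custom data source was selected, as
# (start_timestamp, tag_key) pairs sorted by start_timestamp.
history = [(0, "category_a"), (1000, "category_b")]


def active_tag(history, ts):
    """Return the tag key that was active at timestamp ts."""
    starts = [start for start, _ in history]
    idx = bisect_right(starts, ts) - 1
    if idx < 0:
        raise ValueError("timestamp precedes the first configured data source")
    return history[idx][1]


def to_line(measurement, customer, value, category_value, ts):
    """Build a line protocol string carrying only the active custom tag.

    Assumes tag keys and values contain no spaces, commas or equals
    signs, so no escaping is done here.
    """
    tags = {"customer": customer, active_tag(history, ts): category_value}
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_str} value={value} {ts}"


print(to_line("orders", "c1", 3.5, "books", 500))
print(to_line("orders", "c1", 1.0, "emea", 1000))
```

Points written before t1 only ever carry `category_a`, and points from t1 onward only carry `category_b`, so grouping by either tag naturally covers just the period in which it was in use.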

Makes sense to me. Given the tool, I think that’s the best approach to get this kind of flexibility and keep cardinality under control.
