Remove tags post hoc

We got into a situation with high series cardinality, and we realized that two particular tags were unnecessary, so we changed our ingest process to stop adding them.

Ironically, this greatly increased the series count, because the new data forms an entirely new set of series without those tags. So, in an attempt to reduce our cardinality, we inadvertently came close to doubling it.

Is there any simple solution to this problem? The only approach I'm aware of is to export all the old data, strip the tags externally, and reimport it, which sounds time-consuming and tricky to get right while keeping all the data queryable. Does anyone have other suggestions?

Hello @ezquat,
Let me start by asking a basic question: are you able to expire the old data at all? Once it has all expired, you will see the cardinality decrease.
Alternatively, if you're using 2.x, you can use a task. You could create a second bucket, start sending new data there, and use a Flux script to write data from the old bucket to the new one, dropping those tags as it goes. Then you can expire data from the old bucket as you migrate it to the new one.
That said, depending on how much data you have, your export-and-reimport suggestion might still be the best option.

Anais, that definitely gives me some food for thought. I don’t know anything about 2.x, but would the same basic idea work in 1.x? I could imagine writing a query that does a “select … into” to transform the data into another retention policy, then discarding the original RP. But I’m not sure what the memory implications of that are. If I duplicate the data into another RP, does that effectively increase the total series cardinality of the whole database once again? Is this something that would be different between 1.x and 2.x?
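I’m imagining something like this sketch (completely untested, and the database, RP, measurement, field, and tag names are all made up). Selecting fields explicitly rather than using *, and grouping only by the tags I want to keep, should mean the unwanted tags simply never get written to the new RP:

```sql
-- Hypothetical names throughout; copy one time window at a time.
SELECT "value"
INTO "mydb"."long_term_rp"."my_measurement"
FROM "mydb"."autogen"."my_measurement"
WHERE time >= '2023-01-01T00:00:00Z' AND time < '2023-02-01T00:00:00Z'
GROUP BY "keep_tag"
```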

Regarding expiring old data, that’s an interesting question. We want to keep all of our data in some form, since it can occasionally be useful, but we rarely look at the older stuff, and we can’t grow our one InfluxDB instance forever. Twice in the past we have used this strategy: simply copy the Influx data files to another location, then set a new, shorter retention policy on the live database. Both times it was an emergency reaction. It would be nice if the database offered a feature to roll data out to a cold-storage location. So far we have had some luck “rolling” data out by snapshotting/copying the files underneath Influx, but we’re not sure that’s a valid, supportable approach. Any suggestions for this scenario?