We are running into an issue where we are hitting the 200-column limit because we add a variable number of fields within a single measurement. After doing a bit more reading, we were hoping we could get rid of redundant fields by reducing the bucket retention time, but it seems this is not straightforward because of how data is retained within shards: InfluxDB shards and shard groups | InfluxDB OSS v2 Documentation.
Our bucket was configured with a retention policy of 365 days, then reduced to 30 days. Four days after setting the bucket retention period to 30 days, the fields still don't seem to have expired, and we cannot add any new ones. Today we set the retention period to 1 hour to clean out any unused fields, but I doubt this will yield any results.
Knowing all this, and looking at this issue here: [InfluxDB v2.7.1] Why the bucket retention period do not working? - #2 by scott, am I right to think the actual fields are going to expire after 7 days if the bucket was created with a 365-day retention period (due to the default shard group duration), or is it going to take much longer than that?
Ok, that’s what I guessed from the 200-column limit. InfluxDB Cloud Serverless uses the InfluxDB v3 storage engine (IOx), which stores data in “partitions.” The documentation you linked to in your post is for InfluxDB OSS, which uses the InfluxDB v1/v2 storage engine (TSM). TSM stores data in shards.
In IOx, partitions function similarly to TSM shards, specifically in their expiration behavior. When a partition is created, it’s assigned an expiration. That expiration does not get updated if the bucket’s retention period is changed; the new retention period only affects the expiration date of newly created partitions. Also note that data only gets deleted when the compaction service runs, which I think is every hour, but I can’t say for sure.
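If you want to double-check whether the old data has actually been removed yet, a Flux query roughly like the one below (the bucket and measurement names are placeholders for yours) should eventually return nothing once the expired partitions are dropped:

```flux
// Count points older than the new 30-day retention period.
// If this still returns counts, the old partitions haven't been dropped yet.
from(bucket: "my-bucket-with-200-columns")
    |> range(start: -365d, stop: -30d)
    |> filter(fn: (r) => r._measurement == "my_measurement")
    |> count()
```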
This is a tricky situation. There’s no real easy way to delete old columns to make room for new ones. I think your best way forward is to write specific fields to a different measurement, so you need to define a way to segment your fields into appropriate measurements to avoid hitting the column limit.
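As a very rough sketch of that kind of refactor (the measurement and field names here are made up, and I haven’t verified this against your setup), you could copy a subset of existing fields into a new measurement with something like:

```flux
// Copy the "voltage" and "current" fields out of the overloaded "devices"
// measurement into a separate "devices_power" measurement in the same bucket.
from(bucket: "my-bucket-with-200-columns")
    |> range(start: -365d)
    |> filter(fn: (r) => r._measurement == "devices")
    |> filter(fn: (r) => r._field == "voltage" or r._field == "current")
    |> set(key: "_measurement", value: "devices_power")
    |> to(bucket: "my-bucket-with-200-columns")
```

New writes would then send those fields to the new measurement as well, so neither measurement grows past the column limit.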
I was wondering, in this particular case SHOW FIELD KEYS FROM "my-bucket-with-200-columns" shows 190 fields, but when I use a Flux query from the API (schema.measurementFieldKeys ...) and count the total field keys for the past year, I get 98 since I changed the bucket expiry period to 1 hour. Does this mean that the SHOW FIELD ... query is searching for all fields within the bucket partitions that have not yet expired, or am I seeing something else?
@iganev0886 Flux’s schema.measurementFieldKeys() has default time boundaries (the last 30 days), so if you don’t specify a start parameter, it returns only field keys that have had values written in the last 30 days. InfluxQL’s SHOW FIELD KEYS statement is not time bound, so it shows all field keys that exist in the measurement. I think this is why the two values are different.
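For example, passing an explicit start far enough back makes the two counts comparable (the measurement name here is just a placeholder):

```flux
import "influxdata/influxdb/schema"

// List field keys with values written in roughly the last year,
// instead of the default 30-day window, then count them.
schema.measurementFieldKeys(
    bucket: "my-bucket-with-200-columns",
    measurement: "my_measurement",
    start: -1y,
)
    |> count()
```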
Hmm, interesting. I’m not sure why these values would be different if the time bounds are the same. If you were using v2, I’d think it would be because SHOW FIELD KEYS looks at the index and schema.measurementFieldKeys() looks at the actual data, but I’m not sure if that’s true in v3. This may be something worth submitting an issue for on the InfluxDB repo.
Thanks for the info. I’m having a similar issue. I was thinking that data that has aged out would decrease the column count, but it seems the limit is on the “index” - the drop-down list of fields I can select - and these cannot be deleted.
My test data has 200 columns and I cannot write a new column.
I was also thinking that this makes sense: it would be more problematic if I were able to change the type of a column from, say, a string to an int just by letting the data age out and then sending a different type.
I was using the Flux schema functions vs. SHOW FIELD KEYS and getting different results, which initially made me think I had some options to reduce the column count so that I didn’t have to create a new bucket or measurement, which requires some refactoring, copying over of certain data, etc.