We are running into an issue where we are hitting the 200-column limit because we add a variable number of fields within a single measurement. After doing a bit more reading, we were hoping we could get rid of redundant fields by reducing the bucket retention time, but it seems this is not straightforward because of how data is retained within shards: InfluxDB shards and shard groups | InfluxDB OSS v2 Documentation.
Our bucket was configured with a retention policy of 365 days, then reduced to 30 days. Four days after setting the bucket retention period to 30 days, the fields still don't seem to have expired, and we cannot add any new ones. Today we set the retention period to 1 hour to clean out any unused fields, but I doubt this will yield any results.
Knowing all this, and looking at this issue here: [InfluxDB v2.7.1] Why the bucket retention period do not working? - #2 by scott, am I right to think the actual fields are going to expire after 7 days if the bucket was created with a 365-day retention period (due to the default shard group duration), or is it going to take much longer than that?
Ok, that’s what I guessed from the 200-column limit. InfluxDB Cloud Serverless uses the InfluxDB v3 storage engine (IOx), which stores data in “partitions.” The documentation you linked to in your post is for InfluxDB OSS, which uses the InfluxDB v1/v2 storage engine (TSM). TSM stores data in shards.
In IOx, partitions function similarly to TSM shards, specifically in their expiration behavior. When a partition is created, it’s assigned an expiration. That expiration does not get updated if the bucket’s retention period is changed; the new retention period only affects the expiration date of newly created partitions. Also note that data only gets deleted when the compaction service runs, which I think is every hour, but I can’t say for sure.
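If you want to double-check whether the old data has actually been removed yet, a Flux query roughly like the one below (the bucket and measurement names are placeholders for yours) should eventually return nothing once the expired partitions are dropped:

```flux
// Count points older than the new 30-day retention period.
// If this still returns counts, the old partitions haven't been dropped yet.
from(bucket: "my-bucket-with-200-columns")
    |> range(start: -365d, stop: -30d)
    |> filter(fn: (r) => r._measurement == "my_measurement")
    |> count()
```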
This is a tricky situation. There’s no real easy way to delete old columns to make room for new ones. I think your best way forward is to write specific fields to a different measurement, so you need to define a way to segment your fields into appropriate measurements to avoid hitting the column limit.
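As a very rough sketch of that kind of refactor (the measurement and field names here are made up, and I haven’t verified this against your setup), you could copy a subset of existing fields into a new measurement with something like:

```flux
// Copy the "voltage" and "current" fields out of the overloaded "devices"
// measurement into a separate "devices_power" measurement in the same bucket.
from(bucket: "my-bucket-with-200-columns")
    |> range(start: -365d)
    |> filter(fn: (r) => r._measurement == "devices")
    |> filter(fn: (r) => r._field == "voltage" or r._field == "current")
    |> set(key: "_measurement", value: "devices_power")
    |> to(bucket: "my-bucket-with-200-columns")
```

New writes would then send those fields to the new measurement as well, so neither measurement grows past the column limit.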
I was wondering, in this particular case SHOW FIELD KEYS FROM "my-bucket-with-200-columns" shows 190 fields, but when I use a Flux query from the API (schema.measurementFieldKeys ...) and count the total field keys for the past year, I get 98 since I changed the bucket expiry period to 1 hour. Does this mean that the SHOW FIELD ... query is searching for all fields within the bucket partitions that have not yet expired, or am I seeing something else?
@iganev0886 Flux’s schema.measurementFieldKeys() has default time boundaries (the last 30 days), so if you don’t specify a start parameter, it returns only field keys that have had values written in the last 30 days. InfluxQL’s SHOW FIELD KEYS statement is not time bound, so it shows all field keys that exist in the measurement. I think this is why the two values are different.
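For example, passing an explicit start far enough back makes the two counts comparable (the measurement name here is just a placeholder):

```flux
import "influxdata/influxdb/schema"

// List field keys with values written in roughly the last year,
// instead of the default 30-day window, then count them.
schema.measurementFieldKeys(
    bucket: "my-bucket-with-200-columns",
    measurement: "my_measurement",
    start: -1y,
)
    |> count()
```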
Hmm, interesting. I’m not sure why these values would be different if the time bounds are the same. If you were using v2, I’d think it would be because SHOW FIELD KEYS looks at the index and schema.measurementFieldKeys() looks at the actual data, but I’m not sure if that’s true in v3. This may be something worth submitting an issue for on the InfluxDB repo.
Thanks for the info. I’m having a similar issue. I was thinking that data that has aged out would decrease the column count, but it seems the limit is on the “index” - the drop-down list of fields I can select - and these cannot be deleted.
My test data has 200 columns and I cannot write a new column.
I was also thinking that this makes sense: it would be more problematic if I were able to change the type of a column from, say, a string to an int just by letting the data age out and then sending a different type.
I was using the Flux schema functions vs. SHOW FIELD KEYS and getting different results, which initially made me think I had some options to reduce the column count so that I didn’t have to create a new bucket or measurement, which requires some refactoring, copying over of certain data, etc.