@jbbarnes I realized I made a big error in my previous answer.
The field key/name is actually recorded for each individual series (not once per measurement) in each TSM file (not once per shard). This means that long field names can have an appreciable impact on disk usage when series cardinality is high and when a shard is not fully compacted into a single TSM file.
For example, if there were an InfluxDB database with:
- 1 million series
- one fully compacted shard
- each series has field keys named “latitude”, “longitude”, and “altitude”
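Then the field names would take up (25 bytes * 1000000 series) = 25MB, since the three names total 8 + 9 + 8 = 25 bytes per series.
If the field names were “lat”, “lon”, and “alt”, they would require (9 bytes * 1000000 series) = 9MB, since the names total 3 + 3 + 3 = 9 bytes per series.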
In this case, a long-running InfluxDB instance would save 16MB per fully compacted shard by using the shorter field names. Also, any shard actively receiving high write throughput could contain tens to hundreds of underlying TSM files. Even though those TSM files will eventually be compacted into a single TSM file, until then the field names are repeated for each series in each one.
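To make the arithmetic above easy to rerun with your own numbers, here is a rough Python sketch. It is only a back-of-the-envelope model, not how InfluxDB itself measures anything: the function name `field_key_bytes` and the `tsm_files` parameter are made up for illustration, it assumes every series carries every field key, and it ignores length prefixes, the rest of the series key, and any other index overhead.

```python
def field_key_bytes(field_keys, series_count, tsm_files=1):
    """Estimate bytes spent on field key names in one shard.

    Hypothetical helper: assumes every series stores every field key
    once per TSM file, and counts only the raw bytes of the names.
    """
    bytes_per_series = sum(len(key.encode("utf-8")) for key in field_keys)
    return bytes_per_series * series_count * tsm_files


long_names = field_key_bytes(["latitude", "longitude", "altitude"], 1_000_000)
short_names = field_key_bytes(["lat", "lon", "alt"], 1_000_000)

print(long_names)                # 25000000 bytes, about 25MB
print(short_names)               # 9000000 bytes, about 9MB
print(long_names - short_names)  # 16000000 bytes, about 16MB saved
```

Passing something like `tsm_files=10` models a shard that has not been compacted yet, under the pessimistic assumption that every series shows up in all ten TSM files, so treat that result as an upper bound.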
In the context of an InfluxDB database with 1 million series, the space used by field names will still be a relatively small piece of the overall disk space used, but it will not be a negligible amount of space.