I am currently studying the InfluxDB 2.0 documentation, but I do not yet fully understand how buckets, measurements, and retention policies relate to each other.
The documentation says that databases and retention policies were replaced by buckets. By definition, a bucket is:
“a named location where time-series data is stored in InfluxDB 2.0”
As I understand it:
A bucket contains shard groups => A shard group stores the data of a certain time interval in a particular folder; for example, a shard group could always save the data of one four-hour interval in a single folder.
A shard group contains shards => Shards are the individual rows/points of the time-series table.
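To make the four-hour example concrete, here is a minimal Python sketch of how I picture a timestamp being assigned to a shard group. It is a toy model, not InfluxDB code: the function name `shard_group_window` is hypothetical, and I am assuming that the windows are aligned to the Unix epoch.

```python
from datetime import datetime, timedelta, timezone

def shard_group_window(ts, duration):
    """Return the [start, end) window of the shard group that would hold ts.

    Assumption: windows are aligned to the Unix epoch (1970-01-01 UTC).
    """
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    index = (ts - epoch) // duration  # timedelta // timedelta gives an int
    start = epoch + index * duration
    return start, start + duration

point_time = datetime(2021, 3, 1, 9, 30, tzinfo=timezone.utc)
start, end = shard_group_window(point_time, timedelta(hours=4))
print(start.isoformat(), end.isoformat())
# 2021-03-01T08:00:00+00:00 2021-03-01T12:00:00+00:00
```

Under this model, every point written between 08:00 and 12:00 would land in the same shard group.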
Moreover, the InfluxDB documentation states that one bucket has one retention policy.
I take this to mean that a bucket stores only one time series rather than several; otherwise, a bucket could have several retention policies.
If my understanding is correct, does this mean that measurements can only share a bucket when all of them have the same retention policy? If two measurements with different retention policies were stored in the same bucket, one retention policy could delete data belonging to the other measurement. Please correct me if I am confusing things here.
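To illustrate the concern, here is a toy Python model of a bucket with a single retention period that applies to every measurement in it. This is not how InfluxDB is implemented (as far as I understand, it expires data per shard group rather than per point); `enforce_retention` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def enforce_retention(points, retention, now):
    """Keep only the points that are younger than the bucket's retention period."""
    cutoff = now - retention
    return [p for p in points if p["time"] >= cutoff]

now = datetime(2021, 3, 10, tzinfo=timezone.utc)

# One bucket holding two measurements (hypothetical sample data).
bucket = [
    {"measurement": "cpu", "time": now - timedelta(days=1), "value": 0.4},
    {"measurement": "cpu", "time": now - timedelta(days=9), "value": 0.7},
    {"measurement": "temperature", "time": now - timedelta(days=9), "value": 21.5},
]

# A single 7-day retention period is applied to the whole bucket.
kept = enforce_retention(bucket, timedelta(days=7), now)
print([p["measurement"] for p in kept])  # ['cpu']
```

In this model the nine-day-old `temperature` point is dropped even if I had wanted to keep that measurement for longer, which is exactly the situation I am asking about.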
However, in case I am right, how does this influence hardware requirements?
The InfluxDB documentation says that the number of series affects hardware requirements.
Does that mean that every additional bucket/retention policy raises the number of series and thereby the hardware requirements?
For example, does it make a difference whether I store 60,000 series in one bucket
versus
20,000 series in bucket A, another 20,000 series in bucket B, and the final 20,000 series in bucket C?
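The comparison I have in mind can be sketched in Python as follows. I am assuming that a series is identified by its measurement, tag set, and field key, which is how I read the documentation; the helper names and the generated data are hypothetical. The sketch only counts series, it says nothing about the actual memory or disk cost per bucket, which is what I would like to know.

```python
def series_key(point):
    # Assumption: measurement + tag set + field key identifies one series.
    return (
        point["measurement"],
        tuple(sorted(point["tags"].items())),
        point["field"],
    )

def count_series(points):
    """Count the distinct series among the given points."""
    return len({series_key(p) for p in points})

# Hypothetical data: one point per host, 60,000 hosts.
points = [
    {"measurement": "cpu", "tags": {"host": f"host-{i}"}, "field": "usage"}
    for i in range(60_000)
]

# Everything in one bucket versus an even split across buckets A, B, and C.
one_bucket = count_series(points)
three_buckets = [count_series(points[i::3]) for i in range(3)]
print(one_bucket, three_buckets, sum(three_buckets))
# 60000 [20000, 20000, 20000] 60000
```

The total number of series is the same in both layouts, so my question is whether the split itself adds any overhead.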
I am looking forward to your feedback!