Not all measurements would have the same sampling rate, but they do need to be analysed together.
For example, board temperature could be sampled every 10 seconds, but S.M.A.R.T. attributes for hard disks may only be read once every hour. Is it better to store both of these values together or should they be stored separately?
Keeping them together would leave some values unchanged for thousands of entries, which should compress well but would nonetheless have a storage overhead. Keeping them separate would mean joining two series with dissimilar intervals outside the db. That might perform much worse than if the data were in one series, though there would be less data to read. Some kind of reconciliation would also have to be run to pinpoint the value of a low frequency measurement at the same time as a high frequency one.
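To make the reconciliation step concrete, here is a minimal sketch of what it could look like outside the db, assuming both series are already sorted by timestamp and that carrying the last known low frequency value forward is acceptable. The function name `align_forward_fill` and the sample values are made up for illustration.

```python
from bisect import bisect_right


def align_forward_fill(high, low):
    """Attach to each high-frequency sample the latest low-frequency value.

    Both inputs are lists of (timestamp_seconds, value) sorted by timestamp.
    Samples that precede the first low-frequency reading get None.
    """
    low_times = [t for t, _ in low]
    aligned = []
    for t, value in high:
        # Index of the last low-frequency reading taken at or before t.
        i = bisect_right(low_times, t) - 1
        low_value = low[i][1] if i >= 0 else None
        aligned.append((t, value, low_value))
    return aligned


# Hypothetical data: board temperature every 10 s, one S.M.A.R.T. attribute hourly.
temps = [(0, 41.0), (10, 41.5), (20, 42.0), (3600, 43.0), (3610, 43.5)]
smart = [(0, 5), (3600, 6)]
rows = align_forward_fill(temps, smart)
```

Each lookup is a binary search, so the cost is roughly O(n log m) for n high frequency samples and m low frequency ones. Whether that is cheaper than reading the repeated values from one series would need measuring.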
@asti My suggestion would be to store the values together. The compression for repeated values is excellent. We store a pointer to the original, so there is hardly any storage impact (< 2 bytes per value).
Thank you for the reply, Jack.
That low compression overhead is excellent.
Does it still apply for multiple entities within the same measurement?
That is, if I store timeseries data that mostly repeats, but they are stored together as:
Or would server1 and server2 have to be stored independently?
If it’s snappy compression, then only the overall symbols should matter, and not the deltas - can I correctly assume this is the case?
@asti You should store server as a tag and have the measurement name be something descriptive of the values being collected. We use snappy for strings and double delta compression for floats and integers.
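As a rough illustration of that layout, the sketch below formats points in line protocol with the server as a tag. The measurement name `hardware`, the field name `board_temp`, and the helper `to_line` are assumptions made for the example, and the helper does no escaping of spaces or commas, so it is not a substitute for a real client library.

```python
def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp.

    Assumes float field values and names without spaces, commas or
    equals signs (no escaping is performed here).
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement},{tag_part} {field_part} {timestamp_ns}"


# Two servers share one descriptive measurement; only the tag differs.
line1 = to_line("hardware", {"server": "server1"}, {"board_temp": 41.5}, 1500000000000000000)
line2 = to_line("hardware", {"server": "server2"}, {"board_temp": 39.0}, 1500000000000000000)
```

With this shape, server1 and server2 end up as separate series under the same measurement, so they can be queried together or filtered by tag.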
There’s a schema design recommendation in the InfluxDB docs to avoid using an identifier as a tag - it states that a large number of unique tag values degrades the index.
Should it be artificially partitioned into multiple tags, say region1, machine1? Or are a few thousand unique values of a tag acceptable?
@asti A few thousand unique tags is fine! A single instance can handle between 5-10M series. That number is going to increase significantly with tsi (the new index implementation).
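For anyone weighing the region/machine split, here is a small sketch of why it may not change the series count. It assumes, as a simplification, that a series is one unique combination of measurement and tag set (field keys are ignored), and the names `series_count`, `hardware`, `region` and `machine` are invented for the example.

```python
def series_count(points):
    """Count unique (measurement, tag set) combinations.

    points is an iterable of (measurement, tags_dict) pairs.
    """
    return len({(m, tuple(sorted(tags.items()))) for m, tags in points})


# 3000 servers identified by a single tag.
single_tag = [("hardware", {"server": f"server{i}"}) for i in range(3000)]

# The same 3000 servers identified by region plus machine (30 x 100).
split_tags = [
    ("hardware", {"region": f"region{r}", "machine": f"machine{m}"})
    for r in range(30)
    for m in range(100)
]

single_count = series_count(single_tag)
split_count = series_count(split_tags)
```

Under that assumption both layouts produce the same number of series, since every server still maps to exactly one tag combination. The split only changes how many distinct values each individual tag has.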