When spinning up a new instance how much disk should I allocate?
TL;DR: Floats and Integers take up 2-3 bytes per value stored. String compression is variable and uses Snappy compression.
InfluxDB uses multiple compression techniques which vary depending on the data type of the field and the precision of the timestamps. Timestamp precision matters because you can represent them down to nanosecond scale. For timestamps we use delta encoding, scaling and compression using simple8b, run-length encoding or falling back to no compression if the deltas are too large. Timestamps in which the deltas are small and regular compress best. For instance, Influx gets great compression on nanosecond timestamps if they’re only 10ns apart each. We’d achieve the same level of compression for second precision timestamps that are 10s apart.
We use the same delta encoding for floats mentioned in Facebook’s Gorilla paper, bits for booleans, delta encoding for integers, and Snappy compression for strings.
Depending on the shape of your data, the total size for storage including all tag metadata can range from 2 bytes per point on the low end to more for random data. We found that random floats with second level precision in series sampled every 10 seconds take about 3 bytes per point. For reference, Graphite’s Whisper storage uses 12 bytes per point. Real world data will probably look a bit better since there are often repeated values or small deltas.