Understanding the role of timestamp precision

time
influxdb
#1

Hi all,

I’m trying to get a better understanding of the role of timestamp precision and how the storage engine takes advantage of it, so that I can address certain concerns with my current project.

From what I gathered from the documentation, the HTTP API, unlike the CLI, allows you to explicitly specify a precision, for which it “recommend[s] using the least precise precision possible as this can result in significant improvements in compression.”
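
For reference, here is a minimal sketch of how the precision is passed to the InfluxDB 1.x HTTP write endpoint as a query parameter. The helper name `write_url`, the host `http://localhost:8086` and the database `mydb` are placeholders chosen for illustration; only the URL is built, nothing is sent.

```python
from urllib.parse import urlencode

# Precision values accepted by the InfluxDB 1.x /write endpoint.
ALLOWED_PRECISIONS = {"ns", "u", "ms", "s", "m", "h"}


def write_url(base, db, precision):
    """Build a /write URL carrying an explicit timestamp precision.

    `base` and `db` are hypothetical values supplied by the caller.
    """
    if precision not in ALLOWED_PRECISIONS:
        raise ValueError(f"unsupported precision: {precision!r}")
    return f"{base}/write?{urlencode({'db': db, 'precision': precision})}"


# Timestamps in the request body are then interpreted in seconds.
url = write_url("http://localhost:8086", "mydb", "s")
body = "cpu,host=a value=0.64 1556813561"
print(url)
```

With `precision=s`, the timestamp `1556813561` in the line protocol body is read as seconds rather than as the default nanoseconds.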

  1. Why the verb “can”? What determines whether providing the precision improves storage efficiency? My intuition suggests that it’s just a matter of using the same precision for a given measurement, but the documentation doesn’t state precisely that and I haven’t read the code (yet).

  2. Is precision considered only when it is passed explicitly, or is InfluxDB somehow able to infer that a given timestamp, despite being provided in nanoseconds, actually has minute-level coarseness? The latter seems unlikely; intuitively, it looks like the edge cases would be difficult to handle.

  3. What happens if values are written with different precision levels to the same measurement? My understanding from the docs is that ultimately all information about write-time precision is lost, but I’d like to understand if and how this affects storage efficiency and read-time performance.

Thanks in advance!

#2

Hey @stefanobaghino!

  1. If you haven’t read this doc on the compression of the timestamps, it might help with this point. The “can” is because it can be very difficult to make promises about improvements in compression given the number of variables in the process. You’re correct in that different precisions would change the compression rates.

  2. Right now, InfluxDB defaults to nanosecond precision. There’s an open GitHub issue regarding the default. InfluxDB doesn’t currently infer anything about the timestamp, but that is one of the points being discussed in that issue.

  3. When values are written to the same measurement with different time precisions, nothing good happens. InfluxDB will add zeroes to the less precise timestamp until it matches the specified (or default) precision, which increases the likelihood that it will collide with the timestamp of another point. It doesn’t so much affect the storage efficiency, but it might skew your data by overwriting points that end up with the same timestamp.
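
To make point 3 concrete, here is a rough sketch of the collision. It is a simplified model, not InfluxDB's actual code: the store is a plain dict keyed by series and nanosecond timestamp, and it assumes the last write for a given key wins. The names `to_nanoseconds` and `write` are made up for the example.

```python
# Multipliers that pad a coarser timestamp with zeroes up to nanoseconds.
NS_PER_UNIT = {"ns": 1, "u": 1_000, "ms": 1_000_000, "s": 1_000_000_000}


def to_nanoseconds(ts, precision):
    """Normalize a timestamp written at `precision` to nanoseconds."""
    return ts * NS_PER_UNIT[precision]


def write(store, series, ts, precision, value):
    """Store a point; a point with the same series and timestamp is replaced."""
    store[(series, to_nanoseconds(ts, precision))] = value


store = {}
# Written with second precision: padded to 1556813561000000000.
write(store, "cpu,host=a", 1556813561, "s", 0.64)
# Written with nanosecond precision, landing on the same padded value.
write(store, "cpu,host=a", 1556813561_000_000_000, "ns", 0.99)
# Written with nanosecond precision, 123 ns later: no collision.
write(store, "cpu,host=a", 1556813561_000_000_123, "ns", 0.70)

print(len(store))  # 2: the first point was overwritten by the second
```

Three points were written but only two remain, because the second-precision timestamp is indistinguishable from a nanosecond timestamp that happens to end in nine zeroes.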

I hope that helps!