Inside of InfluxDB, you have “databases”. A database has one or more “retention policies”. A retention policy acts as a wrapper for data that should be expired after a given period of time.
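For a concrete picture, here is a minimal sketch in InfluxQL (InfluxDB 1.x syntax) of a database with two retention policies. The names `telemetry`, `one_month`, and `one_year` are illustrative, not part of the original text:

```sql
-- Create a database; it gets a default "autogen" retention policy unless told otherwise
CREATE DATABASE "telemetry"

-- A short retention policy for full-precision data (the DEFAULT target for writes)
CREATE RETENTION POLICY "one_month" ON "telemetry" DURATION 30d REPLICATION 1 DEFAULT

-- A longer retention policy to hold downsampled data
CREATE RETENTION POLICY "one_year" ON "telemetry" DURATION 52w REPLICATION 1
```

Writes that don't name a retention policy land in the default one (`one_month` here); data older than each policy's `DURATION` is expired automatically.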
This is the general flow of aggregating (downsampling) data over the data’s lifetime.
- Full-precision data is written and stored for a brief period of time. What “brief” means depends on your use case, but let’s just say one month.
- After a month, a continuous query (or Kapacitor task) downsamples that data into a lower precision. It does this by grouping points into windows of time and aggregating the values in each window. For example, you could window the data into five-minute intervals and calculate the average of the values within each window. That average then becomes the new, downsampled data point.
- The downsampled, lower precision data is then stored in another retention policy with a longer retention period.
- The high precision data in the shorter retention policy ages out and is dropped from the database, but the lower precision version of the data remains in the other retention policy.
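The downsampling step above can be sketched as a continuous query. This assumes the hypothetical `telemetry` database with `one_month` and `one_year` retention policies and a measurement called `cpu_usage` with a field `value`; adapt the names to your schema:

```sql
-- Every five minutes, average the full-precision points in that window
-- and write the result into the longer retention policy
CREATE CONTINUOUS QUERY "cq_cpu_5m" ON "telemetry"
BEGIN
  SELECT mean("value") AS "value"
  INTO "one_year"."cpu_usage"
  FROM "one_month"."cpu_usage"
  GROUP BY time(5m), *
END
```

The `GROUP BY time(5m), *` clause windows the data into five-minute intervals while preserving all tags, so the downsampled series keep the same tag structure as the originals. Note that a continuous query runs as data arrives, so the low-precision copy exists well before the high-precision data ages out.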
This process of downsampling and storing lower-precision data can be repeated as many times as you need. You just have to design the flow and balance data precision against disk usage over time: the higher the precision of your data, the more disk it uses.
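To illustrate repeating the process, a second continuous query could downsample the five-minute averages into one-hour averages held even longer. This sketch assumes the same hypothetical names as before, plus a `five_years` retention policy:

```sql
-- A still-longer retention policy for the coarsest data
CREATE RETENTION POLICY "five_years" ON "telemetry" DURATION 260w REPLICATION 1

-- Downsample the five-minute averages into one-hour averages
CREATE CONTINUOUS QUERY "cq_cpu_1h" ON "telemetry"
BEGIN
  SELECT mean("value") AS "value"
  INTO "five_years"."cpu_usage"
  FROM "one_year"."cpu_usage"
  GROUP BY time(1h), *
END
```

Each tier trades precision for retention: full precision for a month, five-minute averages for a year, hourly averages for five years.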