Best practices for replacing data

What are the best practices for replacing measurement data in a bucket via a nightly job? Some considerations:

  • I don’t mind a little bit of downtime
  • I’m deleting data because each night we might discover that some previous measurements were invalid, and our latest dataset is more accurate

Some options:

  • Delete the bucket => recreate bucket => upload data
  • Use the delete API with a predicate to delete all of the measurement data => upload new data
  • Assuming the bucket is named “db” => create a new bucket each night named “new_db” => upload data to “new_db” => delete the “db” bucket => rename “new_db” to “db”
  • Each night, create a new bucket named “db_YYYY_MM_DD” with a 1d retention policy. When querying the data, just query the most recent bucket
  • Do the same as the previous option, but per measurement within the bucket rather than per bucket
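
For the dated-bucket option, the part that is easy to get wrong is picking "the most recent bucket" at query time. Below is a minimal sketch of just the naming and lookup logic in plain Python, with no database client involved. The "db_YYYY_MM_DD" pattern comes from the option above; the function names `bucket_name_for` and `latest_bucket` are hypothetical, and I'm assuming the list of bucket names comes from whatever client or API call you use to list buckets.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Matches names of the form db_YYYY_MM_DD (the naming scheme from the option above).
_DATED_BUCKET = re.compile(r"^db_(\d{4})_(\d{2})_(\d{2})$")


def bucket_name_for(day: date) -> str:
    """Build the nightly bucket name for a given day, e.g. db_2024_01_05."""
    return f"db_{day:%Y_%m_%d}"


def latest_bucket(names: Iterable[str]) -> Optional[str]:
    """Return the dated bucket with the most recent date, or None if there is none.

    Names that do not match the pattern, or that encode an impossible date
    (month 13, day 32, ...), are ignored rather than treated as errors.
    """
    dated = []
    for name in names:
        match = _DATED_BUCKET.match(name)
        if match is None:
            continue
        year, month, day = (int(part) for part in match.groups())
        try:
            dated.append((date(year, month, day), name))
        except ValueError:
            continue  # pattern matched but the date is not a real calendar date
    if not dated:
        return None
    # Tuples compare by date first, so max() picks the newest bucket.
    return max(dated)[1]
```

The nightly job would write into `bucket_name_for(date.today())`, and readers would call `latest_bucket(...)` on the current list of bucket names before querying. Since zero-padded dates in this format also sort correctly as plain strings, parsing them is mostly a guard against stray buckets with similar names.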