Consistent influxdb backup with LVM/FS snapshot?

backup
influxdb

#1

We are looking for the fastest way to back up a large InfluxDB database.

Our first approach is to back up the data using the portable backup format (as described in New Features in InfluxDB Open Source Backup and Restore).

But this approach requires:

  • twice the database's size in local FS storage.
  • a dump before the copy to the final backup location (tape, or a FS backup with any of the existing solutions), which can sometimes take a long time.

We would like to do something like what this script does for MySQL (https://gist.github.com/keymon/1614287): tell the database to keep its files in a consistent state while the system takes an LVM/FS snapshot, which can then be copied immediately to the final location.

With this last approach we won’t need twice the size of the database in the local FS, and the final copy will begin much sooner than with the other solution (there is no need for a prior dump).
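To make the idea concrete, here is a rough Python sketch of the snapshot-then-copy sequence we have in mind. It only builds and prints the commands and runs nothing. The volume group, logical volume, snapshot size, mount point and destination are all made-up placeholders, and `lvm_snapshot_backup_commands` is just an illustrative name. It also does not include any step to put InfluxDB into a consistent state, since whether such a step exists is exactly what we are asking.

```python
import shlex


def lvm_snapshot_backup_commands(vg, lv, snap_name, snap_size, mount_point, dest):
    """Return the shell commands for a snapshot-then-copy backup (dry run only).

    All arguments are placeholders chosen by the caller; nothing is executed.
    """
    origin = f"/dev/{vg}/{lv}"
    snap_dev = f"/dev/{vg}/{snap_name}"
    return [
        # 1. Take a copy-on-write snapshot of the volume holding the database.
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap_name, origin],
        # 2. Mount the snapshot read-only (some filesystems, e.g. XFS, may need
        #    extra mount options such as nouuid).
        ["mount", "-o", "ro", snap_dev, mount_point],
        # 3. Copy the frozen files straight to the final backup location.
        ["rsync", "-a", f"{mount_point}/", dest],
        # 4. Clean up: unmount and drop the snapshot.
        ["umount", mount_point],
        ["lvremove", "-f", snap_dev],
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for cmd in lvm_snapshot_backup_commands(
        "vg0", "influx", "influx_snap", "5G",
        "/mnt/influx_snap", "backup-host:/backups/influxdb/",
    ):
        print(shlex.join(cmd))
```

The point of this layout is that the copy in step 3 can start as soon as the snapshot exists, and the only extra local space needed is the snapshot's copy-on-write area rather than a full second copy of the data.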

Is there any way to do this? Could a direct snapshot be consistent?


#2

Hi @toni-moreno,

You might find my reply to this post useful:

Snapshots alone aren’t really an effective backup strategy, though, as they don’t protect you from a whole class of failures. Snapshots are more like version control: they help you roll back your data if something goes wrong, but they won’t help you recover it if, say, your data center catches fire.

In order to have a proper backup, the two requirements you wish to avoid are essential.

You will need to use twice (at least) the storage space for your database in order to store a complete copy of your data. This way, if you lose one copy, you have another.

You’ll also want to spend the time copying the backup to another location. If both your database and backup are on the same disk, for example, then a drive failure would wipe out all your data. Keeping them in separate locations ensures that a catastrophic event in one location doesn’t result in total data loss.

An old rule of thumb about backups is to keep three copies, on at least two types of media, with one of them being in another location. While this is less applicable in the days of cloud storage, the concepts are still strong: keep multiple copies so you can lose one (or more), different types of media will endure different failure states, and an off-site copy will save you from physical disasters like a flood.
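As a rough illustration of that rule of thumb, here is a minimal Python sketch that checks a backup plan against it. The `Copy` class and `satisfies_3_2_1` function are hypothetical names invented for this example, and it assumes the live database counts as one of the three copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of the data: what it is stored on and where it lives."""
    media: str      # e.g. "disk", "tape", "object-storage"
    location: str   # e.g. "dc1", "offsite-vault"


def satisfies_3_2_1(copies, primary_location):
    """Check: at least 3 copies, on at least 2 media types, at least 1 off-site.

    Assumption: the live database is included in `copies`.
    """
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.location != primary_location for c in copies)
    return enough_copies and enough_media and has_offsite


if __name__ == "__main__":
    plan = [
        Copy("disk", "dc1"),            # the live database
        Copy("disk", "dc1"),            # a local snapshot or dump
        Copy("tape", "offsite-vault"),  # a copy shipped elsewhere
    ]
    print(satisfies_3_2_1(plan, primary_location="dc1"))
```

A plan made up only of the live database plus a snapshot on the same volume fails all three checks, which is why a snapshot by itself doesn't count as a backup.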

With S3 and similar types of cloud storage, the data is replicated within Amazon’s system so they can provide a reasonable guarantee (but not 100%) that it won’t be lost. You can also use Amazon Glacier to further reduce the cost of infrequently accessed backups.

Taking snapshots so that you can roll back from failures, and also backing up the filesystem to tape, would provide a reasonable solution. However, you would then be storing a lot of irrelevant data from the host, such as shared libraries and application binaries, which you don’t need, since you should be able to get back to that state using configuration management tools.