Restore incremental backups v1.5 OSS

backup

#1

Hi!

I was trying to implement an incremental backup procedure with InfluxDB v1.5.0 OSS.
I ran the backup:

[root@user dir]# influxd backup -portable -database dbname -since 2018-03-14T11:11:11Z ./new_backup/
2018/03/15 17:10:12 backing up metastore to new_backup/meta.00
2018/03/15 17:10:12 backing up db=dbname
2018/03/15 17:10:12 backing up db=dbname rp=autogen shard=164 to new_backup/dbname.autogen.00164.00 since 2018-03-14T11:11:11Z
2018/03/15 17:10:12 backing up db=dbname rp=autogen shard=165 to new_backup/dbname.autogen.00165.00 since 2018-03-14T11:11:11Z
2018/03/15 17:10:14 backup complete:
2018/03/15 17:10:14 new_backup/20180315T171012Z.meta
2018/03/15 17:10:14 new_backup/20180315T171012Z.s164.tar.gz
2018/03/15 17:10:14 new_backup/20180315T171012Z.s165.tar.gz
2018/03/15 17:10:14 new_backup/20180315T171012Z.manifest

For the restore I used the following command:

[root@host dir]# influxd restore -portable ./new_backup/
2018/03/15 18:31:02 error updating meta: DB metadata not changed. database may already exist
restore: DB metadata not changed. database may already exist

The only way I could find to restore the backup successfully was to drop the database (the one included in the backup) before the restore. But that way I lose all the data from before the -since date defined in the backup.

Do you know how I can restore a partial backup without losing the data already in the database?
Thanks in advance


#2

Hello,

The path you need to take is:

  1. back up db ‘dbname’ as you already have
  2. restore ‘dbname’ into a new database name:
    influxd restore -portable -db "dbname" -newdb "dbname_tmp" ./new_backup/
  3. use a SELECT ... INTO query to side-load the data into the existing ‘dbname’:
    use dbname_tmp; SELECT * INTO dbname..:MEASUREMENT FROM /.*/ GROUP BY *
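Put together, steps 2 and 3 can be scripted. Here is a minimal sketch, assuming the influx CLI is on PATH and the server is running locally; the database and path names are the ones from the example above:

```shell
#!/bin/sh
set -e

# Names from the example above; adjust to your environment.
DB="dbname"
TMPDB="dbname_tmp"
BACKUP_DIR="./new_backup/"

# Restore the portable backup into a temporary database
influxd restore -portable -db "$DB" -newdb "$TMPDB" "$BACKUP_DIR"

# Side-load the restored points into the existing database
influx -database "$TMPDB" \
  -execute "SELECT * INTO \"$DB\"..:MEASUREMENT FROM /.*/ GROUP BY *"

# Drop the temporary database once the copy has been verified
influx -execute "DROP DATABASE \"$TMPDB\""
```

The GROUP BY * on the SELECT ... INTO is important: without it, tags in the source series are written into the target as fields rather than tags.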

As an aside, you may want to experiment with -start/-end instead of -since. -since takes backups at the file level, and occasionally cleanup routines will cause old data to appear new at the file level, so it gets included again. -start and -end, new in version 1.5, filter data based on the timestamps of the actual data points. Which method you use depends entirely on your use case.
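For example, a point-timestamp-bounded backup with the new flags might look like this (the time window is illustrative):

```shell
# Back up only points whose timestamps fall inside the given window
# (-start/-end are new in InfluxDB 1.5 and cannot be combined with -since)
influxd backup -portable -database dbname \
  -start 2018-03-14T11:11:11Z \
  -end 2018-03-15T00:00:00Z \
  ./windowed_backup/
```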


#3

Hi there,
we have the same problem. I would kindly ask you to consider a “true” restore as a feature for an upcoming InfluxDB version; my understanding of a backup/restore concept is that the restore should not go into a different database, with the values then “manually” copied into the real database.

The performance aspect in particular should be considered: it should be sufficient to copy the relevant shard file(s) into the database directory and update the metadata (in our case we are using the shard-based backup option).

Our use case (not so uncommon, I believe) is that data is cyclically backed up before it is deleted due to age (e.g. keep data in a specific measurement for one year with a shard cycle of one week; once 52 shards exist, the oldest shard is deleted whenever a new shard is created, and before deletion that shard is backed up). At any time it should be possible to restore any of the backups without too much hassle (in the example above, two-year-old data is needed for a check, so that data has to be restored).
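To illustrate the shard-based variant of the cycle described above, backing up a single aging shard before it is dropped might look like this (the shard ID is hypothetical; SHOW SHARDS lists the real ones):

```shell
# List shards to find the ID of the oldest shard for the database
influx -execute "SHOW SHARDS"

# Back up just that shard before it is deleted
influxd backup -portable -database dbname -shard 164 ./shard_164_backup/
```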

So this is not the “disaster recovery” use case for backup (where restore is only used in emergencies) but the “online data reduction” use case, where restore is done as part of normal operation (although manually and infrequently).

Cheers,
Ewald


#4

We were running into the same issue and wrote this quick script to restore incremental backups, so we wouldn’t be completely SOL if our DB failed and we had to restore from the backups.

It’s written in Node, so it might not be optimal for every case, but it has worked just fine for us.