Restore incremental backups v1.5 OSS

Hi!

I was trying to implement an incremental backup procedure with InfluxDB v1.5.0 OSS.
I did the backup:

[root@user dir]# influxd backup -portable -database dbname -since 2018-03-14T11:11:11Z ./new_backup/
2018/03/15 17:10:12 backing up metastore to new_backup/meta.00
2018/03/15 17:10:12 backing up db=dbname
2018/03/15 17:10:12 backing up db=dbname rp=autogen shard=164 to new_backup/dbname.autogen.00164.00 since 2018-03-14T11:11:11Z
2018/03/15 17:10:12 backing up db=dbname rp=autogen shard=165 to new_backup/dbname.autogen.00165.00 since 2018-03-14T11:11:11Z
2018/03/15 17:10:14 backup complete:
2018/03/15 17:10:14 new_backup/20180315T171012Z.meta
2018/03/15 17:10:14 new_backup/20180315T171012Z.s164.tar.gz
2018/03/15 17:10:14 new_backup/20180315T171012Z.s165.tar.gz
2018/03/15 17:10:14 new_backup/20180315T171012Z.manifest

For the restore I used the command below:

[root@host dir]# influxd restore -portable ./new_backup/
2018/03/15 18:31:02 error updating meta: DB metadata not changed. database may already exist
restore: DB metadata not changed. database may already exist

The only way I could find to successfully restore the backup was to drop the database (the one included in the backup) before the restore. But that way I lose all the data from before the -since date defined in the backup.

Do you know how I can restore a partial backup without losing the data already in the database?
Thanks in advance

Hello,

The path you need to take is:

  1. back up db ‘dbname’ as you already have
  2. import ‘dbname’ but to a new database name:
    influxd restore -portable -db "dbname" -newdb "dbname_tmp" ./new_backup/
  3. Use a select query to side-load the data into the existing ‘dbname’:
    use dbname_tmp; SELECT * INTO dbname..:MEASUREMENT FROM /.*/ GROUP BY *
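
Putting those three steps together, a minimal end-to-end sketch (the temporary database name dbname_tmp and the paths are illustrative, and it assumes the influx CLI can reach the local instance):

    # 1. take the incremental backup
    influxd backup -portable -database dbname -since 2018-03-14T11:11:11Z ./new_backup/
    # 2. restore it into a temporary database so the existing one stays untouched
    influxd restore -portable -db dbname -newdb dbname_tmp ./new_backup/
    # 3. side-load the restored points into the real database, then drop the temp copy
    influx -database dbname_tmp -execute 'SELECT * INTO dbname..:MEASUREMENT FROM /.*/ GROUP BY *'
    influx -execute 'DROP DATABASE dbname_tmp'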

As an aside, you may want to experiment with -start/-end instead of -since. -since takes backups at the file level, and occasionally cleanup routines will cause old data to appear new at the file level. -start and -end, new in version 1.5, filter on the timestamps of the actual data points instead. The method you use depends entirely on your use case.
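
For example, a time-filtered backup of the same database might look like this (the timestamps are illustrative):

    influxd backup -portable -database dbname -start 2018-03-14T11:11:11Z -end 2018-03-15T00:00:00Z ./new_backup/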

Hi there,
we have the same problem. I would kindly ask you to consider a “true” restore as a feature for an upcoming InfluxDB version: my understanding of a backup/restore concept is that you should not have to restore into a different database and then “manually” copy the values into the real database.

The performance aspect in particular should be considered: it should be sufficient to copy the relevant shard file(s) into the database directory and update the metadata (in our case we are using the shard-based backup option).

Our use case (not so uncommon, I believe) is that data is cyclically backed up before it is deleted due to age (e.g. keep data in a specific measurement for one year, with a shard cycle of one week; once 52 shards exist, the oldest shard gets deleted whenever a new shard is generated, and before deletion that shard is backed up). At any time it should be possible to restore any of these backups without too much hassle (in the example above, two-year-old data is needed for a check, so that data has to be restored).
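
For reference, the per-shard step of such a cycle could look roughly like this (a sketch; the shard id 164 and the target path are illustrative, and SHOW SHARDS lists the actual ids with their start/end times):

    # list shards to find the one about to expire
    influx -execute 'SHOW SHARDS'
    # back up just that shard before the retention policy drops it
    influxd backup -portable -database dbname -shard 164 ./shard_backups/164/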

So this is not the “disaster recovery” use case for backup (where restore is not used except in emergency situations), but the “online data reduction” use case where restore is done as part of normal operation (although manually and infrequently).

Cheers,
Ewald

We were running into the same issue and wrote this quick script to restore incremental backups, so we wouldn’t be completely SOL if our DB failed and we had to restore from the backups.

It’s written in Node, so it might not be optimal for every case but has worked just fine for us.

I am running into the same issue. I have tried the sideload approach described earlier in this thread on InfluxDB v1.7 OSS.

However, the performance is not great with this method.

I am wondering if there’s already a better way to restore incremental backups. Thank you!

I’m struggling with incremental restores.

The sideload query method suggested in the docs takes a long time to restore a 2-day incremental backup spanning about 500 measurements, and it fails with this error:

> SELECT * INTO "efd"."autogen".:MEASUREMENT FROM "efd-diff"."autogen"./.*/ WHERE time > '2021-03-01T00:00:00Z'
ERR: partial write: field type conflict: input field "error" on measurement "lsst.sal.ATDome.ackcmd" is type float, already exists as type integer dropped=58

I learned from @rawkode in another post that this is due to type checking; however, we haven’t had schema changes in this measurement (at least not recently), so I’m wondering why this query returns that error.
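
One way to check (a sketch, using the database and measurement names from the error message): compare the field types recorded on both sides with SHOW FIELD KEYS. A field’s type is enforced per shard, so if the backup’s copy of “error” was ever written as a float while the target shards hold it as an integer, the mismatch will show up here even without a recent schema change:

    influx -execute 'SHOW FIELD KEYS ON "efd" FROM "lsst.sal.ATDome.ackcmd"'
    influx -execute 'SHOW FIELD KEYS ON "efd-diff" FROM "lsst.sal.ATDome.ackcmd"'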