Enterprise cluster backup and restore questions

This question is about the Enterprise cluster product.

What is the best practice for doing a full restore on an existing cluster?

Why do you need to empty the cluster before doing a restore, and what is the best practice for making that operation as quick as possible? Should I drop all databases and then do a full restore, or is there a better way? Will GitHub issue #9040 and/or #9148 fix this with the “live restore”?

What is the best practice for restoring to an existing cluster where one or more replication siblings are broken in such a way that the AA service doesn’t have shards to copy from the remaining servers? I don’t want to take down/erase the entire cluster before restoring just because x% of it is not working.

When restoring, files are copied to all data nodes. Would it be possible to copy files to just one of the X replication siblings and then let the AA service spread them to the other data nodes? That would speed up the restore process a lot(!).

When restoring, why are Kapacitor subscriptions not restored?

Regards
Dennis

Hi! #9040 and #9148 are closed PRs that were superseded by later work. Both are code changes to the OSS product only, not to Enterprise.

A ‘full’ restore is assumed to cover an entire system, which is why we require the system to be empty before a full restore is performed.
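For comparison with the partial restores described next, here is a minimal sketch of a full restore, assuming the cluster has already been emptied. As I understand the `influxd-ctl` documentation, a full restore uses the `-full` flag and takes the backup's manifest file rather than the backup directory; please check this against your Enterprise version. The `run` helper, the `DRY_RUN` variable and `full_restore` are hypothetical names, and the `.manifest` suffix check is this sketch's own sanity check, not something `influxd-ctl` requires.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
: "${DRY_RUN:=1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# full_restore <manifest-file>
# Full restore of an entire, already empty cluster from a backup manifest.
full_restore() {
  local manifest="$1"
  # Sanity check added by this sketch (assumption): refuse anything that is
  # not a .manifest file, such as a bare backup directory.
  case "$manifest" in
    *.manifest) ;;
    *) printf 'expected a .manifest file, got: %s\n' "$manifest" >&2; return 1 ;;
  esac
  run influxd-ctl restore -full "$manifest"
}
```

After sourcing it, `full_restore /path/to/backup/<manifest>` only prints the command; set `DRY_RUN=0` to actually run it.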

In your case, there are some databases that you wish to retain and others that you wish to repair or replace. There are a few ways to answer your question about a partial restore. Note that in the commands below, the backup directory is given without the manifest file name.

  1. If you know exactly which shards (by shard ID) you want to repair, you can restore them one at a time from a backup: `influxd-ctl restore -db <DBNAME> -rp <RPNAME> -shard <SHARDID> /path/to/backup`

  2. If a single database appears to be corrupted, you can first drop that database with `DROP DATABASE <DBNAME>` and then restore it: `influxd-ctl restore -db <DBNAME> /path/to/backup`
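The two options above can be scripted roughly as follows. This is a sketch, not an official tool: `restore_shards`, `replace_database`, `run` and `DRY_RUN` are hypothetical names, and the drop step assumes the 1.x `influx` CLI can reach a data node with its default connection settings (add `-host`, `-username` and `-password` as your setup requires). By default the commands are only printed, which is worth doing first because `DROP DATABASE` cannot be undone.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it. Printed output drops the shell quoting.
: "${DRY_RUN:=1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Option 1: restore_shards <db> <rp> <backup-dir> <shard-id>...
# Restores the listed shards one at a time from the backup directory,
# stopping at the first failure.
restore_shards() {
  local db="$1" rp="$2" dir="$3"
  shift 3
  for shard in "$@"; do
    run influxd-ctl restore -db "$db" -rp "$rp" -shard "$shard" "$dir" || return 1
  done
}

# Option 2: replace_database <db> <backup-dir>
# Drops the corrupted database, then restores it from the backup directory.
replace_database() {
  local db="$1" dir="$2"
  run influx -execute "DROP DATABASE \"$db\"" || return 1
  run influxd-ctl restore -db "$db" "$dir"
}
```

For example, `restore_shards telegraf autogen /path/to/backup 12 15` prints one restore command per shard; the database, retention policy and shard IDs here are placeholders.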

Regarding your next question: the restore process automatically sends a copy of each shard to every data node where a replica is supposed to exist. Depending on your replication factor, this will likely be more than one data node, but not all of them. This behavior cannot currently be changed.
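To put rough numbers on this: since every replica owner receives its own copy, the amount of data a restore transfers scales with the replication factor. The sketch below assumes the replication factor is no larger than the number of data nodes; `restore_bytes` is a hypothetical helper, not part of `influxd-ctl`.

```shell
# restore_bytes <shard-size-in-bytes> <replication-factor>
# Bytes copied to data nodes when one shard is restored: one full copy per
# replica owner (assumes replication factor <= number of data nodes).
restore_bytes() {
  local size="$1" rf="$2"
  if [ "$rf" -lt 1 ]; then
    printf 'replication factor must be >= 1\n' >&2
    return 1
  fi
  printf '%s\n' "$(( size * rf ))"
}
```

For a 10 GiB shard (10737418240 bytes) with a replication factor of 2, `restore_bytes 10737418240 2` gives 21474836480, i.e. 20 GiB copied during the restore. To see which data nodes own each shard, and therefore receive a copy, `influxd-ctl show-shards` lists the owners of every shard.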

Finally, Kapacitor subscriptions are not restored because they are associated at the database level and cannot be guaranteed to work if, for example, a retention policy name is changed or omitted.
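Because subscriptions are not part of the restore, one approach is to record them with `SHOW SUBSCRIPTIONS` before restoring and re-create them afterwards with `CREATE SUBSCRIPTION`, once the target database and retention policy exist under the expected names. The sketch below only builds and optionally runs that statement; the subscription name, database, retention policy, mode and Kapacitor URL are placeholders to be copied from your saved `SHOW SUBSCRIPTIONS` output, and `create_subscription`, `run` and `DRY_RUN` are hypothetical names.

```shell
# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it. Printed output drops the shell quoting.
: "${DRY_RUN:=1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# create_subscription <name> <db> <rp> <ALL|ANY> <destination-url>
# Re-creates one subscription that forwards writes on <db>.<rp> to the URL.
create_subscription() {
  local name="$1" db="$2" rp="$3" mode="$4" dest="$5"
  case "$mode" in
    ALL|ANY) ;;
    *) printf 'mode must be ALL or ANY, got: %s\n' "$mode" >&2; return 1 ;;
  esac
  run influx -execute "CREATE SUBSCRIPTION \"$name\" ON \"$db\".\"$rp\" DESTINATIONS $mode '$dest'"
}
```

For example, `create_subscription kapacitor-1 telegraf autogen ANY http://kapacitor:9092` prints the `influx` command for one subscription; repeat it for each subscription you recorded.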