Merging old bucket data into new bucket

I have stored data (both the /data/ and /wal/ directories) from an old v2 instance. I now have a v2 instance running with the same bucket as the old one, and I would like to merge the old data into the new instance.

I was trying to migrate the data storage from /var/lib to a separate folder. It turns out I moved the buckets over but did not properly reconfigure Influx to save new incoming data to the new location. So I now have two separate datasets, with some overlap between the current /var/lib folder and the original folder (see subdirectory 1433 below). Is it possible to recombine the two folders into a single consistent dataset without losing data from the overlapping subdirectory?

Thanks in advance.


X-Influxdb-Build: OSS
X-Influxdb-Version: v2.7.1

New instance directory:

NEW_DIR/engine/data/BUCKET_ID/autogen# ls -l
total 24
drwxr-x--- 3 influxdb influxdb 4096 Apr  2 22:50 1433
drwxr-x--- 3 influxdb influxdb 4096 Apr 15 20:16 1500
drwxr-x--- 3 influxdb influxdb 4096 Apr 15 20:16 1522
drwxr-x--- 3 influxdb influxdb 4096 May  1 18:48 1538
drwxr-x--- 3 influxdb influxdb 4096 May  1 18:48 1554
drwxr-x--- 3 influxdb influxdb 4096 May  3 19:15 1570

Old instance directory:

OLD_DIR/engine/data/BUCKET_ID/autogen# ls -l
total 168
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1003
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1016
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1029
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1060
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1073
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1086
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1099
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1112
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1125
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1138
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 1151
drwxr-x--- 3 influxdb influxdb 4096 Nov 27 04:09 1164
drwxr-x--- 3 influxdb influxdb 4096 Jan  9 18:13 1177
drwxr-x--- 3 influxdb influxdb 4096 Jan  9 18:13 1190
drwxr-x--- 3 influxdb influxdb 4096 Jan  9 18:13 1207
drwxr-x--- 3 influxdb influxdb 4096 Jan  9 18:13 1221
drwxr-x--- 3 influxdb influxdb 4096 Jan  9 18:13 1235
drwxr-x--- 3 influxdb influxdb 4096 Jan  9 18:13 1249
drwxr-x--- 3 influxdb influxdb 4096 Feb 22 20:28 1263
drwxr-x--- 3 influxdb influxdb 4096 Feb 22 20:28 1277
drwxr-x--- 3 influxdb influxdb 4096 Feb 22 20:28 1292
drwxr-x--- 3 influxdb influxdb 4096 Feb 22 20:28 1307
drwxr-x--- 3 influxdb influxdb 4096 Feb 22 20:28 1322
drwxr-x--- 3 influxdb influxdb 4096 Feb 22 20:28 1337
drwxr-x--- 3 influxdb influxdb 4096 Mar 19 00:52 1352
drwxr-x--- 3 influxdb influxdb 4096 Mar 19 00:52 1365
drwxr-x--- 3 influxdb influxdb 4096 Mar 19 00:52 1380
drwxr-x--- 3 influxdb influxdb 4096 Mar 19 00:52 1395
drwxr-x--- 3 influxdb influxdb 4096 Mar 27 16:21 1410
drwxr-x--- 3 influxdb influxdb 4096 Mar 27 16:22 1433
drwxr-xr-x 3 influxdb influxdb 4096 Aug  1  2023 705
drwxr-xr-x 3 influxdb influxdb 4096 Aug  1  2023 717
drwxr-xr-x 3 influxdb influxdb 4096 Aug  1  2023 730
drwxr-xr-x 3 influxdb influxdb 4096 Aug  1  2023 743
drwxr-xr-x 3 influxdb influxdb 4096 Aug  1  2023 756
drwxr-xr-x 3 influxdb influxdb 4096 Aug  1  2023 769
drwxr-xr-x 3 influxdb influxdb 4096 Aug  1  2023 782
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 938
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 951
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 964
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 977
drwxr-x--- 3 influxdb influxdb 4096 Nov 25 07:31 990

Hmmmmmm in theory?
Here’s a general/theoretical approach:

  1. Back Up Data: Always start by backing up both the current and the old datasets. This protects you from data loss if something goes wrong during the merge.
  2. Review Overlapping Shards: Review the data in the shard that exists in both directories (1433) to understand the extent of the overlap. You might need to use tools like influxd inspect to extract and compare data summaries from the TSM files within these directories.
  3. Stop InfluxDB Service: Before modifying data files, stop the InfluxDB service to avoid conflicts and data corruption.
  4. Copy Non-Overlapping Shards: Shards that exist only in the old dataset and not in the new one can simply be copied into the corresponding new instance directory. Make sure the permissions and ownership of the directories and files match what your InfluxDB instance requires. I think the buckets also need to have the same retention policy for this to work.
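
Steps 2 and 4 both start from knowing which shard IDs exist on each side. Here is a minimal Python sketch of that comparison, assuming each shard is a numerically named subdirectory under .../BUCKET_ID/autogen as in the listings above. The function names (shard_ids, classify_shards, copy_missing_shards) are made up for illustration and are not part of any InfluxDB tooling. It only touches the /data/ tree passed to it; the /wal/ tree would presumably need the same treatment. Whether InfluxDB actually picks up shards copied this way is not something this sketch can confirm.

```python
import shutil
from pathlib import Path


def shard_ids(autogen_dir):
    """Return the names of the numeric shard subdirectories in autogen_dir."""
    return {
        p.name
        for p in Path(autogen_dir).iterdir()
        if p.is_dir() and p.name.isdigit()
    }


def classify_shards(old_autogen, new_autogen):
    """Split shard IDs into old-only, new-only, and present-in-both."""
    old, new = shard_ids(old_autogen), shard_ids(new_autogen)

    def by_id(ids):
        # Sort numerically so that "705" comes before "1003".
        return sorted(ids, key=int)

    return {
        "only_old": by_id(old - new),
        "only_new": by_id(new - old),
        "overlap": by_id(old & new),
    }


def copy_missing_shards(old_autogen, new_autogen, dry_run=True):
    """Copy shards that exist only in the old directory into the new one.

    Overlapping shards are never touched. With dry_run=True nothing is
    written; the planned copies are only printed.
    """
    plan = classify_shards(old_autogen, new_autogen)
    for shard in plan["only_old"]:
        src = Path(old_autogen) / shard
        dst = Path(new_autogen) / shard
        if dry_run:
            print(f"would copy {src} -> {dst}")
        else:
            # copytree keeps mode bits and timestamps but NOT ownership;
            # chown the result to the influxdb user afterwards.
            shutil.copytree(src, dst)
    return plan
```

Run it with InfluxDB stopped and backups in place, first as a dry run, for example copy_missing_shards("OLD_DIR/engine/data/BUCKET_ID/autogen", "NEW_DIR/engine/data/BUCKET_ID/autogen"). For the two listings above it would report 1433 as the only overlap, which is the shard that still needs the manual review from step 2.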

I’ve never tested that approach, so I can’t say for sure that it would work.

You might also consider starting a fresh instance, pointing it at your old data in the InfluxDB config, and then redoing the migration with the storage paths configured properly.