"Error: Failed to download shard backup: An internal error has occurred."

Hi

We are trying to back up an InfluxDB instance (version 2.0.8) running in Docker.
The command we run is:

influx backup --bucket default -t "token" /influxdb_default_backup/

Backup dies with the following output:

2023-05-04T12:52:44.171092Z     info    Backing up shard        {"log_id": "0hahxYl0000", "id": 197, "path": "/influxdb_manual_backup/default/20230504T125158Z.s197.tar.gz"}
2023-05-04T12:52:44.172172Z     warn    Shard removed during backup     {"log_id": "0hahxYl0000", "id": 197}
2023-05-04T12:52:44.172204Z     info    Backing up shard        {"log_id": "0hahxYl0000", "id": 205, "path": "/influxdb_manual_backup/default/20230504T125158Z.s205.tar.gz"}
2023-05-04T12:52:52.888456Z     info    Backing up shard        {"log_id": "0hahxYl0000", "id": 213, "path": "/influxdb_manual_backup/default/20230504T125158Z.s213.tar.gz"}
2023-05-04T12:53:11.054548Z     info    Backing up shard        {"log_id": "0hahxYl0000", "id": 221, "path": "/influxdb_manual_backup/default/20230504T125158Z.s221.tar.gz"}
2023-05-04T12:53:28.487491Z     info    Backing up shard        {"log_id": "0hahxYl0000", "id": 229, "path": "/influxdb_manual_backup/default/20230504T125158Z.s229.tar.gz"}
Error: Failed to download shard backup: An internal error has occurred.
See 'influx backup -h' for help

The Docker logs do not provide any helpful information:

influxdb_1  | ts=2023-05-04T12:53:11.056098Z lvl=info msg="Cache snapshot (end)" log_id=0hahjvWG000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=0.446ms
influxdb_1  | ts=2023-05-04T12:53:28.497755Z lvl=info msg="Cache snapshot (start)" log_id=0hahjvWG000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
influxdb_1  | ts=2023-05-04T12:53:28.498097Z lvl=info msg="Snapshot for path written" log_id=0hahjvWG000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot path=/root/.influxdbv2/engine/data/0dfadaa5214ae2fd/autogen/229 duration=0.380ms
influxdb_1  | ts=2023-05-04T12:53:28.498137Z lvl=info msg="Cache snapshot (end)" log_id=0hahjvWG000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=0.418ms

Any suggestions on how to back up this database?
Is it possible some files are corrupted? If so, can those files be repaired?

Thanks a lot!

Update: We tried copying the InfluxDB data directory to a backup location and got many "Input/output error" messages:

rsync: readlink_stat("/data2/influxdb-data/engine/data/0dfadaa5214ae2fd/autogen/301") failed: Input/output error (5)
rsync: readlink_stat("/data2/influxdb-data/engine/data/0dfadaa5214ae2fd/autogen/293") failed: Input/output error (5)
rsync: readlink_stat("/data2/influxdb-data/engine/data/0dfadaa5214ae2fd/autogen/277") failed: Input/output error (5)
rsync: readlink_stat("/data2/influxdb-data/engine/data/0dfadaa5214ae2fd/autogen/253") failed: Input/output error (5)
rsync: readlink_stat("/data2/influxdb-data/engine/data/0dfadaa5214ae2fd/autogen/261") failed: Input/output error (5)
rsync: readlink_stat("/data2/influxdb-data/engine/data/0dfadaa5214ae2fd/autogen/269") failed: Input/output error (5)

Can those files be repaired?

Hello @cami,
Hmm, that error is not very helpful, huh.
Looking at this issue, someone was able to resolve it by restarting their pod.

And seeing this

Hi @Anaisdg

Thank you for the reply.
It looks like the filesystem was corrupted, which destroyed some of the database files (autogen/301 etc.). Using fsck we could repair the ext4 filesystem, but the damaged files were deleted in the process. After that, the database started up successfully and we were able to create a backup.
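
For anyone hitting the same symptom, here is a rough sketch of how one might list the unreadable files before reaching for fsck. It is not an InfluxDB tool, just a plain Python directory walk; the function names are made up for illustration. It only reports paths that raise an OS-level error when read and does not repair anything.

```python
import errno
import os


def find_unreadable_paths(root):
    """Walk `root` and return (path, error) pairs for entries that cannot be read."""
    failures = []

    def on_walk_error(exc):
        # os.walk calls this when listing a directory fails (for example with EIO).
        failures.append((exc.filename, exc))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Read the whole file in chunks so that bad blocks surface as OSError.
                with open(path, "rb") as handle:
                    while handle.read(1024 * 1024):
                        pass
            except OSError as exc:
                failures.append((path, exc))
    return failures


def is_io_error(exc):
    """True if the error is the 'Input/output error (5)' that rsync reported."""
    return exc.errno == errno.EIO


def report(root):
    """Print every unreadable path under `root`, flagging I/O errors."""
    for path, exc in find_unreadable_paths(root):
        kind = "I/O error" if is_io_error(exc) else "other error"
        print(f"{kind}: {path}: {exc.strerror}")
```

In our case it would be called as report("/data2/influxdb-data/engine/data"), the data path from the rsync output above. Reading every file can take a while on a large database, and on a failing disk it is probably wise to do this only once.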

All the best