InfluxDB 2.0 upgrade does not save data

I am trying to upgrade a number of databases to InfluxDB 2.0.6 using the automatic upgrade via Docker. I was able to upgrade successfully on an instance with a small amount of data. When I try a larger system, the container runs for a long time, then the Docker container exits and all the ported data seems to be deleted. While it’s running I can see the data folder get larger and larger, but it empties at the end. How can I see what the issue is? Nothing useful gets output before the Docker container exits.

some sample logs during upgrade:
{"level":"info","ts":1620427896.4258375,"caller":"upgrade/database.go:222","msg":"Computed disk space","free":"185 GB","required":"130 GB"}
{"level":"info","ts":1620427896.4259303,"caller":"upgrade/database.go:51","msg":"Upgrading databases"}
{"level":"warn","ts":1620429548.3894737,"caller":"upgrade/database.go:157","msg":"Empty retention policy"}
{"level":"warn","ts":1620429941.7302792,"caller":"upgrade/database.go:157","msg":"Empty retention policy"}
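For anyone triaging similar output: the upgrade log lines above are JSON records with an epoch `ts` field, so they can be filtered and given readable timestamps with a few lines of Python. This is only a sketch. The function names are made up for illustration, and it assumes the log has been saved as text, one record per line. It also converts curly quotes back to straight ones, since forum software tends to mangle pasted logs.

```python
import json
from datetime import datetime, timezone

# Pasted logs often end up with curly quotes; map them back before parsing.
_QUOTE_FIX = str.maketrans({"\u201c": '"', "\u201d": '"'})


def parse_upgrade_log(lines):
    """Yield (utc_time, level, caller, msg) for each line that is a JSON log record."""
    for line in lines:
        line = line.strip().translate(_QUOTE_FIX)
        if not line.startswith("{"):
            continue  # not a JSON record (e.g. the plain-text engine logs)
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise damaged line
        if not isinstance(rec, dict) or "ts" not in rec:
            continue
        when = datetime.fromtimestamp(rec["ts"], tz=timezone.utc)
        yield when, rec.get("level", ""), rec.get("caller", ""), rec.get("msg", "")


def warnings_and_errors(lines):
    """Return only the records logged at warn level or above."""
    return [r for r in parse_upgrade_log(lines) if r[1] in ("warn", "error", "fatal")]


if __name__ == "__main__":
    sample = [
        '{"level":"info","ts":1620427896.4259303,"caller":"upgrade/database.go:51","msg":"Upgrading databases"}',
        '{"level":"warn","ts":1620429548.3894737,"caller":"upgrade/database.go:157","msg":"Empty retention policy"}',
    ]
    for when, level, caller, msg in warnings_and_errors(sample):
        print(when.isoformat(), level, caller, msg)
```

Run against the sample above, this shows the first "Empty retention policy" warning arriving about 27 minutes after "Upgrading databases", which at least tells you how far the upgrade got before anything unusual was logged.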

Last logs before exit.

2021-05-07T23:54:08.675557Z info Reindexing TSM data {"log_id": "0TzJpzol000", "service": "storage-engine", "engine": "tsm1", "db_shard_id": 10730}
2021-05-07T23:54:08.782569Z info Opened file {"log_id": "0TzJpzol000", "service": "storage-engine", "engine": "tsm1", "service": "filestore", "path": "/var/lib/influxdb2/engine/data/ea9cdcba4a85f27b/autogen/3637/000000011-000000002.tsm", "id": 0, "duration": "548.986ms"}
2021-05-07T23:54:08.783183Z info Reindexing TSM data {"log_id": "0TzJpzol000", "service": "storage-engine", "engine": "tsm1", "db_shard_id": 3637}

folder before exit:

117G influxdata2

folder after exit:

17M influxdata2
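One way to pin down when the folder empties is to sample its size on a timer and match the moment of the drop against the container's log timestamps. Below is a rough sketch of that idea. The helper names and the 50% threshold are arbitrary choices for illustration, and it sums apparent file sizes, so the numbers will not match `du` exactly.

```python
import os


def dir_size_bytes(root):
    """Sum the apparent sizes of all files under root (0 if root does not exist)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The file vanished between listing and stat, which is expected
                # while an upgrade or cleanup is rewriting the directory.
                continue
    return total


def detect_drop(samples, ratio=0.5):
    """Return the index of the first (time, size) sample whose size falls below
    ratio * the largest size seen so far, or None if that never happens."""
    peak = 0
    for i, (_when, size) in enumerate(samples):
        if peak and size < peak * ratio:
            return i
        peak = max(peak, size)
    return None
```

Calling `dir_size_bytes` on the data folder every minute or so and appending `(time.time(), size)` to a list gives `detect_drop` something to work with. The timestamp of the flagged sample can then be compared with the last lines the container logged.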

What version of InfluxDB are you starting with? I’m going to assume that since you’ve got a ton of data, it’s been around for a while?

1.7.8. Should I try a 1.x upgrade first?

We didn’t explicitly test a pre-1.8.x upgrade. So, yeah, I would recommend that first. I’m not going to claim that this will definitively fix it. But it is worth eliminating that as an issue first.

@Hank_Beasley a few more questions to help debugging:

  • Are you stopping the container manually, or is it stopping automatically for some reason?
  • After shutdown, what’s the 17M that’s left behind in the influxdata2 volume?

The entry-point script in the container has logic to auto-clean data files if automated setup/upgrade hits an error or is interrupted (to help make the container idempotent), but I’d expect to see a log that says "cleaning bolt and engine files to prevent conflicts on retry" if that logic was triggering.
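To check whether that cleanup logic fired, one option is to save the container's output to a file (for example from `docker logs`) and search it for the message quoted above. A minimal sketch follows, with a made-up function name. It matches case-insensitively because the exact casing in the script's output is an assumption here.

```python
# Message quoted in this thread; compared in lowercase since exact casing is not confirmed.
CLEANUP_MARKER = "cleaning bolt and engine files to prevent conflicts on retry"


def find_cleanup_events(log_text, context=3):
    """Return a list of (line_number, preceding_lines) for each cleanup message found.

    line_number is 1-based; preceding_lines holds up to `context` lines that came
    just before the message, which is usually where the triggering error shows up.
    """
    lines = log_text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        if CLEANUP_MARKER in line.lower():
            hits.append((i + 1, lines[max(0, i - context):i]))
    return hits
```

An empty result suggests the data was removed some other way, or that the container died before the script could log anything.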

@Hank_Beasley you’re probably hitting this issue: “Upgrade error rm: cannot remove '/var/lib/influxdb2/engine/wal/4d18dcaf0212e7bd/autogen/456': Directory not empty” (Issue #471 in influxdata/influxdata-docker on GitHub). I’m working on a fix today.

Thanks for following up. I was able to get this upgrade to work once by upgrading to 1.8.5 first, running the InfluxDB command-line consistency checks, and increasing server memory. I am not sure which of these actually resolved the issue. That upgrade was only a temporary one in a test environment. I will need to attempt the same upgrade again this week for the production environment.