Is there a performance/storage impact from writing data non-chronologically?
I have an influxdb that has been collecting data for some time now. I want to migrate it to a new server. Is there a significant difference in the following two scenarios:
Redirect incoming data streams to the new server, then simultaneously start migrating data from the old server to the new server. This means that the data comes in non-chronologically.
Leave incoming streams on old server, start migrating data to new server and when the new one is caught up, redirect the incoming data. This means everything is written chronologically.
I have a preference for the first scenario, but I don’t know what this means for how the data would end up on disk (chronologically in shards, or spread out over multiple shards mixed with new data) and how that would impact performance.
It is supposed to not matter after the database has done full compaction, which depending on your config, happens some hours after the last write. Writing out-of-order will make the compaction process itself take much longer (in my experience). I have experienced some quite weird performance issues with influxdb lately though, so I would not bet my first-born on it actually being irrelevant even though it is claimed to be.
InfluxDB can handle temporary, non-chronological writes during migrations. When migrating to new servers, we recommend taking these steps:
Back up the old server
influxd backup <path-to-backup> makes a backup of the metastore.
influxd backup -database <mydatabase> <path-to-backup> backs up a specific database.
If you’re backing up a remote node, use these instructions.
Restore the backup to the new server
To restore from a backup you will need to specify the type of backup, the path to where the backup should be restored, and the path to the backup.
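As a rough sketch of the backup and restore steps above, assuming InfluxDB 1.x and its legacy (offline) backup format, the commands might look like the following. The database name mydb, the backup directory, and the /var/lib/influxdb paths are placeholders, not values from this thread. The script only prints the commands so they can be reviewed first; with this format, influxd on the new server is expected to be stopped while restoring.

```shell
#!/bin/sh
# Placeholder values (assumptions, adjust to your setup).
DB="mydb"
BACKUP_DIR="/tmp/influx-backup"
META_DIR="/var/lib/influxdb/meta"
DATA_DIR="/var/lib/influxdb/data"

# Print the commands instead of running them, so they can be checked first.
migration_commands() {
  # On the old server: back up the metastore, then the database itself.
  printf '%s\n' "influxd backup $BACKUP_DIR"
  printf '%s\n' "influxd backup -database $DB $BACKUP_DIR"
  # On the new server (influxd stopped): restore the metastore, then the data.
  printf '%s\n' "influxd restore -metadir $META_DIR $BACKUP_DIR"
  printf '%s\n' "influxd restore -database $DB -datadir $DATA_DIR $BACKUP_DIR"
}

migration_commands
```

Here -metadir and -datadir say which kind of backup is being restored and where it should go, and the final argument is the path to the backup.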
Migrate data for the interim period from when the initial backup was created to when the dual writes began
The tricky part is that you have to back up a time slice. Importing it requires you to sideload the slice into a temporary database and then use a SELECT…INTO query to merge the series into the target database.
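One way the time-slice step could be sketched, assuming InfluxDB 1.5 or later (for the portable backup format) and the default autogen retention policy. The database names, the directory, and both timestamps are hypothetical placeholders. As before, the script only prints the commands rather than running them.

```shell
#!/bin/sh
# Placeholder values (assumptions, adjust to your setup).
DB="mydb"
TMP_DB="mydb_tmp"
SLICE_DIR="/tmp/influx-slice"
START="2019-01-01T00:00:00Z"   # hypothetical: when the initial backup was taken
END="2019-01-02T00:00:00Z"     # hypothetical: when the dual writes began

# Print the commands for review instead of executing them.
interim_commands() {
  # Old server: back up only the interim time slice.
  printf '%s\n' "influxd backup -portable -database $DB -start $START -end $END $SLICE_DIR"
  # New server: sideload the slice into a temporary database.
  printf '%s\n' "influxd restore -portable -db $DB -newdb $TMP_DB $SLICE_DIR"
  # Copy every measurement from the temporary database into the real one.
  printf '%s\n' "influx -execute 'SELECT * INTO \"$DB\".autogen.:MEASUREMENT FROM \"$TMP_DB\".autogen./.*/ GROUP BY *'"
  # Drop the temporary database once the copy has been verified.
  printf '%s\n' "influx -execute 'DROP DATABASE \"$TMP_DB\"'"
}

interim_commands
```

The GROUP BY * in the SELECT…INTO query keeps tags as tags; without it they would be written as fields in the target database.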