Writing data non-chronologically - impact on performance / storage?

#1

Is there a performance/storage impact from writing data non-chronologically?

I have an influxdb that has been collecting data for some time now. I want to migrate it to a new server. Is there a significant difference in the following two scenarios:

  1. Redirect incoming data streams to the new server, then simultaneously start migrating data from the old server to the new one. This means the data arrives non-chronologically.
  2. Leave incoming streams on the old server, start migrating data to the new server and, once the new one has caught up, redirect the incoming data. This means everything is written chronologically.

I prefer the first scenario, but I don’t know how the data would end up on disk (chronologically in shards, or spread out over multiple shards mixed with new data) and how that would affect performance.

Thanks!

#2

It is not supposed to matter once the database has done a full compaction, which, depending on your config, happens some hours after the last write. Writing out of order does make the compaction process itself take much longer (in my experience). I have run into some quite weird performance issues with influxdb lately though, so I would not bet my first-born on it actually being irrelevant, even though it is claimed to be.

#3

Hi @coussej!

InfluxDB can handle temporary, non-chronological writes during migrations. When migrating to new servers, we recommend taking these steps:

  1. Back up the old server
    influxd backup <path-to-backup> backs up the metastore
    influxd backup -database <mydatabase> <path-to-backup> backs up a specific database

    If you’re backing up a remote node, use these instructions.

  2. Restore the backup to the new server
    To restore from a backup, specify the type of backup, the path where it should be restored, and the path to the backup.

    influxd restore [ -metadir | -datadir ] <path-to-meta-or-data-directory> <path-to-backup>

  3. Begin dual writes to both servers

  4. Migrate data for the interim period from when the initial backup was created to when the dual writes began
    The tricky part is that you have to back up a time slice. Importing it requires sideloading into a temporary database and then using a SELECT…INTO query to finalize the series. The -start and -end timestamps must be in RFC3339 format.

    influxd backup -database mydb -start 2015-02-15T10:55:00Z -end 2015-02-17T10:00:00Z /path/to/backup

    and to restore it to a live instance:
    influxd restore -online -db mydb -newdb mydb_tmp /path/to/backup

  5. Turn off writes to the old server
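
For the SELECT…INTO step, here is a rough sketch of what the copy from the temporary database into the live one could look like. It is only an illustration: the helper names `rfc3339_now` and `build_copy_query` are made up for this example, and it assumes the default `autogen` retention policy and the `mydb` / `mydb_tmp` names used above. Adjust it to your own setup before running anything.

```shell
#!/usr/bin/env bash
# Sketch only. Database names and the "autogen" retention policy are
# assumptions; the helper names are hypothetical.

# Print the current UTC time in RFC3339, the format used for -start/-end.
# Handy for noting when the initial backup and the dual writes began.
rfc3339_now() {
  date -u +%Y-%m-%dT%H:%M:%SZ
}

# Build the InfluxQL statement that copies every measurement from the
# temporary database into the live one. GROUP BY * keeps tags as tags
# instead of turning them into fields.
build_copy_query() {
  local src="$1" dst="$2" rp="${3:-autogen}"
  printf 'SELECT * INTO "%s"."%s".:MEASUREMENT FROM "%s"."%s"./.*/ GROUP BY *' \
    "$dst" "$rp" "$src" "$rp"
}

# Example usage (commented out; needs a running InfluxDB 1.x and the influx CLI):
# influx -execute "$(build_copy_query mydb_tmp mydb)"
# influx -execute 'DROP DATABASE "mydb_tmp"'
```

If the interim window is large, it may be worth adding a WHERE time clause to the query and copying in smaller slices rather than in one statement.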

Some of this is new in 1.5, so let me know if you have more questions.