Writing data non-chronological - impact on performance / storage?

coussej · January 23, 2018, 8:18am

Is there a performance or storage impact from writing data non-chronologically?

I have an InfluxDB instance that has been collecting data for some time now, and I want to migrate it to a new server. Is there a significant difference between the following two scenarios?

1. Redirect the incoming data streams to the new server, then simultaneously start migrating data from the old server to the new server. This means that the data comes in non-chronologically.
2. Leave the incoming streams on the old server, start migrating data to the new server, and redirect the incoming data once the new server has caught up. This means everything is written chronologically.

I have a preference for the first scenario, but I don’t know what it means for how the data would end up on disk (chronologically in shards, or spread out over multiple shards and mixed with new data) and how that would impact performance.

Thanks!

epa095 · February 16, 2018, 6:05pm

It is not supposed to matter once the database has done a full compaction, which, depending on your config, happens some hours after the last write. In my experience, writing out of order makes the compaction process itself take much longer. I have seen some quite weird performance issues with InfluxDB lately though, so I would not bet my first-born on write order actually being irrelevant, even though it is claimed to be.

katy · February 26, 2018, 7:28pm

Hi @coussej!

InfluxDB can handle temporary, non-chronological writes during migrations. When migrating to new servers, we recommend taking these steps:

1. Back up the old server

influxd backup <path-to-backup> makes a backup of the metastore
influxd backup -database <mydatabase> <path-to-backup> backs up a specific database

If you’re backing up a remote node, use these instructions.

2. Restore the backup to the new server

To restore from a backup, you need to specify the type of backup, the path where the backup should be restored, and the path to the backup.

influxd restore [ -metadir | -datadir ] <path-to-meta-or-data-directory> <path-to-backup>

3. Begin dual writes to both servers

4. Migrate data for the interim period, from when the initial backup was created to when the dual writes began

The tricky part is that you have to back up a time slice. Importing requires you to sideload the data into a temporary database and then use a SELECT…INTO query to finalize the series. The -start and -end timestamps use RFC3339 format:

influxd backup -database mydb -start 2015-02-15T10:55:00Z -end 2015-02-17T10:00:00Z /path/to/backup

To restore it to a live instance:

influxd restore -online -db mydb -newdb mydb_tmp /path/to/backup

5. Turn off writes to the old server
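
For the SELECT…INTO part of the "migrate data for the interim period" step, a minimal sketch could look like the script below. It assumes the data lives in the default "autogen" retention policy and that the temporary database is named mydb_tmp as in the restore command above; the run helper and the RUN variable are hypothetical conveniences, not part of InfluxDB. By default the script only prints the commands, so you can review them before running anything.

```shell
#!/bin/sh
# Sketch: copy the sideloaded time slice from the temp database into the
# live database, then drop the temp database.
# Assumptions (not from the thread): retention policy "autogen" on both sides.

DB="mydb"          # live database on the new server
TMP_DB="mydb_tmp"  # temporary database created by `influxd restore -newdb`
RP="autogen"       # assumed retention policy name

# Build the InfluxQL statement that copies every measurement.
# GROUP BY * keeps tags as tags instead of converting them to fields.
copy_query() {
  printf 'SELECT * INTO "%s"."%s".:MEASUREMENT FROM "%s"."%s"./.*/ GROUP BY *' \
    "$DB" "$RP" "$TMP_DB" "$RP"
}

# Print the command in dry-run mode (default); execute it when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

run influx -execute "$(copy_query)"
run influx -execute "DROP DATABASE \"$TMP_DB\""
```

Running it with RUN=1 on the new server executes both statements through the influx CLI. For a large time slice it may be worth adding a WHERE time clause and copying in smaller windows instead of issuing one big query.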

Some of this is new in 1.5, so let me know if you have more questions.
