Startup of Influx is really slow

kevink00 · February 28, 2020, 8:04pm

My InfluxDB instance runs in AWS. I restored an EBS volume from a snapshot, and now InfluxDB takes around 20 minutes to start. What is wrong here, and how can I fix it?

This is what my logs look like while the db starts:

Feb 28 20:00:05 ip-10-110-113-252 influxd: ts=2020-02-28T20:00:05.782589Z lvl=info msg="Opened file" log_id=0LFLugF0000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/telegraf/autogen/230/000032814-000000007.tsm id=0 duration=209.248ms
Feb 28 20:00:06 ip-10-110-113-252 influxd: ts=2020-02-28T20:00:06.084226Z lvl=info msg="Opened file" log_id=0LFLugF0000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/telegraf/autogen/230/000032814-000000008.tsm id=1 duration=301.567ms
Feb 28 20:00:07 ip-10-110-113-252 influxd: ts=2020-02-28T20:00:07.158444Z lvl=info msg="Opened file" log_id=0LFLugF0000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/telegraf/autogen/212/000026914-000000007.tsm id=2 duration=2860.066ms
Feb 28 20:00:07 ip-10-110-113-252 influxd: ts=2020-02-28T20:00:07.901511Z lvl=info msg="Opened file" log_id=0LFLugF0000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/telegraf/autogen/230/000032814-000000009.tsm id=2 duration=1817.226ms
Feb 28 20:00:09 ip-10-110-113-252 influxd: ts=2020-02-28T20:00:09.388456Z lvl=info msg="Opened file" log_id=0LFLugF0000 engine=tsm1 service=filestore path=/var/lib/influxdb/data/telegraf/autogen/230/000032814-000000010.tsm id=3 duration=2229.940ms
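
To see which TSM files dominate the startup time, log lines like the ones above can be ranked by their `duration` field. The following is only a rough sketch: the function names `parse_opened_file` and `slowest_files` are made up for illustration, and the regular expression assumes the exact `path=… id=… duration=…` layout shown in this log excerpt, which may differ in other InfluxDB versions.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Assumption: "Opened file" lines end with "path=<file> id=<n> duration=<number><unit>",
# as in the excerpt above. Other log layouts will simply not match.
_OPENED = re.compile(
    r"path=(?P<path>\S+)\s+id=\d+\s+duration=(?P<value>[0-9.]+)(?P<unit>ms|µs|us|s)\b"
)

# Conversion factors from the logged unit to milliseconds.
_TO_MS = {"s": 1000.0, "ms": 1.0, "us": 0.001, "µs": 0.001}


def parse_opened_file(line: str) -> Optional[Tuple[str, float]]:
    """Return (path, duration in ms) for an "Opened file" line, else None."""
    if "Opened file" not in line:
        return None
    match = _OPENED.search(line)
    if match is None:
        return None
    duration_ms = float(match.group("value")) * _TO_MS[match.group("unit")]
    return match.group("path"), duration_ms


def slowest_files(lines: Iterable[str], top: int = 5) -> List[Tuple[str, float]]:
    """Return the `top` slowest file opens, slowest first."""
    parsed = [item for item in map(parse_opened_file, lines) if item is not None]
    return sorted(parsed, key=lambda item: item[1], reverse=True)[:top]
```

Feeding it the lines of the syslog file (for example `slowest_files(open(path), top=10)`, where `path` is wherever `influxd` output ends up on your system) gives a quick view of whether a few files or all of them are slow to open.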

Igor · March 5, 2020, 10:30pm

Snapshot restore is lazy. AWS reports the restore as complete while it continues to restore data in the background and on demand. As InfluxDB starts up, it reads all shards and needs their data to be actually restored before it can start. You can either wait for the restore to actually finish, by watching for IO utilization to drop to zero, or just wait for InfluxDB to start.
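
Another option, if you would rather pay the cost before starting InfluxDB, is to read the data yourself so that EBS has to fetch it. A minimal sketch, assuming the data lives under `/var/lib/influxdb/data` as in the log above; `warm_files` is a hypothetical helper, not part of InfluxDB or any AWS tool. It only touches blocks that belong to files under the given directory, whereas the approach AWS documents for initializing a restored volume reads the whole block device (for example with `dd` or `fio`).

```python
import os
from typing import Tuple


def warm_files(root: str, chunk_size: int = 1024 * 1024) -> Tuple[int, int]:
    """Read every file under `root` once and return (files read, bytes read).

    The data is discarded; the point is only to make the storage layer
    fetch the underlying blocks before InfluxDB asks for them.
    """
    files_read = 0
    bytes_read = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    while True:
                        chunk = handle.read(chunk_size)
                        if not chunk:
                            break
                        bytes_read += len(chunk)
            except OSError:
                # Skip files that vanish or cannot be read; this is best effort.
                continue
            files_read += 1
    return files_read, bytes_read
```

Calling `warm_files("/var/lib/influxdb/data")` while `influxd` is stopped, and starting the service once it returns, should move most of the waiting out of the startup phase. It will not make the restore itself any faster.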

I also wanted to note that 20 minutes is not bad. Our instance takes almost an hour to start (and that is not after a restore), so we have to run a separate instance for a short retention policy with just two shards, and have a middle layer join its results with those of the historical InfluxDB instance, if that one is up and running. I wish InfluxDB were as lazy as AWS and started the HTTP service after scanning only the latest shard.
