I’m running InfluxDB v1.6.3.
I have a situation where the host machine was rebooted while (or immediately after) a temporary database was being dropped. The resulting state would not allow Influx to launch once the system had rebooted.
It attempts to start but hangs after the following log output and won’t accept any connections:
2019-03-13T17:56:59.009426Z info InfluxDB starting {"log_id": "0EA32b0G000", "version": "1.6.3", "branch": "1.6", "commit": "389de31c961831de0a9f4172173337d4a6193909"}
2019-03-13T17:56:59.009461Z info Go runtime {"log_id": "0EA32b0G000", "version": "go1.10.3", "maxprocs": 4}
2019-03-13T17:56:59.120332Z info Using data dir {"log_id": "0EA32b0G000", "service": "store", "path": "/root/influxdb/data"}
2019-03-13T17:56:59.120393Z info Compaction settings {"log_id": "0EA32b0G000", "service": "store", "max_concurrent_compactions": 2, "throughput_bytes_per_second": 50331648, "throughput_burst_bytes": 50331648}
2019-03-13T17:56:59.120417Z info Open store (start) {"log_id": "0EA32b0G000", "service": "store", "trace_id": "0EA32bS0000", "op_name": "tsdb_open", "op_event": "start"}
2019-03-13T17:56:59.135750Z info Opened file {"log_id": "0EA32b0G000", "engine": "tsm1", "service": "filestore", "path": "/root/influxdb/data/_internal/monitor/10/000000005-000000002.tsm", "id": 0, "duration": "2.219ms"}
2019-03-13T17:56:59.136875Z info Opened file {"log_id": "0EA32b0G000", "engine": "tsm1", "service": "filestore", "path": "/root/influxdb/data/_internal/monitor/105/000000012-000000002.tsm", "id": 0, "duration": "3.077ms"}
2019-03-13T17:56:59.140402Z info Opened file {"log_id": "0EA32b0G000", "engine": "tsm1", "service": "filestore", "path": "/root/influxdb/data/_internal/monitor/207/000000013-000000002.tsm", "id": 0, "duration": "5.200ms"}
2019-03-13T17:56:59.154700Z info Opened file {"log_id": "0EA32b0G000", "engine": "tsm1", "service": "filestore", "path": "/root/influxdb/data/_internal/monitor/157/000000013-000000002.tsm", "id": 0, "duration": "14.662ms"}
influx_inspect reveals that all of the TSM files are healthy, but the series files related to the dropped database are corrupted.
$ influx_inspect verify-seriesfile -dir ~/influxdb/data
2019-03-13T18:48:59.331294Z error Error opening segment {"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "00", "segment": "0000", "error": "invalid series segment"}
2019-03-13T18:48:59.332114Z error Error opening segment {"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "01", "segment": "0000", "error": "invalid series segment"}
2019-03-13T18:48:59.332078Z error Error opening segment {"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "02", "segment": "0000", "error": "invalid series segment"}
2019-03-13T18:48:59.332154Z error Error opening segment {"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "04", "segment": "0000", "error": "invalid series segment"}
2019-03-13T18:48:59.332222Z error Error opening segment {"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "03", "segment": "0000", "error": "invalid series segment"}
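To make it easier to see which series files are affected, the verify-seriesfile output above can be grouped by path. This is only a minimal sketch in Python, assuming each line has the shape shown above (timestamp, level, message, then a JSON object); the function name is hypothetical and not part of any InfluxDB tooling.

```python
import json
from collections import defaultdict


def corrupted_series_partitions(log_text):
    """Map each reported series-file path to its sorted list of failing partitions.

    Assumes lines look like: <timestamp> <level> <message...> {json fields}
    """
    found = defaultdict(set)
    for line in log_text.splitlines():
        start = line.find("{")
        if start == -1:
            continue  # no JSON payload on this line
        head = line[:start].split()
        if len(head) < 2 or head[1] != "error":
            continue  # only error-level lines are of interest
        try:
            fields = json.loads(line[start:])
        except json.JSONDecodeError:
            continue  # skip lines whose payload is not valid JSON
        if "path" in fields and "partition" in fields:
            found[fields["path"]].add(fields["partition"])
    return {path: sorted(parts) for path, parts in found.items()}
```

Feeding it the saved output of the command above and printing the result should list each `_series` directory once, alongside the partitions that failed to open.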
Is there a recommended approach to recover from this state? And are there other failure conditions I can expect to cause it?