Recover from an "invalid series segment"

Greg_Schrock · March 18, 2019, 7:29pm

I’m running InfluxDB v1.6.3

I have a situation where the host machine was rebooted while (or immediately after) a temporary database was being dropped. The resulting state would not allow Influx to launch once the system had rebooted.

It attempts to start but hangs at this point and won’t accept any connections.

2019-03-13T17:56:59.009426Z	info	InfluxDB starting	{"log_id": "0EA32b0G000", "version": "1.6.3", "branch": "1.6", "commit": "389de31c961831de0a9f4172173337d4a6193909"}
2019-03-13T17:56:59.009461Z	info	Go runtime	{"log_id": "0EA32b0G000", "version": "go1.10.3", "maxprocs": 4}
2019-03-13T17:56:59.120332Z	info	Using data dir	{"log_id": "0EA32b0G000", "service": "store", "path": "/root/influxdb/data"}
2019-03-13T17:56:59.120393Z	info	Compaction settings	{"log_id": "0EA32b0G000", "service": "store", "max_concurrent_compactions": 2, "throughput_bytes_per_second": 50331648, "throughput_burst_bytes": 50331648}
2019-03-13T17:56:59.120417Z	info	Open store (start)	{"log_id": "0EA32b0G000", "service": "store", "trace_id": "0EA32bS0000", "op_name": "tsdb_open", "op_event": "start"}
2019-03-13T17:56:59.135750Z	info	Opened file	{"log_id": "0EA32b0G000", "engine": "tsm1", "service": "filestore", "path": "/root/influxdb/data/_internal/monitor/10/000000005-000000002.tsm", "id": 0, "duration": "2.219ms"}
2019-03-13T17:56:59.136875Z	info	Opened file	{"log_id": "0EA32b0G000", "engine": "tsm1", "service": "filestore", "path": "/root/influxdb/data/_internal/monitor/105/000000012-000000002.tsm", "id": 0, "duration": "3.077ms"}
2019-03-13T17:56:59.140402Z	info	Opened file	{"log_id": "0EA32b0G000", "engine": "tsm1", "service": "filestore", "path": "/root/influxdb/data/_internal/monitor/207/000000013-000000002.tsm", "id": 0, "duration": "5.200ms"}
2019-03-13T17:56:59.154700Z	info	Opened file	{"log_id": "0EA32b0G000", "engine": "tsm1", "service": "filestore", "path": "/root/influxdb/data/_internal/monitor/157/000000013-000000002.tsm", "id": 0, "duration": "14.662ms"}

influx_inspect reveals that all of the tsm files are healthy, but the series files related to the dropped database are corrupted.

$ influx_inspect verify-seriesfile -dir ~/influxdb/data
2019-03-13T18:48:59.331294Z	error	Error opening segment	{"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "00", "segment": "0000", "error": "invalid series segment"}
2019-03-13T18:48:59.332114Z	error	Error opening segment	{"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "01", "segment": "0000", "error": "invalid series segment"}
2019-03-13T18:48:59.332078Z	error	Error opening segment	{"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "02", "segment": "0000", "error": "invalid series segment"}
2019-03-13T18:48:59.332154Z	error	Error opening segment	{"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "04", "segment": "0000", "error": "invalid series segment"}
2019-03-13T18:48:59.332222Z	error	Error opening segment	{"log_id": "0EA612gG000", "path": "/root/influxdb/data/t128_tmp/_series", "partition": "03", "segment": "0000", "error": "invalid series segment"}

Is there a recommended approach to recover from this state? And are there other failure conditions I can expect to cause it?
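
To see at a glance which series directories are affected, the verify-seriesfile output can be filtered down to the distinct paths. This is a minimal sketch that assumes the log format shown above; corrupt_series_paths is a hypothetical helper name, and the 2>&1 in the usage comment is an assumption about where influx_inspect writes these messages.

```shell
#!/bin/sh
# Print each distinct _series path reported with "invalid series segment".
# Reads verify-seriesfile log lines on stdin; lines without that error are ignored.
corrupt_series_paths() {
    sed -n '/"error": "invalid series segment"/s/.*"path": "\([^"]*\)".*/\1/p' \
        | sort -u
}

# Usage (assumes influx_inspect is on PATH):
#   influx_inspect verify-seriesfile -dir ~/influxdb/data 2>&1 | corrupt_series_paths
```

For the output above this would print a single path, since all five partitions belong to the same database.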

MarcV · March 18, 2019, 10:13pm

Hi,
You can try moving the files from the dropped database to a different directory.

Greg_Schrock · March 18, 2019, 10:50pm

@MarcV,

Thanks for your response! Which files do you mean? The files under the _series directory, everything under the data/t128_tmp directory, the wal/t128_tmp directory, or all of the above?

I guess that’s straightforward enough here since the data is unwanted, but since I posted I’ve had the same failure for a database that was not dropped, where I do need to salvage the data. I would like a procedure that allows most, if not all, of the data to be kept.

MarcV · March 18, 2019, 11:12pm

I mean all of the t128_tmp directories.

I guess the only way is to take daily backups and restore them in case of failure?
That means you can lose a day of data.
Is that acceptable?

Greg_Schrock · March 19, 2019, 1:07pm

I guess that’s an option. So you’re saying there is no way to recover from this state? There is no means for rebuilding the segments from the TSM files?

As a test, I did try deleting the segment directories: rm -rf influxdb/data/t128_tmp/_series/*. After that, Influx could launch and I haven’t found any adverse effects. The series directories reappeared; I assume they are recreated during launch. I can even access data from that temporary database.

Is that a viable approach, or am I missing something?

MarcV · March 19, 2019, 1:25pm

Hi Greg,
I was focused on getting your database open again.
It is strange that you can access data from that temporary database. Or is it configured somewhere in Telegraf? That would mean the database was not dropped, or was recreated?

I don’t know if they can be recreated. influx_inspect has the buildtsi command (it generates tsi1 indexes from tsm1 data), and series are indexed, so you can probably recreate the series files with this command.

Update: I tested it:

rm _series/*
influx_inspect buildtsi -database dbcpu -datadir "/home/influxdb/data" -waldir "/home/influxdb/wal"

and the series are back.

Greg_Schrock · March 19, 2019, 1:50pm

Excellent! Thanks, that’s what I was looking for.

I’m working in a somewhat volatile environment and it’s likely this will happen again. I’d like to have a procedure in place and, ideally, automated.
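
One way the procedure from this thread could be scripted is sketched below. It is not an official recovery method: it assumes influxd is already stopped, it moves the damaged _series directory to a location outside the data directory instead of deleting it (a deliberate change from the rm used above, so the files can be inspected later), and then runs the same buildtsi command MarcV tested. The names run and rebuild_series, the DRY_RUN variable, and the backup naming are all hypothetical.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN is 1 (the default in this sketch).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rebuild_series DB DATADIR WALDIR BACKUPDIR
# Assumption: influxd is stopped before this is called.
# BACKUPDIR should be outside DATADIR so influxd does not try to load it.
rebuild_series() {
    db=$1
    datadir=$2
    waldir=$3
    backupdir=$4
    series="$datadir/$db/_series"
    backup="$backupdir/${db}_series.corrupt"

    [ -d "$series" ] || { echo "no series directory: $series" >&2; return 1; }
    [ ! -e "$backup" ] || { echo "backup already exists: $backup" >&2; return 1; }

    # Keep the damaged segment files rather than deleting them.
    run mv "$series" "$backup" || return 1
    # Same command as tested earlier in the thread, with the paths passed in.
    run influx_inspect buildtsi -database "$db" -datadir "$datadir" -waldir "$waldir"
}

# Example (prints the commands only; set DRY_RUN=0 to execute them):
#   rebuild_series t128_tmp /root/influxdb/data /root/influxdb/wal /root/influxdb/corrupt
```

Running it in dry-run mode first shows exactly which paths would be touched. Combining it with a verify-seriesfile check beforehand would limit the rebuild to databases that actually report an invalid segment.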
