I have an InfluxDB database that has recently entered a crash-loop and produces the following log messages each time. After the lines pasted here it prints some lengthy stack traces, which I've left out.
The database is admittedly somewhat large (we recently made a change that pushed it from 21 million series to 23 million series, ironically by dropping some tags on new data). Assuming the ultimate cause of the crash is “your database is too large”, do you have any advice on how to recover what we’ve got? We could give up the problematic tags on old data, but as far as I know, the only way to do that is to export, modify, and re-import, which I think would be pretty cumbersome. This is with InfluxDB 1.8.3.
Nov 04 09:37:39 xenial-template influxd[12350]: ts=2020-11-04T17:37:39.800669Z lvl=info msg="Error adding new TSM files from snapshot. Removing temp files." log_id=0QGxy7u0000 engine=tsm1 trace_id=0QH5D8V0000 op_name=tsm1_cache_snapshot error="cannot allocate memory"
Nov 04 09:37:39 xenial-template influxd[12350]: ts=2020-11-04T17:37:39.802607Z lvl=info msg="Cache snapshot (end)" log_id=0QGxy7u0000 engine=tsm1 trace_id=0QH5D8V0000 op_name=tsm1_cache_snapshot op_event=end op_elapsed=1373.912ms
Nov 04 09:37:39 xenial-template influxd[12350]: ts=2020-11-04T17:37:39.802628Z lvl=info msg="Error writing snapshot" log_id=0QGxy7u0000 engine=tsm1 error="cannot allocate memory"
Nov 04 09:37:39 xenial-template influxd[12350]: ts=2020-11-04T17:37:39.802641Z lvl=info msg="Cache snapshot (start)" log_id=0QGxy7u0000 engine=tsm1 trace_id=0QH5DDrW000 op_name=tsm1_cache_snapshot op_event=start
Nov 04 09:37:40 xenial-template influxd[12350]: fatal error: runtime: cannot allocate memory
Nov 04 09:37:40 xenial-template influxd[12350]: runtime stack:
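For the export, modify, and re-import route, the "modify" step amounts to rewriting each line-protocol line so that the unwanted tag keys are removed from its tag set. The sketch below is a minimal, hypothetical filter, assuming the export file (e.g. from `influx_inspect export`) is plain line protocol with `#` comment lines. It is not an official tool. The helper names (`split_unescaped`, `find_unescaped`, `drop_tags`) are made up for illustration, and tag keys are compared in their escaped form.

```python
import sys
from typing import List, Set


def split_unescaped(text: str, sep: str) -> List[str]:
    """Split text on sep, skipping separators escaped with a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            # Keep the backslash and the escaped character together.
            current.append(text[i:i + 2])
            i += 2
            continue
        if text[i] == sep:
            parts.append("".join(current))
            current = []
        else:
            current.append(text[i])
        i += 1
    parts.append("".join(current))
    return parts


def find_unescaped(text: str, ch: str) -> int:
    """Index of the first unescaped occurrence of ch, or -1 if none."""
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if text[i] == ch:
            return i
        i += 1
    return -1


def drop_tags(line: str, tags_to_drop: Set[str]) -> str:
    """Remove the given tag keys from one line-protocol line."""
    if not line.strip() or line.startswith("#"):
        return line  # pass blank and comment lines through unchanged
    # The first unescaped space ends the measurement+tag-set section.
    space = find_unescaped(line, " ")
    if space == -1:
        return line
    series_key, rest = line[:space], line[space:]
    measurement, *tags = split_unescaped(series_key, ",")
    kept = [t for t in tags if split_unescaped(t, "=")[0] not in tags_to_drop]
    return ",".join([measurement] + kept) + rest


if __name__ == "__main__":
    # Hypothetical usage: python drop_tags.py tagA tagB < export.lp > fixed.lp
    drop = set(sys.argv[1:])
    for raw in sys.stdin:
        sys.stdout.write(drop_tags(raw, drop))
```

Streaming line by line keeps memory use flat, which matters when the export is large. One caveat: once tags are dropped, points that differed only in those tags end up with the same series key and timestamp, so on re-import they overwrite one another rather than being kept as separate points.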