Error="cannot allocate memory" leading to crash

Hi all,

Somewhat out of the blue today, the logs showed this:

ts=2025-07-01T16:27:05.591361Z lvl=info msg="Error adding new TSM files from snapshot. Removing temp files." log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot error="cannot allocate memory"

And then this sequence, over and over again:

ts=2025-07-01T16:27:46.678440Z lvl=info msg="Error adding new TSM files from snapshot. Removing temp files." log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot error="cannot allocate memory"

ts=2025-07-01T16:27:46.678459Z lvl=info msg="Cache snapshot (end)" log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=1.013ms

ts=2025-07-01T16:27:46.678464Z lvl=info msg="Error writing snapshot" log_id=0xMEzKc0000 service=storage-engine engine=tsm1 error="cannot allocate memory"
ts=2025-07-01T16:27:46.841959Z lvl=info msg="Cache snapshot (start)" log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start

Eventually, it crashed with this:

fatal error: runtime: out of memory

This was followed by a stack dump in the logs.

When the system tried to come back up, it failed to restart. When I did get it running again, it looked like a fresh install (I had to create a "first" user, etc.), so I restored the last backup with the "--full" flag to get everything back. Thankfully, that worked perfectly.

It's running now and seems OK, but every couple of minutes a burst like this shows up in the logs, lasting several seconds (a couple hundred lines):

ts=2025-07-01T16:29:09.318413Z lvl=warn msg="Write failed creating shard" log_id=0xThuSSl000 service=storage-engine service=write shard=5653 error="opening shard previously failed with: [shard 5653] cannot allocate memory"
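To see which shards are affected and how often, a small script can tally the "Write failed creating shard" warnings per shard ID. This is a minimal sketch, assuming influxd writes plain logfmt lines like the ones quoted above (with straight double quotes) to a file you can read; parse_logfmt and failed_shards are hypothetical helper names, not part of any InfluxDB tooling.

```python
import re
from collections import Counter

# One logfmt key=value pair; the value is either a double-quoted string
# (backslash escapes allowed inside) or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_logfmt(line):
    """Parse one logfmt line into a dict.

    Surrounding quotes are stripped from quoted values (escapes are left as-is).
    Repeated keys, such as the two `service` fields, keep the last value seen.
    """
    fields = {}
    for key, value in PAIR.findall(line):
        if value.startswith('"'):
            value = value[1:-1]
        fields[key] = value
    return fields


def failed_shards(lines):
    """Count 'Write failed creating shard' warnings per shard ID (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg") == "Write failed creating shard" and "shard" in fields:
            counts[fields["shard"]] += 1
    return counts
```

For example, failed_shards(open("influxd.log")).most_common() would list the failing shard IDs, busiest first; the log file name here is hypothetical, so point it at wherever your influxd output actually goes.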

This server has been running for 2+ years (v2.7.1) without a hiccup, with mostly default settings. The whole server has never used more than ~12 GB of its 62 GB of memory, so it doesn't look like an actual shortage of physical memory. The storage drive is only about 2% utilized, and everything in /mnt/data/influxdb/engine/data/17c4e52b4a1aa80c/autogen has the permissions "drwxr-x---.", so it's not a write-permissions issue either.

So the questions are:
A: What happened, and why?
B: How can we fix or prevent the error="opening shard previously failed with: [shard ] cannot allocate memory" messages?

Thank you,
David

You probably want to review your memory settings for InfluxDB v2.x. See this similar question and answer for more information: Influx 2.6 using all memory of server total memory is 256 Gb

@suyash, I appreciate the pointer. I've changed my cache settings from their defaults, doubling them to this:

"storage-cache-max-memory-size": 2147483648,
"storage-cache-snapshot-memory-size": 52428800,
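As a quick sanity check on those numbers, the sketch below converts them to MiB and compares them with what I understand to be the InfluxDB 2.x defaults (1073741824 bytes for storage-cache-max-memory-size and 26214400 bytes for storage-cache-snapshot-memory-size). Treat those default values as an assumption to confirm against the configuration reference for your version. As far as I know, the reference describes storage-cache-max-memory-size as a per-shard limit, so total cache memory can scale with the number of shards receiving writes.

```python
# Assumed InfluxDB 2.x defaults (verify against the influxd configuration reference).
DEFAULTS = {
    "storage-cache-max-memory-size": 1073741824,     # 1 GiB
    "storage-cache-snapshot-memory-size": 26214400,  # 25 MiB
}

# The new values from the post above.
NEW = {
    "storage-cache-max-memory-size": 2147483648,
    "storage-cache-snapshot-memory-size": 52428800,
}


def to_mib(n_bytes):
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


# Print each new setting in MiB alongside its ratio to the assumed default.
for key, value in NEW.items():
    ratio = value / DEFAULTS[key]
    print(f"{key}: {to_mib(value):.0f} MiB ({ratio:g}x default)")
```

Running it prints "storage-cache-max-memory-size: 2048 MiB (2x default)" and "storage-cache-snapshot-memory-size: 50 MiB (2x default)", i.e. 2 GiB and 50 MiB.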

It's been 36 minutes, and so far, so good: no "cannot allocate memory" errors.

Thanks again,
David
