Error="cannot allocate memory" leading to crash

dnovak · July 1, 2025, 9:48pm

Hi all,

So somewhat out of the blue today, the logs show this:

ts=2025-07-01T16:27:05.591361Z lvl=info msg="Error adding new TSM files from snapshot. Removing temp files." log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot error="cannot allocate memory"

And then these 3 lines over and over again:

…
ts=2025-07-01T16:27:46.678440Z lvl=info msg="Error adding new TSM files from snapshot. Removing temp files." log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot error="cannot allocate memory"

ts=2025-07-01T16:27:46.678459Z lvl=info msg="Cache snapshot (end)" log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=1.013ms

ts=2025-07-01T16:27:46.678464Z lvl=info msg="Error writing snapshot" log_id=0xMEzKc0000 service=storage-engine engine=tsm1 error="cannot allocate memory"
ts=2025-07-01T16:27:46.841959Z lvl=info msg="Cache snapshot (start)" log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=start
…

Until finally, it crashed with this:

fatal error: runtime: out of memory

A stack dump followed in the logs.

When the system tried to come back up, it failed to restart. When I got it up and running again, it looked as if it had just been installed (I had to create a “first” user, etc.), so I had to restore the last backup with “--full” to get everything back. Thankfully, it worked perfectly.

It’s running now and seems OK, but every couple of minutes this comes up in the logs, lasting for several seconds (a couple hundred lines):

ts=2025-07-01T16:29:09.318413Z lvl=warn msg="Write failed creating shard" log_id=0xThuSSl000 service=storage-engine service=write shard=5653 error="opening shard previously failed with: [shard 5653] cannot allocate memory"
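
To see how often each of these messages occurs, the log lines can be tallied with a short script. This is a minimal sketch, not an InfluxDB tool: the function names `parse_line` and `tally_alloc_failures` are made up for illustration, and it assumes every line uses the `key=value` layout shown above, with values either bare or wrapped in straight or curly quotes.

```python
import re
from collections import Counter

# One key=value pair. Values are either quoted (straight or curly quotes,
# since forum software often rewrites them) or a bare run of non-space chars.
PAIR = re.compile(r'(\w+)=(?:["“](.*?)["”]|(\S+))')


def parse_line(line):
    """Return the fields of one log line as a dict (last duplicate key wins)."""
    fields = {}
    for key, quoted, bare in PAIR.findall(line):
        fields[key] = quoted if quoted else bare
    return fields


def tally_alloc_failures(lines):
    """Count lines whose error field mentions 'cannot allocate memory', per msg."""
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if "cannot allocate memory" in fields.get("error", ""):
            counts[fields.get("msg", "unknown")] += 1
    return counts


# Three lines copied from the logs in this thread.
sample = [
    'ts=2025-07-01T16:27:46.678440Z lvl=info msg="Error adding new TSM files from snapshot. Removing temp files." log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot error="cannot allocate memory"',
    'ts=2025-07-01T16:27:46.678459Z lvl=info msg="Cache snapshot (end)" log_id=0xMEzKc0000 service=storage-engine engine=tsm1 op_name=tsm1_cache_snapshot op_event=end op_elapsed=1.013ms',
    'ts=2025-07-01T16:29:09.318413Z lvl=warn msg="Write failed creating shard" log_id=0xThuSSl000 service=storage-engine service=write shard=5653 error="opening shard previously failed with: [shard 5653] cannot allocate memory"',
]

for msg, count in tally_alloc_failures(sample).items():
    print(count, msg)
```

Grouping on `shard` instead of `msg` would show whether the failures are confined to a single shard.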

So this server has been running 2+ years (v2.7.1) without a hiccup, with most/all default settings. The whole server has never used more than ~12G of memory, out of 62G. So it’s not an actual memory issue. And the storage drive is around 2% utilized. All of the files in /mnt/data/influxdb/engine/data/17c4e52b4a1aa80c/autogen have these permissions: “drwxr-x---.”, so not a write permissions issue either.

So the questions are:
A: What happened and why?
B: Can we fix/prevent the “error=“opening shard previously failed with: [shard ] cannot allocate memory”” errors?

Thank you,
David

suyash · July 2, 2025, 8:55am

You probably want to review your memory settings for InfluxDB v2.x. See this similar question and answer for more information: Influx 2.6 using all memory of server total memory is 256 Gb

dnovak · July 2, 2025, 4:32pm

@suyash, I appreciate the pointer. I’ve doubled my cache settings from their defaults to this:

"storage-cache-max-memory-size": 2147483648,
"storage-cache-snapshot-memory-size": 52428800,
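
For reference, a quick check of the arithmetic behind those two numbers. The defaults used here (1 GiB for the cache maximum and 25 MiB for the snapshot threshold) are an assumption taken from the InfluxDB 2.x configuration documentation, not from this thread, and the variable names are illustrative only.

```python
# Assumed InfluxDB 2.x defaults, in bytes (not stated in this thread).
DEFAULT_CACHE_MAX = 1024 ** 3        # 1 GiB
DEFAULT_SNAPSHOT = 25 * 1024 ** 2    # 25 MiB

# Doubling each default gives the values in the config above.
new_settings = {
    "storage-cache-max-memory-size": 2 * DEFAULT_CACHE_MAX,
    "storage-cache-snapshot-memory-size": 2 * DEFAULT_SNAPSHOT,
}

for key, value in new_settings.items():
    # Print in the same "key": value, form, with the size in MiB as a comment.
    print(f'"{key}": {value},  # {value // 1024 ** 2} MiB')
```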

36 minutes in and so far, so good. No “cannot allocate memory” errors.

Thanks again,
David
