Regular issues compacting TSI files, very high load, "cannot allocate memory"

Hello,

I am using InfluxDB 1.8.9 (32bit binary) on a Raspberry Pi 4, 4GB RAM, 64bit kernel (aarch64).
I was bitten by the out of memory errors of the Raspi 32bit kernel and thus enabled the 64bit kernel. But now it happens again, regularly: the system load skyrockets, POST requests time out, and data is lost:

raspberrypi influxd-systemd-start.sh[10393]: ts=2021-10-24T16:44:15.882924Z lvl=info msg="Error replacing new TSM files" log_id=0XOc6RHG000 engine=tsm1 tsm1_strategy=full tsm1_optimize=false trace_id=0XOnoRxW000 op_name=tsm1_compact_group db_shard_id=56 error="cannot allocate memory"

This repeats in an endless loop, always for the same shard IDs, and even offline compaction (influx_inspect buildtsi -compact-series-file) did not help at first.
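To confirm that the errors really do hit the same shards every time, the log lines can be tallied per shard. The following is only a minimal sketch: it assumes the log uses key=value pairs shaped like the line above, and the function names and the `SAMPLE` constant are made up for illustration.

```python
import re
from collections import Counter

# Matches key=value pairs; a value is either double-quoted or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')

# Example line modelled on the log excerpt in this thread.
SAMPLE = (
    'raspberrypi influxd-systemd-start.sh[10393]: ts=2021-10-24T16:44:15.882924Z '
    'lvl=info msg="Error replacing new TSM files" log_id=0XOc6RHG000 engine=tsm1 '
    'tsm1_strategy=full tsm1_optimize=false trace_id=0XOnoRxW000 '
    'op_name=tsm1_compact_group db_shard_id=56 error="cannot allocate memory"'
)

def parse_pairs(line):
    """Return the key=value pairs of one log line as a dict."""
    return {key: value.strip('"') for key, value in PAIR.findall(line)}

def failing_shards(lines):
    """Count compaction lines mentioning 'cannot allocate memory', per shard id."""
    counts = Counter()
    for line in lines:
        if "cannot allocate memory" not in line:
            continue
        fields = parse_pairs(line)
        if fields.get("op_name") == "tsm1_compact_group":
            counts[fields.get("db_shard_id", "unknown")] += 1
    return counts

if __name__ == "__main__":
    print(failing_shards([SAMPLE]))
```

Feeding it the full service log instead of `SAMPLE` gives a quick per-shard count, which shows whether a handful of shards account for all failures.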

At this time, influxd is consuming ~600MB of real and ~3700MB of virtual memory and there is some to spare (I had buffers of 2GB), so it doesn’t seem to be a physical memory issue, but maybe one of address space (32bit = 4GB)?
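The virtual versus resident figures can be read straight from `/proc/<pid>/status` on Linux. This sketch assumes a 4 GiB ceiling for a 32bit process (the real usable limit is lower and depends on the kernel); the sample numbers are illustrative values close to the ones quoted above, not measurements.

```python
# Upper bound of a 32bit address space, in kB (assumption: 4 GiB).
ADDRESS_SPACE_KB = 4 * 1024 * 1024

def parse_status(text):
    """Extract VmSize and VmRSS (both in kB) from /proc/<pid>/status content."""
    wanted = {"VmSize", "VmRSS"}
    result = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in wanted:
            result[key] = int(rest.split()[0])  # the kernel reports these in kB
    return result

def memory_of(pid):
    """Read the memory figures of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())

def headroom_kb(vmsize_kb):
    """Address space left before the assumed 4 GiB ceiling is reached."""
    return ADDRESS_SPACE_KB - vmsize_kb

if __name__ == "__main__":
    # Illustrative status excerpt: ~3700 MB virtual, ~600 MB resident.
    sample = "Name:\tinfluxd\nVmSize:\t 3788800 kB\nVmRSS:\t  614400 kB\n"
    mem = parse_status(sample)
    print(mem, "headroom kB:", headroom_kb(mem["VmSize"]))
```

A small headroom next to a modest VmRSS points at address space exhaustion rather than a shortage of physical RAM.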

I have some questions. :)

  1. Would installing the arm64 architecture (instead of armhf) help? Would this work at all on my Raspi 4, preferably without reinstalling everything?
  2. Alternatively, would upgrading to Influx 2.0 do any better? Is the problem solved there, so that the memory requirements when compacting no longer depend on database size?
  3. What other alternatives do I have?

Thank you!

Replying to my own question.

Yes, this is an InfluxDB ARM 32bit bug. Influx insists on mapping the whole database into memory, which fails if the database size is larger than the addressable memory size (2GB on 32bit kernels, ~3.6GB on 64bit kernel with 32bit userland).
There is a fix available which doesn’t map the whole database, but uses seek instead:

However, Influxdata did not incorporate this into the official release since (as far as I understand) 32bit is not relevant for them any more. (Hello, everybody with older Raspberry Pi hardware!) See other comments in this thread.
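A rough way to see how close a setup is to the limits mentioned above is to add up the on-disk size of the TSM files and compare it with the addressable size. This is a sketch under assumptions: it treats the `.tsm` files as the bulk of what gets mapped, the helper names are hypothetical, and `/var/lib/influxdb/data` is only the usual default data directory and may differ on your install.

```python
import os

GIB = 1024 ** 3

def total_size(root, suffixes=(".tsm",)):
    """Sum the sizes in bytes of all files under root with a matching suffix."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffixes):
                total += os.path.getsize(os.path.join(dirpath, name))
    return total

def exceeds(size_bytes, limit_gib):
    """True if the data would not fit into limit_gib of address space."""
    return size_bytes > limit_gib * GIB

if __name__ == "__main__":
    data_dir = "/var/lib/influxdb/data"  # assumption: default data path
    size = total_size(data_dir)
    # Limits as quoted in this thread: 2GB (32bit kernel), ~3.6GB (64bit kernel).
    for label, limit in (("32bit kernel", 2.0), ("64bit kernel, 32bit userland", 3.6)):
        print(f"{label}: {size / GIB:.2f} GiB of TSM data, over limit: {exceeds(size, limit)}")
```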

So the obvious and quick fix seems to be to use this patch.
I did this, and virtual memory usage dropped from 3.7G to ~1G. Resident memory started at ~400M and went up to ~800M, as before.
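For readers unfamiliar with the two access patterns, here is a small illustration of why the virtual size shrinks. It is not the patch itself (InfluxDB is written in Go); it is only a Python sketch with made-up function names showing that mapping a file reserves address space for the entire file, while seek-and-read touches just the requested bytes.

```python
import mmap
import os
import tempfile

def read_with_mmap(path, offset, length):
    """Map the whole file into the address space, then slice out a range."""
    with open(path, "rb") as f:
        # Length 0 maps the entire file, so the mapping is as large as the file.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return m[offset:offset + length]

def read_with_seek(path, offset, length):
    """Seek to the range and read only those bytes; nothing is mapped."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

if __name__ == "__main__":
    # Write a 1 MiB scratch file so the demo does not depend on real data.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 4096)
        path = tmp.name
    try:
        same = read_with_mmap(path, 1000, 16) == read_with_seek(path, 1000, 16)
        print("both approaches return the same bytes:", same)
    finally:
        os.remove(path)
```

Both functions return identical data. The difference is that the mmap variant needs contiguous address space for the whole file, which is what runs out in a 32bit process once the files grow into the gigabytes.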

In the long run, if Influxdata wants to phase out 32bit architectures, I see the following alternatives for 32bit machines:

  1. questdb.io - another time series database. It claims to be much faster, can read line protocol and has the same HTTP API, so it should be compatible and an easy drop-in replacement. It can also speak SQL. (No Flux though.)
  2. Timescale.com - a time series database built on PostgreSQL with full SQL support.

I will now test the patch provided above and see how it performs when compacting data.
If this works, I’ll stay with Influx 1.x for now.

Hello @Jens thanks for sharing!

You’re welcome!

The patch has been running in my RPi setup since then without a single hiccup, and without any more OOM situations.

Are there any plans to incorporate the mmap patch for 64bit architectures and Influx >= 2.0 as well?
It seems to reduce the memory footprint significantly, and I don't just mean the mapped region, but also resident memory. This should work for 64bit too.