Addressing the growing RAM usage issue, a.k.a. unexpected "out of memory" crashes

For years I’ve been battling the dreaded “out of memory” crashes in InfluxDB and following the relevant reports and advice. Despite some initial attention, many of those reports now seem to have stalled without a solution. An abridged list appears below.

My particular issue is very low load (cardinality < 1000, data insertion rate < 1 point per minute, queries < 1 per minute) combined with ever-increasing RAM usage. This is across a dozen instances on different servers with different databases and different use cases. 512MB used to be enough; now 1GB is not enough. We experimented with v1.5 and v1.6 and with TSM vs TSI indexing, but quickly rolled back to v1.4.2 because it was just not workable.

So are there any prospects for managing RAM on constrained hardware with low-load use cases? We tend to spin up VPS instances to run demos for clients or other experiments, so we have many instances, each with very simple requirements, but each VPS has to be sized entirely around these odd RAM explosions.

Is there any work on the horizon to tune or set parameters to limit RAM usage, even at the expense of performance?
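For reference, these are the influxdb.conf knobs I’m aware of on recent 1.x releases that at least touch memory; a rough sketch with placeholder values (they bound the write cache and query fan-out rather than the overall resident size):

[data]
  index-version = "tsi1"              # keep the series index on disk instead of in RAM
  cache-max-memory-size = "512m"      # cap the in-memory write cache per shard
  cache-snapshot-memory-size = "25m"  # snapshot the cache to TSM files sooner
  max-series-per-database = 1000000   # reject writes that would explode cardinality

[coordinator]
  max-concurrent-queries = 2          # limit how many queries run at once
  query-timeout = "30s"               # kill long-running queries
  max-select-point = 10000000         # abort queries that would process too many points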

https://community.influxdata.com/t/i-would-like-to-bring-down-the-memory-usage-on-influxdb-1-5-2-1-x86-64-out-of-memory-kill-process/5530
https://community.influxdata.com/t/influxdb-1-3-6-fatal-error-out-of-memory/3941/7
https://community.influxdata.com/t/memory-increase-slowly-over-17-hours-until-oom-killed-it/3893
https://community.influxdata.com/t/memory-usage-on-low-end-hardware/3412
https://community.influxdata.com/t/influx-memory-usage/3039
https://community.influxdata.com/t/high-memory-usage-problem/1604/8
https://community.influxdata.com/t/influxdb-memory-usage/6223
https://community.influxdata.com/t/tsm-based-vs-tsi-based-memory-usage/6170
https://community.influxdata.com/t/influxdb-1-2-4-high-memory-consumption-synchronize-with-number-of-writes/4682
https://community.influxdata.com/t/out-of-memory-when-select-from-limit-1-on-128gb-host-influxdb-1-2-4/3678
https://community.influxdata.com/t/influxdb-out-of-memory-periodically-v1-3-1/1879
https://community.influxdata.com/t/tsm-compaction-and-influxd-memory-usage/974


Heath, thanks for raising this. I’m in the same boat. I’ve been using influxdb for about 2 years and this has been a continual problem. I have some instances using ~100GB of memory and I’d appreciate any insight into how to reduce this or even how to figure out where it’s all going.

We also struggle with this. So far our SOP is to watch memory usage creep up over a month or two, and then manually restart the InfluxDB process during our maintenance window, which resets it down to a reasonable consumption.

Hi Heath, Sloach, Lukecyca,

Can you share

  • the memory usage (e.g. “free -g”) on the server(s) and by the influxd process?
  • the retention policy info across your databases? (the commands sketched below should do)

I am curious, as I only started using the TICK stack a couple of weeks ago.
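Something like this should be enough to collect (the database name is a placeholder):

free -g
ps aux | egrep 'CPU|influxd'
influx -execute 'SHOW RETENTION POLICIES ON "mydb"'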

Hi Jayesh. I don’t think it’s going to help much. If I take just one instance, the answers are 909MB used in total, 811MB by influxd (as reported by ps), with all databases using autogen. Ask me tomorrow and the numbers will be different again, but the theme is the same: almost all the memory on any of my Influx servers is used by Influx until it runs out and crashes.

Here’s what I see on my server. And at least in my case, 32 GB of memory is used for filesystem caching, which I believe is mostly for memory-mapped files.

Many years ago (2011-2013), I had a similar situation with MongoDB, where as I added more data, MongoDB’s memory footprint would grow, and the only way out was to restart MongoDB, as lukecyca described.

Wondering if it is the same situation for you too.

dtord03dvo19d.dc.dotomi.net:/home/jthakrar>ps aux | egrep 'CPU|influxd'
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
influxdb  3429 20.2 37.9 103602708 18717376 ?  Ssl  Aug13 4407:06 /usr/bin/influxd -config /etc/influxdb/influxdb.conf

dtord03dvo19d.dc.dotomi.net:/home/jthakrar>free -g
              total        used        free      shared  buff/cache   available
Mem:             46          12           1           0          32          33
Swap:            23           2          21

dtord03dvo19d.dc.dotomi.net:/home/jthakrar>influx -version
InfluxDB shell version: 1.4.2

dtord03dvo19d.dc.dotomi.net:/home/jthakrar>cat /etc/redhat-release 
CentOS Linux release 7.3.1611 (Core)

I was able to simulate your situation (and it seems obvious once you think about it).
I have a virtual machine with 8 cores and 46 GB RAM.
I am pushing about 816+ million data points across many measurements (OpenTSDB format/mode) into the system with a 7-day retention policy.
When I queried the system (from Chronograf) for 7 days of data across 4 of the largest measurements, the InfluxDB memory footprint went from 10GB to 40GB (of the 46 GB on the server).

Not only that, it “messed up” things so much that my dashboard stopped updating for a while, and the load remained high even after I changed the “past” time range to 6 hours.

So essentially this says two things: InfluxDB will try hard to respond to queries (and hence can scale with your hardware), and if you push it beyond its limits, there can be unpredictable problems :slight_smile:
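For what it’s worth, asking the server to aggregate instead of pulling raw points keeps the result set (and whatever Chronograf has to hold onto) far smaller, even though the engine still has to scan the underlying data. A rough InfluxQL comparison, with made-up measurement and field names:

SELECT * FROM "cpu_detail" WHERE time > now() - 7d

versus

SELECT mean("value") FROM "cpu_detail" WHERE time > now() - 7d GROUP BY time(10m), *

The first materializes every raw point for the client; the second returns one point per 10-minute bucket per series.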


I guess that’s my fears confirmed. We can’t keep chasing Influx’s RAM usage upward without bound, but there doesn’t appear to be any interest in getting a handle on it. It’s going to be painful, but I think we need to look elsewhere before we get any more deeply invested in TICK.

I found that the 1.6.3 release helped a bit with memory usage for my instances, probably due to this: Remove TSI1 HLL sketches from heap. · influxdata/influxdb@88d006a · GitHub

Basically this fix lowers the floor of the heap, which leaves more breathing room for whatever is using tens of GB during runtime (heap use varies between around 50-100GB on my instances).

We really need some better tools for understanding where it’s going, so I have some clue of how to restructure things to fix this…
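There is at least some built-in visibility via the Go runtime stats and heap profiles; a sketch assuming a stock 1.x install with the HTTP endpoint on localhost:8086 and pprof left enabled (the default):

influx -execute "SHOW STATS FOR 'runtime'"                    # Go runtime stats: HeapAlloc, HeapInUse, Sys, NumGC, ...
curl -s -o heap.pb.gz http://localhost:8086/debug/pprof/heap  # pull a heap profile from influxd
go tool pprof -top heap.pb.gz                                 # needs the Go toolchain; shows what is holding the heap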


Hi everybody,
I’m running InfluxDB 1.7.6-1 on Windows 2008 Server, and I’m experiencing memory usage issues.

I configured index-version = “tsi1”, rebuilt the TSI index for the existing data, and restarted influxd to limit memory consumption during write operations.
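For anyone following along, the migration steps are roughly these, shown with the stock Linux paths (adjust for a Windows install, and stop influxd before rebuilding the index):

# influxdb.conf
[data]
  index-version = "tsi1"

# then rebuild the on-disk index from the existing TSM data
influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal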

But I still see unbounded memory growth when a client, for example the influx console or Grafana, makes a query that reads a large number of time series.
The series cardinality is 723 (estimated), 15353 (exact).

Any news about this?
Thanks.

Hi,

Same problem here. After switching to 1.7.x I had to extend the memory from 12GB to 32GB, and it’s still not enough. I also switched to TSI1, but it doesn’t solve the problem. Now every few hours I get an OOM and systemd restarts influxd. Next problem: I also use Icingaweb2 with the Grafana module, and the graphs take too long; if I click in a graph, it takes around 30 seconds or more to render.

If I execute “show shard groups;” on my InfluxDB … and grep the IDs, I have a lot of them that are not in /var/lib/influxdb/.
I have no idea if that is good or bad. I assume the latter.
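Though if I understand the on-disk layout right, the directories under the data path are named after shard IDs rather than shard group IDs, so comparing against “show shards” is probably the more meaningful check (paths and names below are the Linux defaults / placeholders):

influx -execute 'SHOW SHARDS'
ls /var/lib/influxdb/data/<database>/<retention_policy>/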

Take a look at VictoriaMetrics. It usually needs much less RAM than InfluxDB when dealing with high-cardinality series. See Insert benchmarks with inch: InfluxDB vs VictoriaMetrics | by Aliaksandr Valialkin | Medium