For years I’ve been battling the dreaded “out of memory” crashes in InfluxDB and following the relevant reports and advice. Despite some initial attention, it seems many of those reports have stalled without a resolution. An abridged list appears below.
My particular issue is very low load (cardinality < 1000, data insertion rate < 1 point per minute, queries < 1 per minute) combined with ever-increasing RAM usage. This is across a dozen instances on different servers with different databases and different use cases. 512 MB used to be enough; now 1 GB is not enough. We experimented with v1.5 and v1.6, and with the default in-memory index vs TSI, but quickly rolled back to v1.4.2 because it was just not workable.
So are there any prospects for managing RAM on constrained hardware with low-load use cases? We tend to spin up VPS instances to run demos for clients or other experiments, so we have many instances, each with very simple requirements. But each VPS has to be sized entirely around these odd RAM explosions.
Is there any work on the horizon to tune or set parameters to limit RAM usage, even at the expense of performance?
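For reference, the only knobs I’m aware of today are the cache settings under [data] in influxdb.conf, plus an external cgroup cap via systemd; a rough sketch below (values purely illustrative; some releases also accept human-readable sizes like "256m"):

# /etc/influxdb/influxdb.conf (excerpt)
[data]
  # snapshot the in-memory write cache to TSM files sooner, at the cost of more compactions
  cache-snapshot-memory-size = 16777216      # 16 MB
  # reject writes once a shard's cache grows past this, instead of letting it balloon
  cache-max-memory-size = 268435456          # 256 MB

# systemd drop-in, e.g. /etc/systemd/system/influxdb.service.d/memory.conf
[Service]
MemoryMax=1G       # hard cgroup cap (MemoryLimit= on older systemd such as CentOS 7)

The cgroup cap at least keeps a runaway influxd from taking the whole VPS down with it, though it only trades an OOM crash for an OOM kill, and it doesn’t address the growth itself.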
Heath, thanks for raising this. I’m in the same boat. I’ve been using InfluxDB for about 2 years and this has been a continual problem. I have some instances using ~100 GB of memory, and I’d appreciate any insight into how to reduce this, or even how to figure out where it’s all going.
We also struggle with this. So far our SOP is to watch memory usage creep up over a month or two, and then manually restart the InfluxDB process during our maintenance window, which brings it back down to a reasonable level.
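If anyone wants to automate the same workaround, a cron entry along these lines should do it (this assumes the stock influxdb systemd unit; adjust the schedule to your own maintenance window):

# /etc/cron.d/influxdb-restart - restart InfluxDB at 03:00 on the 1st of each month
0 3 1 * * root /bin/systemctl restart influxdb

It’s obviously a band-aid rather than a fix, since the memory just starts creeping up again afterwards.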
Hi Jayesh. I don’t think it’s going to help much. If I take just one instance, the answers are 909 MB used in total, 811 MB by Influx (as reported by ps), with all databases using autogen. Ask me tomorrow and the numbers will be different again, but the theme is the same: almost all the memory on any of my Influx servers is used by Influx until it runs out and crashes.
Here’s what I see on my server. And at least in my case, 32 GB of memory is used for filesystem caching, which I believe is mostly for memory-mapped files (see the quick check after the output below).
Many years ago (2011-2013), I had a similar situation with MongoDB, where as I added more data, MongoDB’s memory footprint would grow and the only way out was to restart MongoDB, as lukecyca described.
I’m wondering if it’s the same situation for you too.
dtord03dvo19d.dc.dotomi.net:/home/jthakrar>ps aux | egrep 'CPU|influxd'
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
influxdb 3429 20.2 37.9 103602708 18717376 ? Ssl Aug13 4407:06 /usr/bin/influxd -config /etc/influxdb/influxdb.conf
dtord03dvo19d.dc.dotomi.net:/home/jthakrar>free -g
total used free shared buff/cache available
Mem: 46 12 1 0 32 33
Swap: 23 2 21
dtord03dvo19d.dc.dotomi.net:/home/jthakrar>influx -version
InfluxDB shell version: 1.4.2
dtord03dvo19d.dc.dotomi.net:/home/jthakrar>cat /etc/redhat-release
CentOS Linux release 7.3.1611 (Core)
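A quick way to sanity-check the memory-mapped-file theory is to look at what influxd actually has mapped and resident, for example (this assumes the procps pmap, where the third column of pmap -x is RSS in KB):

pid=$(pgrep -x influxd)
# how many TSM files influxd currently has memory-mapped
pmap -x $pid | grep -c '\.tsm'
# total resident size of those file-backed mappings
pmap -x $pid | awk '/\.tsm/ {kb += $3} END {printf "%.1f GB of TSM data resident\n", kb/1024/1024}'

That portion is mostly reclaimable page cache, so it isn’t the part that triggers OOM kills; the worrying part is the anonymous heap sitting on top of it.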
I was able to simulate your situation (and it seems obvious once you think about it).
So I have a virtual machine with 8 cores and 46 GB RAM.
I am pushing 816+ million data points across many measurements (OpenTSDB format/mode) into the system with a 7-day retention.
When I queried the system (from Chronograf) for 7 days of data across 4 of the largest measurements, the InfluxDB memory footprint went from 10 GB to 40 GB (of the 46 GB on the server).
Not only that, it “messed up” things so much that my dashboard stopped updating for a while, and the load remained high even after I changed the “past” time window to 6 hours.
So essentially this says two things:
InfluxDB will try hard to respond to queries (and hence can scale with your hardware), and if you push it beyond its limits, there can be unpredictable problems.
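One thing that can at least contain the blast radius of a query like that is the [coordinator] limits in influxdb.conf; a hedged sketch below (values illustrative, and note these make offending queries fail rather than making them cheaper):

# /etc/influxdb/influxdb.conf (excerpt)
[coordinator]
  max-concurrent-queries = 10      # queue anything beyond this instead of running it
  query-timeout = "60s"            # kill queries that run longer than a minute
  log-queries-after = "10s"        # log slow queries so the expensive ones can be found
  max-select-point = 50000000      # abort SELECTs that would read more than ~50M points
  max-select-series = 100000       # abort SELECTs that would touch more than 100k series

With limits like these, the 7-day query over the 4 biggest measurements would presumably error out instead of pushing the process from 10 GB to 40 GB.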
I guess that’s my fears confirmed. We can’t keep chasing Influx’s RAM usage up and up, unbounded, and there doesn’t appear to be any interest in getting a handle on it. It’s going to be painful, but I think we need to look elsewhere before we get any more deeply entrenched in the TICK stack.
Basically this fix lowers the floor of the heap, which leaves more breathing room for whatever is using tens of GB during runtime (heap use varies between roughly 50 and 100 GB on my instances).
We really need some better tools for understanding where that memory is going, so I can have some clue of how to re-structure things to fix this…
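In case it helps anyone else digging, the 1.x HTTP endpoint does expose Go’s pprof and expvar data (assuming pprof-enabled is left at its default of true in the [http] section), which is the closest thing to a “where is it all going” tool I know of:

# grab a heap profile from the running influxd
curl -s -o heap.pprof http://localhost:8086/debug/pprof/heap
# top allocators by in-use space (needs a Go toolchain available somewhere)
go tool pprof -top -inuse_space heap.pprof
# runtime and subsystem counters, including cache and index memory stats
curl -s http://localhost:8086/debug/vars | less

It doesn’t tell you how to restructure anything, but it at least shows whether the growth is in the cache, the index, or query execution.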
Hi everybody,
I’m running InfluxDB 1.7.6-1 on Windows 2008 Server, and I’m experiencing memory usage issues.
I configured index-version = “tsi1”, rebuilt the TSI index for the existing data, and restarted influxd to limit memory consumption on write operations.
But I still see unbounded memory growth when a client, for example the influx console or Grafana, runs a query that reads a large number of time series.
The current series cardinality is 723 (estimated) and 15353 (exact).
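For reference, the rebuild and cardinality checks look roughly like this (shown with the Linux default paths and a placeholder database name “mydb”; on Windows the same influx_inspect buildtsi command applies, pointed at the configured data and wal directories):

# stop influxd first, then rebuild the TSI index from the existing TSM/WAL data
influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal
# after restarting, compare estimated vs exact series cardinality
influx -execute 'SHOW SERIES CARDINALITY ON mydb'
influx -execute 'SHOW SERIES EXACT CARDINALITY ON mydb'

Those two statements report the estimated and exact cardinality figures quoted above.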
Same problem here. After switching to 1.7.x I had to extend the memory from 12 GB to 32 GB, and it’s still not enough. I also switched to TSI1, but that doesn’t solve the problem. Now, every few hours, I get an OOM and systemd starts influxd again. Next problem: I also use Icingaweb2 with the Grafana module, and the graphs take too long to load; if I click in a graph, it takes around 30 seconds or more to show.
If I execute “show shard groups;” on my InfluxDB and grep the IDs, I have a lot of them that are not in /var/lib/influxdb/.
I have no idea if that is good or bad; I assume the latter.
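One thing worth checking before assuming the worst: as far as I understand, the directory names under /var/lib/influxdb/data/<db>/<rp>/ are shard IDs, not shard group IDs, so IDs from “show shard groups” are not expected to match them. A rough comparison of metastore shard IDs against what is on disk could look like this (the awk filter assumes the usual SHOW SHARDS column layout, with the numeric shard id first):

# shard IDs known to the metastore
influx -execute 'SHOW SHARDS' | awk '$1 ~ /^[0-9]+$/ {print $1}' | sort -u > /tmp/meta_shards
# shard directories that actually exist on disk
find /var/lib/influxdb/data -mindepth 3 -maxdepth 3 -type d -printf '%f\n' | sort -u > /tmp/disk_shards
# IDs the metastore knows about but that have no directory on disk
comm -23 /tmp/meta_shards /tmp/disk_shards

If that comm output is empty, the mismatch was just shard group IDs vs shard IDs; even if it isn’t, shards that have never received a write may legitimately have no directory yet.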