InfluxDB 1.3.6 fatal error: out of memory

Hi, my InfluxDB instance goes down with “fatal error: out of memory” about once a week.

Version: influxdb-1.3.6-1.x86_64
OS: CentOS release 6.9
RAM: more than 100 GB

fatal error: out of memory
fatal error: out of memory
fatal error: out of memory
fatal error: out of memory
fatal error: out of memory
fatal error: out of memory
fatal error: out of memory
fatal error: out of memory
fatal error: out of memory

runtime stack:
runtime.throw(0xb914c1, 0xd)
/usr/local/go/src/runtime/panic.go:596 +0x95
runtime.(*mcache).refill(0x7fac17a5de10, 0x13, 0x7f5976b7f6a0)
/usr/local/go/src/runtime/mcache.go:124 +0x120
runtime.(*mcache).nextFree.func1()
/usr/local/go/src/runtime/malloc.go:538 +0x32
runtime.systemstack(0xc42002c600)
/usr/local/go/src/runtime/asm_amd64.s:327 +0x79
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1132

goroutine 199962089 [running]:
runtime.systemstack_switch()
/usr/local/go/src/runtime/asm_amd64.s:281 fp=0x14265dac6d0 sp=0x14265dac6c8
runtime.(*mcache).nextFree(0x7fac17a5de10, 0x7fac17a5de13, 0x0, 0x14265dac770, 0x411808)
/usr/local/go/src/runtime/malloc.go:539 +0xb9 fp=0x14265dac728 sp=0x14265dac6d0
runtime.mallocgc(0x140, 0x0, 0x7fac17a5de00, 0x1441fec3e00)
/usr/local/go/src/runtime/malloc.go:691 +0x827 fp=0x14265dac7c8 sp=0x14265dac728
runtime.rawstring(0x139, 0x0, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/runtime/string.go:237 +0x85 fp=0x14265dac7f8 sp=0x14265dac7c8
runtime.rawstringtmp(0x0, 0x139, 0x139, 0x0, 0x1441fec3e00, 0x139, 0x139)
/usr/local/go/src/runtime/string.go:107 +0x78 fp=0x14265dac838 sp=0x14265dac7f8
runtime.concatstrings(0x0, 0x14265dac918, 0x3, 0x3, 0x1441fec3e00, 0xc5bb1b73c0)
/usr/local/go/src/runtime/string.go:46 +0xf9 fp=0x14265dac8d0 sp=0x14265dac838
runtime.concatstring3(0x0, 0xc5b5e7c120, 0x107, 0xb8965b, 0x4, 0xc5531af9e0, 0x2e, 0x1441fec3e00, 0x139)
/usr/local/go/src/runtime/string.go:59 +0x47 fp=0x14265dac910 sp=0x14265dac8d0
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).buildFloatCursor(0xc5bb62c120, 0xc9a4f02600, 0x11, 0xc5b5e7c120, 0x107, 0xc5531af9e0, 0x2e, 0xeb6360, 0x13a50fdcba0, 0xf19df0, …)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:2014 +0x128 fp=0x14265dacb80 sp=0x14265dac910
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).buildCursor(0xc5bb62c120, 0xc9a4f02600, 0x11, 0xc5b5e7c120, 0x107, 0x13a5286b140, 0xeb6360, 0x13a50fdcba0, 0xf19df0, 0x0, …)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1999 +0x4fe fp=0x14265dacd18 sp=0x14265dacb80
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createVarRefSeriesIterator(0xc5bb62c120, 0x13a5286b140, 0xc9a4f02600, 0x11, 0xc5b5e7c120, 0x107, 0xf275fbf2c0, 0x0, 0x0, 0x0, …)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1939 +0xa01 fp=0x14265dadba8 sp=0x14265dacd18
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetGroupIterators(0xc5bb62c120, 0x13a5286b140, 0xc9a4f02600, 0x11, 0x144109cc7e0, 0x32, 0x982, 0xf275fbf2c0, 0x14410a4e7e0, 0x32, …)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1811 +0x1be fp=0x14265dadde0 sp=0x14265dadba8
github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetIterators.func1(0x1441d2246b0, 0x144171c5500, 0x38, 0x38, 0xc5bb62c120, 0x13a5286b140, 0xc9a4f02600, 0x11, 0xf275fbf2c0, 0x1441cd8fa00, …)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1771 +0x12c fp=0x14265dadf88 sp=0x14265dadde0
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0x14265dadf90 sp=0x14265dadf88
created by github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Engine).createTagSetIterators
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/engine.go:1772 +0x33a

goroutine 1 [chan receive, 15372 minutes]:
main.(*Main).Run(0xc42022ce40, 0xc4200ae060, 0x4, 0x4, 0x0, 0x0)
/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:98 +0x416
main.main()
/go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:46 +0xad

goroutine 17 [syscall, 15377 minutes, locked to thread]:
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:2197 +0x1

goroutine 20 [syscall, 15377 minutes]:
os/signal.signal_recv(0x0)
/usr/local/go/src/runtime/sigqueue.go:116 +0x104
os/signal.loop()
/usr/local/go/src/os/signal/signal_unix.go:22 +0x22
created by os/signal.init.1
/usr/local/go/src/os/signal/signal_unix.go:28 +0x41

goroutine 22 [IO wait, 15377 minutes]:
net.runtime_pollWait(0x7fac17a12f88, 0x72, 0xeb1260)
/usr/local/go/src/runtime/netpoll.go:164 +0x59
net.(*pollDesc).wait(0xc420367e28, 0x72, 0xeaa688, 0xc42000c0e0)
/usr/local/go/src/net/fd_poll_runtime.go:75 +0x38
net.(*pollDesc).waitRead(0xc420367e28, 0xffffffffffffffff, 0x0)
/usr/local/go/src/net/fd_poll_runtime.go:80 +0x34
net.(*netFD).accept(0xc420367dc0, 0x0, 0xeaf5a0, 0xc42000c0e0)
/usr/local/go/src/net/fd_unix.go:430 +0x1e5
net.(*TCPListener).accept(0xc4203da1a8, 0x0, 0xc4200666e8, 0x470074)
/usr/local/go/src/net/tcpsock_posix.go:136 +0x2e
net.(*TCPListener).Accept(0xc4203da1a8, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/net/tcpsock.go:228 +0x49
github.com/influxdata/influxdb/tcp.(*Mux).Serve(0xc4203e2600, 0xeb8160, 0xc4203da1a8, 0xc4203da1a8, 0x0)
/go/src/github.com/influxdata/influxdb/tcp/mux.go:75 +0x97
created by github.com/influxdata/influxdb/cmd/influxd/run.(*Server).Open
/go/src/github.com/influxdata/influxdb/cmd/influxd/run/server.go:361 +0x24a

Have you tried upgrading to a more recent version of InfluxDB? We had similar problems with the 1.3.x version that appear to have been addressed in 1.4.x.

Thank you @bayendor, I upgraded to the latest version. Is the fix to set index-version = “tsi1”? I still have “inmem”.

We did not convert to tsi1; the upgrade alone solved our problem while still using inmem. High series cardinality can use up a lot of memory, but you seem to have plenty on your instance. Unbounded queries made by users, whether via the HTTP API or Grafana (or similar), can also cause high memory usage, and there are config settings to control that (see the sketch below).
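For reference, here is a minimal sketch of the influxdb.conf settings I mean, assuming InfluxDB 1.4 or later; the values are only illustrative and need tuning for your workload:

[coordinator]
  # Cancel queries that run longer than this (0s disables the limit).
  query-timeout = "60s"
  # Log any query that runs longer than this, to help spot the offenders.
  log-queries-after = "10s"
  # Abort a SELECT that would process more than this many points (0 = unlimited).
  max-select-point = 50000000
  # Abort a SELECT that would touch more than this many series (0 = unlimited).
  max-select-series = 100000

[http]
  # Truncate result sets returned over the HTTP API to this many rows (0 = unlimited).
  max-row-limit = 10000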

If you have a lot of ephemeral data, then trying tsi1 may solve your issue, but I would test it out first, since converting back to inmem can be problematic.
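If you do try tsi1, the switch itself is a one-line change in the [data] section of influxdb.conf, sketched below. As far as I know, existing shards keep their old index until they are rebuilt (for example with influx_inspect buildtsi on recent 1.x releases), so only new shards pick it up automatically.

[data]
  # Default is "inmem", which keeps the whole series index in RAM.
  # "tsi1" stores the index on disk and is intended for high-cardinality data.
  index-version = "tsi1"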

Thank you @bayendor. Which config settings control unbounded queries?

@bayendor, I upgraded InfluxDB to the latest version but still have the same OOM issue.
I noticed it happens when someone queries data at a 1-minute interval for the last 30 days.

I tried changing max-row-limit from 0 to 20000, but I still get the same OOM issue.

How can I fix this?
Does max-row-limit help in this case? If so, how do I determine the right limit?
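For what it's worth, a rough sizing for that query: at a 1-minute interval over 30 days, each series returns about 30 × 24 × 60 = 43,200 rows, multiplied by however many series the query touches, so a max-row-limit of 20000 is already being exceeded. As far as I understand, max-row-limit only truncates what is sent back over HTTP and does not bound the memory used while the query is evaluated; the [coordinator] limits sketched earlier act during evaluation and are more likely to prevent the OOM. A hedged influxdb.conf fragment with illustrative numbers:

[http]
  # 30 days * 24 h * 60 min = 43,200 rows per series at 1-minute resolution.
  # This only truncates the HTTP response; the query itself still runs.
  max-row-limit = 50000

[coordinator]
  # Caps the points a single SELECT may process, aborting it before it
  # can exhaust memory (tune to your largest legitimate query).
  max-select-point = 10000000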