InfluxDB crashes due to running out of memory

Hi All,

I’m experiencing unexpected crashes caused by excessive memory usage in my installation (version 1.8.2). The host has 32 GB of memory. When I run a query, it consumes ~20 GB of memory. When I run the same query repeatedly, memory usage keeps growing until all free memory is used and the service crashes.

  • The DB size is 42 GB
  • Number of series is 185368

I’ve tried switching the index between tsi and inmem, but things got worse with tsi (it started crashing on the second run of the query).

Example query (time range is 1 year):
SELECT count("state") FROM "testdb"."autogen"."telemetry" WHERE time > :dashboardTime: AND time < :upperDashboardTime: AND "state"='on' GROUP BY time(1d), "endpoint" FILL(null)
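To get a feel for the scale of this query, here is a rough Go sketch of an upper bound on the number of output buckets. GROUP BY time(1d), "endpoint" with FILL(null) emits one row per (endpoint, day) pair, including days with no data. As a worst-case assumption, it treats every one of the 185,368 series as a distinct "endpoint" value. The real group count is the number of distinct endpoint values, which can only be lower. The function name estimateBuckets is hypothetical.

```go
package main

import "fmt"

// estimateBuckets returns an upper bound on the rows a
// GROUP BY time(interval), <tag> query can emit when FILL(null)
// produces a row for every interval, including empty ones:
// one row per (group, interval) pair.
func estimateBuckets(groups, intervals int64) int64 {
	return groups * intervals
}

func main() {
	// Worst-case assumption: every series is its own "endpoint" group.
	const series = 185368
	// time(1d) over roughly one year; boundary alignment or a leap
	// year could add one more bucket.
	const days = 365

	fmt.Printf("up to %d output buckets\n", estimateBuckets(series, days))
}
```

Under these assumptions that is tens of millions of buckets per run, so the memory needed to build the result grows with the series count.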

Logs:

influxd: fatal error: runtime: out of memory
influxd: runtime stack:
influxd: runtime.throw(0x166acb8, 0x16)
influxd: /usr/local/go/src/runtime/panic.go:774 +0x72
influxd: runtime.sysMap(0xc728000000, 0x4000000, 0x35aec78)
influxd: /usr/local/go/src/runtime/mem_linux.go:169 +0xc5
influxd: runtime.(*mheap).sysAlloc(0x3595c80, 0x10000, 0x10000, 0x98000a3aeb)
influxd: /usr/local/go/src/runtime/malloc.go:701 +0x1cd
influxd: runtime.(*mheap).grow(0x3595c80, 0x8, 0xffffffff)
...

The full log output is available at https://pastebin.pl/view/97437ed3

I suspect you are querying all of the series (~185k)? The memory allocated to materialize query results is proportional to the number of series. The stack trace shows the Go runtime ran out of memory while trying to grow the heap. You’ll need to do some profiling (go tool pprof) to examine the actual heap size and compare it with other memory pressures on your machine. Data files are mmap’ed. The OS should evict those pages to make room for the heap, but it can’t evict all of them. Increasing swap may help. If there are other memory consumers on this machine/VM, consider removing them.
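As a starting point for the profiling suggested above, here is a minimal Go sketch that downloads a heap profile from influxd so it can be inspected with go tool pprof. It assumes InfluxDB 1.x with the default [http] pprof-enabled = true setting, which serves Go’s /debug/pprof handlers on the HTTP port. It also assumes that port is the default 8086. The helper names and the output file name influxd-heap.pb.gz are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
)

// heapProfileURL builds the heap-profile URL for an influxd HTTP base URL.
// Assumption: pprof is enabled on influxd's HTTP port (InfluxDB 1.x default).
func heapProfileURL(base string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/debug/pprof/heap"
	return u.String(), nil
}

// saveHeapProfile fetches the heap profile and writes it to path.
func saveHeapProfile(base, path string) error {
	target, err := heapProfileURL(base)
	if err != nil {
		return err
	}
	resp, err := http.Get(target)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", target, resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = io.Copy(f, resp.Body)
	return err
}

func main() {
	// Hypothetical host and file name; adjust for your installation.
	if err := saveHeapProfile("http://localhost:8086", "influxd-heap.pb.gz"); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	fmt.Println("wrote influxd-heap.pb.gz")
}
```

Taking one profile after each run of the query and inspecting them with go tool pprof -top influxd-heap.pb.gz should show which allocations grow between runs. go tool pprof can also read the endpoint URL directly without saving a file first.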

I believe there is also another, related issue. I’ve described it in https://github.com/influxdata/influxdb/issues/19500.

  • Yes, I’m querying all the series.
  • I’ve tried vertically scaling the instance up to 128 GB of memory, but it didn’t help. With tsi, memory usage still grows each time I run the same query across all the series.
  • There are no memory consumers on the host other than influxd.

I’ll do further analysis once I finish migrating to 2.0. Thanks!