InfluxDB crashing out of memory

I’m having a problem with InfluxDB 1.x running out of memory. It runs inside a Docker container on a Raspberry Pi 4 (8 GB) with 32-bit Raspberry Pi OS (Raspbian) and 16 GB of swap (I saw another post that said to increase swap, so I did), but that didn’t seem to help. I rolled back to a previous day’s backup, shortened all my retention policies, and updated the continuous queries accordingly. I want to upgrade to the 64-bit Raspberry Pi OS, but I haven’t had the time, and I wanted to pick up another Pi 4 to make sure the transition went smoothly, but that’s near impossible right now. Any help would be greatly appreciated.
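
For anyone following along, the kind of InfluxQL involved looks roughly like this (database, retention policy, and measurement names below are placeholders, not my actual schema):

# Shorten an existing retention policy (placeholder names)
influx -execute 'ALTER RETENTION POLICY "one_week" ON "home" DURATION 7d SHARD DURATION 1d'

# Drop and recreate the matching downsampling continuous query
influx -execute 'DROP CONTINUOUS QUERY "cq_sensor_1h" ON "home"'
influx -execute 'CREATE CONTINUOUS QUERY "cq_sensor_1h" ON "home" BEGIN SELECT mean("value") AS "value" INTO "one_year"."sensor_1h" FROM "one_week"."sensor" GROUP BY time(1h), * END'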

However, I’m still getting the out-of-memory error:
runtime: out of memory: cannot allocate 24576-byte block (1399291904 in use)
fatal error: out of memory

goroutine 4503 [running]:
runtime.throw(0xfebddd, 0xd)
/usr/local/go/src/runtime/panic.go:774 +0x5c fp=0x4549324 sp=0x4549310 pc=0x41644
runtime.(*mcache).refill(0xb6ef9008, 0x59)
/usr/local/go/src/runtime/mcache.go:140 +0xfc fp=0x4549338 sp=0x4549324 pc=0x262ec
runtime.(*mcache).nextFree(0xb6ef9008, 0x59, 0x540be400, 0x2, 0x9)
/usr/local/go/src/runtime/malloc.go:854 +0x7c fp=0x4549358 sp=0x4549338 pc=0x1b0f4
runtime.mallocgc(0x1300, 0xe1e790, 0x1, 0xc9bea8)
/usr/local/go/src/runtime/malloc.go:1022 +0x7a0 fp=0x45493c0 sp=0x4549358 pc=0x1ba40
runtime.makeslice(0xe1e790, 0x23c, 0x23c, 0x1)
/usr/local/go/src/runtime/slice.go:49 +0x6c fp=0x45493d4 sp=0x45493c0 pc=0x5928c
runtime.makeslice64(0xe1e790, 0x23c, 0x0, 0x23c, 0x0, 0x2)
/usr/local/go/src/runtime/slice.go:63 +0x44 fp=0x45493e8 sp=0x45493d4 pc=0x593d0
github.com/influxdata/influxdb/tsdb/engine/tsm1.timeBatchDecodeAllRLE(0x78ecbf9f, 0xc, 0x543cfc, 0x0, 0x0, 0x0, 0x543cfd, 0xc, 0x0, 0x1, …)
/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/batch_timestamp.go:280 +0x284 fp=0x4549444 sp=0x45493e8 pc=0xc9c0a4
github.com/influxdata/influxdb/tsdb/engine/tsm1.TimeArrayDecodeAll(0x78ecbf9f, 0xc, 0x543cfc, 0x0, 0x0, 0x0, 0x78ecbfab, 0xc, 0x543cf0, 0x0, …)

I had this problem quite some time ago…
Before the panic you should find some trace of whatever activity was in progress; in my case it was shard compaction.

In my experience (on Windows) there is no real workaround: if there is not enough memory, the engine will crash. The only options are to allocate more resources or to limit data/operations, but the right solution depends on what actually caused the crash.

It would be helpful to know:

  • InfluxDB version
  • InfluxDB configuration (even if the default is in use, please export it anyway, as it may change based on version; see the command sketch after this list)
  • Which operation was running before the crash
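
If the instance is in a Docker container (influxdb below is an assumed container name), the effective configuration can be dumped with:

docker exec influxdb influxd config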

I’m not sure how to tell what operation was running at the time of the crash.

InfluxDB shell version: 1.8.9

Merging with configuration at: /etc/influxdb/influxdb.conf
reporting-disabled = false
bind-address = "127.0.0.1:8088"

[meta]
  dir = "/var/lib/influxdb/meta"
  retention-autocreate = true
  logging-enabled = true

[data]
  dir = "/var/lib/influxdb/data"
  index-version = "inmem"
  wal-dir = "/var/lib/influxdb/wal"
  wal-fsync-delay = "0s"
  validate-keys = false
  strict-error-handling = false
  query-log-enabled = false
  cache-max-memory-size = 1073741824
  cache-snapshot-memory-size = 26214400
  cache-snapshot-write-cold-duration = "10m0s"
  compact-full-write-cold-duration = "4h0m0s"
  compact-throughput = 50331648
  compact-throughput-burst = 50331648
  max-series-per-database = 1000000
  max-values-per-tag = 100000
  max-concurrent-compactions = 0
  max-index-log-file-size = 1048576
  series-id-set-cache-size = 100
  series-file-max-concurrent-snapshot-compactions = 0
  trace-logging-enabled = false
  tsm-use-madv-willneed = false

[coordinator]
  write-timeout = "10s"
  max-concurrent-queries = 0
  query-timeout = "0s"
  log-queries-after = "0s"
  max-select-point = 0
  max-select-series = 0
  max-select-buckets = 0

[retention]
  enabled = true
  check-interval = "30m0s"

[shard-precreation]
  enabled = true
  check-interval = "10m0s"
  advance-period = "30m0s"

[monitor]
  store-enabled = true
  store-database = "_internal"
  store-interval = "10s"

[subscriber]
  enabled = true
  http-timeout = "30s"
  insecure-skip-verify = false
  ca-certs = ""
  write-concurrency = 40
  write-buffer-size = 1000

[http]
  enabled = true
  bind-address = ":8086"
  auth-enabled = false
  log-enabled = true
  suppress-write-log = false
  write-tracing = false
  flux-enabled = false
  flux-log-enabled = false
  pprof-enabled = true
  pprof-auth-enabled = false
  debug-pprof-enabled = false
  ping-auth-enabled = false
  prom-read-auth-enabled = false
  https-enabled = false
  https-certificate = "/etc/ssl/influxdb.pem"
  https-private-key = ""
  max-row-limit = 0
  max-connection-limit = 0
  shared-secret = ""
  realm = "InfluxDB"
  unix-socket-enabled = false
  unix-socket-permissions = "0777"
  bind-socket = "/var/run/influxdb.sock"
  max-body-size = 25000000
  access-log-path = ""
  max-concurrent-write-limit = 0
  max-enqueued-write-limit = 0
  enqueued-write-timeout = 30000000000

[logging]
  format = "auto"
  level = "info"
  suppress-logo = false

[[graphite]]
  enabled = false
  bind-address = ":2003"
  database = "graphite"
  retention-policy = ""
  protocol = "tcp"
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "1s"
  consistency-level = "one"
  separator = "."
  udp-read-buffer = 0

[[collectd]]
  enabled = false
  bind-address = ":25826"
  database = "collectd"
  retention-policy = ""
  batch-size = 5000
  batch-pending = 10
  batch-timeout = "10s"
  read-buffer = 0
  typesdb = "/usr/share/collectd/types.db"
  security-level = "none"
  auth-file = "/etc/collectd/auth_file"
  parse-multivalue-plugin = "split"

[[opentsdb]]
  enabled = false
  bind-address = ":4242"
  database = "opentsdb"
  retention-policy = ""
  consistency-level = "one"
  tls-enabled = false
  certificate = "/etc/ssl/influxdb.pem"
  batch-size = 1000
  batch-pending = 5
  batch-timeout = "1s"
  log-point-errors = true

[[udp]]
  enabled = false
  bind-address = ":8089"
  database = "udp"
  retention-policy = ""
  batch-size = 5000
  batch-pending = 10
  read-buffer = 0
  batch-timeout = "1s"
  precision = ""

[continuous_queries]
  log-enabled = false
  enabled = true
  query-stats-enabled = false
  run-interval = "1s"

[tls]
  min-version = ""
  max-version = ""

You should find something in the log before the crash. It may or may not be relevant, but if you don’t check, you can’t know.
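
Since you run it in Docker, the container log is the first place to check; a minimal sketch, assuming the container is named influxdb:

# Show the last few hundred lines leading up to the crash
docker logs --tail 300 influxdb

# Or follow the log live and watch for compaction / cache-snapshot activity
docker logs -f influxdb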

I’d change the following setting:
max-concurrent-compactions = 1

Maximum number of full and level compactions that can run concurrently. A value of 0 results in 50% of runtime.GOMAXPROCS(0) being used at runtime. Any number greater than zero limits compactions to that value.
If you only have two cores it won’t make a difference, since 50% of two cores is already one…
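
The change is just one line in the [data] section of the config you posted (restart influxd afterwards):

[data]
  # limit full/level compactions to a single concurrent run instead of 50% of the cores
  max-concurrent-compactions = 1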

Another option you could evaluate is changing the index-version to tsi1, which should be more memory-efficient; it requires a migration of the existing shards. Details about the TSI index can be found in the InfluxDB documentation.
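
A rough sketch of the offline migration with influx_inspect, using the data/wal paths from your config (the container name is an assumption; run it against the data volume, as the user that owns the data directory, while influxd is not running):

# Stop InfluxDB before converting (assumed container name)
docker stop influxdb

# Build TSI index files from the existing TSM data and WAL
# (run on the host or in a throwaway container that mounts the same data volume)
influx_inspect buildtsi -datadir /var/lib/influxdb/data -waldir /var/lib/influxdb/wal

# Set index-version = "tsi1" in the [data] section, then start InfluxDB again
docker start influxdb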

It looks like it’s crashing during the compaction.

influxd is still crashing on my Raspberry Pi 4 (8 GB, armv7l) even after switching index-version to tsi1, changing max-concurrent-compactions, and running through the conversion process.

I have moved my InfluxDB to a Linux x86_64 VM running under a Windows hypervisor (I think I’m saying that right). So far it seems to be holding strong.

Could this be a limitation of the 32-bit OS I’m running on the Raspberry Pi?