Influxdb 1.6.4 fatal error: runtime: out of memory


#1

Hi, I’m running InfluxDB as a Docker container on an Amazon Linux AMI (4.9.20-11.31.amzn1.x86_64), a t2.medium instance with 4 GB of memory and 2 CPUs. I’m using InfluxDB as the remote read and write database for Prometheus. I’m new to InfluxDB and have been using it for the last 5 days. For the first 3 days there was no issue, but since the 4th day I have been facing the issue below. Please help.

Error message:
fatal error: runtime: out of memory

runtime stack:
runtime.throw(0xdf4c4e, 0x16)
/usr/local/go/src/runtime/panic.go:616 +0x81
runtime.sysMap(0xc4fcee0000, 0x150000, 0x7f0070b13000, 0x137fe78)
/usr/local/go/src/runtime/mem_linux.go:216 +0x20a
runtime.(*mheap).sysAlloc(0x1366980, 0x150000, 0x7f0070b132f8)
/usr/local/go/src/runtime/malloc.go:470 +0xd4
runtime.(*mheap).grow(0x1366980, 0xa8, 0x0)
/usr/local/go/src/runtime/mheap.go:907 +0x60
runtime.(*mheap).allocSpanLocked(0x1366980, 0xa8, 0x137fe88, 0xc420187ee0)
/usr/local/go/src/runtime/mheap.go:820 +0x301
runtime.(*mheap).alloc_m(0x1366980, 0xa8, 0x410100, 0xc4191896ff)
/usr/local/go/src/runtime/mheap.go:686 +0x118
runtime.(*mheap).alloc.func1()
/usr/local/go/src/runtime/mheap.go:753 +0x4d
runtime.(*mheap).alloc(0x1366980, 0xa8, 0xc420010100, 0x41406c)
/usr/local/go/src/runtime/mheap.go:752 +0x8a
runtime.largeAlloc(0x150000, 0x450001, 0x7f00928fd000)
/usr/local/go/src/runtime/malloc.go:826 +0x94
runtime.mallocgc.func1()
/usr/local/go/src/runtime/malloc.go:721 +0x46
runtime.systemstack(0x0)
/usr/local/go/src/runtime/asm_amd64.s:409 +0x79
runtime.mstart()
/usr/local/go/src/runtime/proc.go:1175


#2

Hi Sumit. You may be running into an issue of high series cardinality. Can you compare your hardware to the sizing guidelines? https://docs.influxdata.com/influxdb/v1.7/guides/hardware_sizing/


#3

Hi Sonia,

Thank you for looking into the issue. As per your advice, I will look into the hardware sizing guidelines.

Just to let you know, for the last few days I was testing the DB after removing the remote read and write option from Prometheus. Basically, no new data was being written to the DB and no read queries were happening. Still, InfluxDB was going down every 1-2 minutes with the “out of memory” error. After I added 6 GB of swap space, it became stable and has been running for the last 15 hours without going down.
Today I enabled the remote read/write option in Prometheus again, and within a few minutes the DB crashed with “fatal error: runtime: out of memory”.

Sonia, is there any query/doc through which I can find out the following details:

  1. Number of fields written per second
  2. Number of queries per second
  3. Number of unique series

#4

Hi, my prototyping and testing with InfluxDB was paused for a few months (I will start again in a few days for a real system), but I also experienced some problems with memory management, and I didn’t manage to get any definitive information about them.
(See my post: Memory usage forever growing with INF RP?)
I suggest checking both your DB structure and your retention policies.

In my understanding, memory usage always increases slowly because indexes are kept partially in RAM (to make querying faster), but the “big steps” in memory needs are due to:

  • new series (keep in mind that not only a new measurement creates a new series, but so does a tag value used for the first time, so if you have tags with many distinct values, they should be turned into fields);
  • creation of new shards of data due to retention policies (something will be replicated, so the same “samples” stored with a retention policy of 2 days will consume more memory than if stored with a retention policy of 2 weeks). So use a long retention period, or a short one, but move old data elsewhere as soon as possible and delete it from the machine that is collecting new data.
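
To illustrate the point about tags versus fields, here is a minimal Python sketch. It is not InfluxDB's actual index logic: it assumes a series is identified by measurement plus tag set (as described in the InfluxDB 1.x docs) and that the line-protocol sample contains no escaped spaces or commas. The function names and the sample points are hypothetical.

```python
def series_key(line):
    """Return measurement plus sorted tag set for one line-protocol point."""
    # Everything before the first space is "measurement,tag1=a,tag2=b".
    head = line.split(" ", 1)[0]
    measurement, *tags = head.split(",")
    # Sort the tags so that tag order does not create a "different" key.
    return ",".join([measurement] + sorted(tags))


def count_series(lines):
    """Count distinct series keys, skipping blank lines."""
    return len({series_key(line) for line in lines if line.strip()})


# Hypothetical sample points in line protocol.
points = [
    "cpu,host=a,region=eu usage=0.5 1",
    "cpu,host=a,region=eu usage=0.7 2",  # same series, just a new point
    "cpu,region=eu,host=b usage=0.1 1",  # new tag value: a new series
    "cpu,host=a,region=eu idle=0.3 3",   # new field: still the same series
]

print(count_series(points))
```

Under these assumptions, only the new value for the host tag adds a series; the extra field does not. That is why a value with many distinct variants is cheaper to index as a field than as a tag.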

#5

I just ran the below query on the db:

> show series cardinality;
cardinality estimation
----------------------
921161
########################
> show measurement cardinality;
cardinality estimation
----------------------
505
------------------------------------------------

#6

Hi Sumit, are you using Chronograf? If so, you should be able to see an InfluxDB dashboard that monitors queries and writes. You can find it by navigating to Host List and then clicking influxdb in the listed Telegraf host.


#7

Hi Sonia,

I’m not using Chronograf. Is there any other way to find out these details without installing an additional application?


#8

Hey Sumit, no problem. Try running this curl command and look for pointsWrittenOK to see the number of points written (if you’re running locally, localhost stands in for the host in http://influxdb:8086/debug/vars):
curl http://localhost:8086/debug/vars

There is a lot of other information in that JSON blob that may prove useful to you.


#9

I ran curl against the DB and got the result below.
“pointsWrittenOK”:1094897

Sonia, is the “pointsWrittenOK” value per minute or per second?


#10

It is the number of points written since the last restart of the DB. One of our devs wrote about it here: https://gist.github.com/stuartcarnie/1d47b64262734b24b7eb60380c48fdb9

You may be able to extrapolate from that how to arrive at the points per second value. Unfortunately I don’t know of any other way to get that value.
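
One way to extrapolate, sketched in Python: read the counter twice and divide the difference by the elapsed time. This is only a rough sketch. The helper names are hypothetical, and since the exact nesting of the /debug/vars JSON is not shown in this thread, the code searches the whole blob for the first pointsWrittenOK key instead of assuming a path.

```python
import json
import time
import urllib.request


def find_counter(blob, key):
    """Depth-first search for the first value stored under `key` in nested JSON."""
    if isinstance(blob, dict):
        if key in blob:
            return blob[key]
        children = blob.values()
    elif isinstance(blob, list):
        children = blob
    else:
        return None
    for child in children:
        found = find_counter(child, key)
        if found is not None:
            return found
    return None


def rate_per_second(first, second, elapsed_seconds):
    """Average increase per second between two readings of a cumulative counter."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (second - first) / elapsed_seconds


def sample(url, key="pointsWrittenOK"):
    """Fetch the JSON document at `url` and return the counter stored under `key`."""
    with urllib.request.urlopen(url) as response:
        return find_counter(json.load(response), key)


def measure(url="http://localhost:8086/debug/vars", interval=10.0):
    """Take two samples `interval` seconds apart and return points per second."""
    first = sample(url)
    started = time.monotonic()
    time.sleep(interval)
    second = sample(url)
    return rate_per_second(first, second, time.monotonic() - started)

# Usage against a running instance: print(measure(interval=60.0))
```

Because the counter resets when the DB restarts, a negative result means a restart happened between the two samples and the measurement should be repeated.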

I am chatting with one of our devs to see if I can get a query for you to run to get this value.


#11

Another question is whether you are using TSI or TSM. The default is TSM, which can increase the memory requirements for InfluxDB. If you are running out of memory, consider migrating to TSI.

Here are some documents



#12

Thanks Esity,
As you suggested, I looked into the config and found that it is using TSM1.


#13

Thanks Sonia,

Will check the link.