Extremely high memory usage


We are using InfluxDB for long-term data storage, around 3 years of data. The total size of the DB is around 2.9 GB, with low cardinality (less than 2000). However, memory usage is around 10 GB.

What should I check and try to optimize? I am kind of running out of ideas.
We are running version 2.7.1 in AKS (Kubernetes), deployed with the Bitnami chart.

Any help is welcome, since we really need to figure out why we have such high memory usage on a relatively small DB with low cardinality.
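One thing worth checking first is whether the memory is actually held by the Go heap or only shows up as process RSS. InfluxDB 2.x exposes Go runtime metrics on its `/metrics` endpoint; a sketch, assuming the server is reachable on `localhost:8086`:

```shell
# Go runtime memory stats from the built-in /metrics endpoint
# (host/port are assumptions; adjust for your deployment).
curl -s http://localhost:8086/metrics | grep -E '^go_memstats_(alloc|heap_inuse|sys)_bytes'

# Compare with what the OS reports as resident memory for influxd:
ps -o rss= -p "$(pgrep -x influxd)"
```

If `go_memstats_*` is small but RSS is large, the memory is likely outside the heap (e.g. mmap'd TSM files or page cache) rather than live data structures.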

To measure cardinality I ran the following on each org:

import "array"
import "influxdata/influxdb"

bucketList =
    buckets()
        |> findColumn(fn: (key) => true, column: "id")

cardinalities =
    array.map(
        arr: bucketList,
        fn: (x) => {
            cardinality =
                (influxdb.cardinality(bucketID: x, start: time(v: 0))
                    |> findColumn(fn: (key) => true, column: "_value"))[0]

            return {bucketID: x, _value: cardinality}
        },
    )

array.from(rows: cardinalities)
    |> sum()
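For anyone wanting to repeat this, the script can be run per org from the `influx` 2.x CLI (org name and token here are placeholders):

```shell
# Save the Flux script above as cardinality.flux, then run it once per org.
# --org and the token value are assumptions; substitute your own.
influx query --org my-org --token "$INFLUX_TOKEN" --file cardinality.flux
```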

I reduced the data by approximately 20% by removing some measurements and older data, which also decreased the cardinality by about 20%, to around 1000. Nothing has changed; memory is still in the 10 GB range.

I am guessing some cache cleanup will eventually lower it, but I have not figured out whether there is a job I can trigger manually to do it right away?

Any other things I can try?

Also, I tried it locally on 2.7.1 and 2.7.6, just to make sure the environment is not at fault, but it is all the same. CPU usage, on the other hand, seems to be low.

Hello @maholi,
I’m not sure why your memory consumption is so high given your ingest. I’ve seen high memory consumption before, but with cardinality orders of magnitude higher.
Can you please submit an issue here?

Are you seeing any errors in the logs?

I tested splitting the data and loading it in smaller chunks. Loading approximately half of the database, around 1.5 GB with a cardinality of around 530, memory goes up to 7.5 GB. The other half, also around 1.5 GB with a cardinality of around 480, takes only 2.1 GB of memory. There are no errors of any kind in the logs.

I first suspected that the issue might be the way we split the data into organisations and buckets to reduce cardinality. I thought a higher number of organisations and/or buckets might be reserving some memory, but creating a lot of empty ones causes almost no memory increase; several hundred of each did not cause any issues.
Also, the high-memory example I mentioned has only 1 organisation and around 40 buckets, while the lower one, at 2.1 GB, has around 10 organisations and around 30 buckets.
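For anyone wanting to repeat the empty-org/bucket test, a minimal sketch with the `influx` CLI (the names are made up):

```shell
# Create a few hundred empty orgs, each with a few empty buckets,
# then watch whether influxd's memory grows. Names are hypothetical.
for i in $(seq 1 200); do
  influx org create --name "loadtest-org-$i"
  for j in $(seq 1 3); do
    influx bucket create --name "loadtest-bucket-$j" --org "loadtest-org-$i"
  done
done
```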

The only thing left that I have not managed to test is the shards, i.e. having a long period of data. The default shard group duration is 7 days and the data spans a few years, so we have a lot of shards. I am not sure how that influences memory, or whether it is the combination of a long time span with a higher number of buckets.
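To put a rough number on "a lot of shards", here is a back-of-the-envelope count using the figures from the posts above (~3 years of data, default 7-day shard groups, ~40 buckets in the high-memory instance):

```shell
# ~3 years of data, default 7-day shard group duration
shards_per_bucket=$(( (3 * 365) / 7 ))
echo "shard groups per bucket: $shards_per_bucket"   # prints 156

# ~40 buckets in the high-memory instance described above
echo "shard groups total: $(( shards_per_bucket * 40 ))"   # prints 6240
```

Each shard carries its own index and file handles, so thousands of shards could plausibly cost significant memory even at very low series cardinality.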

It seems like the issue is related to shards.
You simply can’t keep data for too long. Not even adding extra memory helps: at some point the DB just starts crashing and corrupts its files, with no way to recover them anymore.
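For reference (not something we verified as a fix): shard group duration is configurable per bucket in 2.x, which reduces how many shards get created for new data. It does not rewrite existing shards, so it may not help an already-loaded long-history DB. A sketch with a placeholder bucket ID:

```shell
# <bucket-id> is a placeholder; list IDs first, then raise the
# shard group duration so future data creates far fewer shards.
influx bucket list
influx bucket update --id <bucket-id> --shard-group-duration 30d
```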

This is such a big issue that it is completely unreliable for serious production use, so we will be migrating away. None of these issues are mentioned anywhere in the documentation, yet it seems to have been a problem since InfluxDB 1.x. It is also very hard to find information about it.

Similar issues have been mentioned elsewhere:

A DB of 750 MB, with cardinality 50 and fewer than 2 million data points, changing shard durations but keeping retention at forever: Influx is using 7 GB of memory.

Seems like there is no solution for this; you simply can’t keep long-term data.

I met a similar memory problem, but our InfluxDB consumed a large amount of memory at startup.

Our InfluxDB data is large, more than 1 TB, and startup consumes about 200 GB of memory. My environment is a Docker container.
InfluxDB metrics show low Golang malloc memory, but Linux top shows a large RES.

The startup process consumes all of the server’s memory and gets OOM-killed by the Linux kernel. If I give it enough memory (swap) and let InfluxDB finish starting up (port 8086 up), then influxd slowly gives memory back to the system. docker stats shows lower memory usage, different from what the top command shows.

Based on our tests, I suspect it may be a memory allocation problem within InfluxDB (or Golang?) and containerd (or Docker, runc). Maybe when InfluxDB starts up it loads the full data index, but with a wrong memory allocation?

Your environment is in K8s, so maybe you could try to set a memory limit for the container.
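A sketch of what that would look like (the namespace and StatefulSet name are assumptions for a Bitnami-chart install):

```shell
# Cap the container's memory so the kernel OOM-kills influxd instead of
# starving the node; resource names here are hypothetical.
kubectl -n influxdb set resources statefulset influxdb \
  --limits=memory=12Gi --requests=memory=8Gi
```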

Unfortunately, we can’t just scale memory infinitely. Also, setting limits on K8s just crashes everything with OOM errors, so in that scenario it never starts up at all.

We tried to split the data, and one part of it, for unknown reasons, hits some sweet spot that just makes Influx crash. It is not a lot of data, but it spans a bit over 4 years.

Once that crash happens, I guess it corrupts something on the filesystem, and the instance can’t recover anymore; we have to restore from backup.

We found an alternative solution and moved to it. There we have more than 5x less memory usage on 2x the data, and there are no spikes on startup.