Is Influx really suitable for large datasets? The server runs out of RAM and crashes in under 10 minutes on a fresh DB

I have been using Influx with some stock options minute data - and it was working fine on an 8-core/16 GB AWS instance, consistently using 80% of available RAM.

I have only one measurement and one tag (the stock symbol) - which has around 500 values - and around 9 float64 fields per point.
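For context, a point in that schema would look roughly like this in line protocol (measurement, tag, and field names here are made up for illustration - the real schema just needs one symbol tag and ~9 float fields):

```
stock_minutes,symbol=AAPL open=170.1,high=170.5,low=169.9,close=170.2,volume=120000 1262304000000000000
```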

So it was working fine until I decided to import more points - roughly 1 million minute points per stock, going back to 2010.

I use the GoLang API and, following the recommendations, write 1,000 points per batch - which works out to around 1,000 batch requests per stock. It inserts fast, but hell! RAM usage goes through the roof and crashes the server after a few minutes of running.
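The batching described above can be sketched as follows. This is a minimal, self-contained illustration of splitting one stock's points into ~1,000-point write batches; the measurement, field names, and timestamps are made up, and in a real import each batch would be handed to the Go client's write call:

```go
package main

import "fmt"

// chunk splits a slice of line-protocol points into batches of at most
// `size` points each, mirroring the recommended ~1,000 points per request.
func chunk(points []string, size int) [][]string {
	var batches [][]string
	for len(points) > size {
		batches = append(batches, points[:size])
		points = points[size:]
	}
	if len(points) > 0 {
		batches = append(batches, points)
	}
	return batches
}

func main() {
	// Fake a few days of minute points for one symbol (hypothetical schema).
	points := make([]string, 0, 2500)
	for i := 0; i < 2500; i++ {
		ts := int64(1262304000000000000) + int64(i)*60000000000 // 2010-01-01 + i minutes, in ns
		points = append(points, fmt.Sprintf("stock_minutes,symbol=AAPL close=170.2 %d", ts))
	}
	batches := chunk(points, 1000)
	fmt.Println(len(batches)) // 3 batches; each would be one write request
}
```

Note that batching only bounds the size of each request - it does not bound how much memory the server spends absorbing a sustained stream of such requests, which is the behavior described below.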

Basically, I started the import at the time I started writing this post (on a fresh install of Influx), and by the time I looked again, Influx was using 12 GB of RAM.

Once it crashes, Influx restarts and instantly fills up the RAM again - then, after a few dozen minutes, the RAM usage starts decreasing.

What are my options? Do I have to rent a HUGE AWS machine just to populate the dataset, and migrate to a machine with smaller specs afterwards?

Is there anything wrong with my setup?

If nothing is wrong, maybe the docs should be updated to explain what's going on when inserting huge amounts of points …

To reply to my own question -

I couldn't find any way around it other than increasing the machine's specs - at least during the initial import.

Influx uses a lot of CPU and RAM during insertion - but once the insertion rate drops significantly, the engine does some 'magic' (probably compaction/indexing) and the RAM usage drops. After inserting 500 million records, RAM usage went up to 65 GB, then slowly decreased to 1 GB.

I haven’t tested the instance with production read usage yet.

I learned (and read) that the number of series is the main consumer of memory. I had some tags that turned out to contain random data in some instances, and I eventually ran out of my 4 GB of RAM. I now exclude those tags from the series where they don't apply.
The number of series is tracked in the _internal database, so you can watch it over time. I think the website has some rules of thumb on memory requirements per series. Clearly, it will stop consuming more memory once you stop adding new series.
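As a back-of-envelope check of the point above: series cardinality is (in the worst case) the number of measurements times the product of distinct values per tag key, which is why a tag with random values blows up memory. A small sketch, using the schema from the question (one measurement, one ~500-value symbol tag) and a hypothetical 10,000-value random tag:

```go
package main

import "fmt"

// seriesCount estimates worst-case series cardinality: measurements times
// the product of the distinct-value counts of each tag key (assuming every
// combination of tag values actually occurs).
func seriesCount(measurements int, tagValueCounts ...int) int {
	n := measurements
	for _, c := range tagValueCounts {
		n *= c
	}
	return n
}

func main() {
	// One measurement, one symbol tag with ~500 values: 500 series.
	fmt.Println(seriesCount(1, 500))
	// Add a tag with effectively random values (say 10,000 distinct ones)
	// and the cardinality multiplies to 5,000,000 series.
	fmt.Println(seriesCount(1, 500, 10000))
}
```

This is why dropping the random-valued tags from the series where they don't apply brings memory use back under control.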