Is Influx really suitable for large datasets? The server runs out of RAM and crashes in under 10 minutes on a fresh DB

I have been using Influx with some stock options minute data - and it was working fine on an 8-core/16 GB AWS instance, consistently using 80% of available RAM.

I have only one measurement and one tag (the stock symbol) - which has around 500 values - and around 9 float64 fields per point.
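For context, a point in that schema would look roughly like this in line protocol (measurement, tag, and field names here are made up for illustration - the real schema just needs one symbol tag and ~9 float fields):

```
stock_minutes,symbol=AAPL open=170.1,high=170.5,low=169.9,close=170.2,volume=120000 1262304000000000000
```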

So it was working fine until I decided to import more points - roughly 1 million minute points per stock, going back to 2010.

I use the GoLang API and, following the recommendations, write 1,000 points per batch - which works out to around 1,000 batch requests per stock. It inserts fast, but hell! RAM usage goes through the roof and crashes the server after a few minutes of running.
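The batching described above can be sketched as follows. This is a minimal, self-contained illustration of splitting one stock's points into ~1,000-point write batches; the measurement, field names, and timestamps are made up, and in a real import each batch would be handed to the Go client's write call:

```go
package main

import "fmt"

// chunk splits a slice of line-protocol points into batches of at most
// `size` points each, mirroring the recommended ~1,000 points per request.
func chunk(points []string, size int) [][]string {
	var batches [][]string
	for len(points) > size {
		batches = append(batches, points[:size])
		points = points[size:]
	}
	if len(points) > 0 {
		batches = append(batches, points)
	}
	return batches
}

func main() {
	// Fake a few days of minute points for one symbol (hypothetical schema).
	points := make([]string, 0, 2500)
	for i := 0; i < 2500; i++ {
		ts := int64(1262304000000000000) + int64(i)*60000000000 // 2010-01-01 + i minutes, in ns
		points = append(points, fmt.Sprintf("stock_minutes,symbol=AAPL close=170.2 %d", ts))
	}
	batches := chunk(points, 1000)
	fmt.Println(len(batches)) // 3 batches; each would be one write request
}
```

Note that batching only bounds the size of each request - it does not bound how much memory the server spends absorbing a sustained stream of such requests, which is the behavior described below.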

Basically, I started the import at the time I started writing this post (on a fresh install of Influx), and by the time I looked again, Influx was using 12 GB of RAM.

Once it crashes, Influx restarts and instantly fills up the RAM again - then, after a few dozen minutes, the RAM usage starts decreasing.

What are my options? Do I have to rent a HUGE AWS machine just to populate the dataset, and migrate to a machine with smaller specs afterwards?

Is there anything wrong with my setup?

If nothing is wrong, maybe the docs should be updated to explain what's going on when inserting huge amounts of points …

To reply to my own question -

I couldn't find any way around it other than increasing the machine's specs - at least during the initial import.

Influx uses a lot of CPU and RAM during insertion - but once the insertion rate drops significantly, the engine does some 'magic' (probably compaction/indexing) and the RAM usage drops. After inserting 500 million records, RAM usage went up to 65 GB, then slowly decreased to 1 GB.

I haven’t tested the instance with production read usage yet.

I learned (and read) that the number of series is the main consumer of memory. I had some tags that turned out to contain random data in some instances, and I eventually ran out of my 4 GB of RAM. I now exclude those tags from the series where they don't apply.
The number of series is tracked in the _internal database, so you can watch it over time. I think the website has some rules of thumb on memory requirements per series. Clearly, it will stop consuming more memory once you stop adding new series.
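As a back-of-envelope check of the point above: series cardinality is (in the worst case) the number of measurements times the product of distinct values per tag key, which is why a tag with random values blows up memory. A small sketch, using the schema from the question (one measurement, one ~500-value symbol tag) and a hypothetical 10,000-value random tag:

```go
package main

import "fmt"

// seriesCount estimates worst-case series cardinality: measurements times
// the product of the distinct-value counts of each tag key (assuming every
// combination of tag values actually occurs).
func seriesCount(measurements int, tagValueCounts ...int) int {
	n := measurements
	for _, c := range tagValueCounts {
		n *= c
	}
	return n
}

func main() {
	// One measurement, one symbol tag with ~500 values: 500 series.
	fmt.Println(seriesCount(1, 500))
	// Add a tag with effectively random values (say 10,000 distinct ones)
	// and the cardinality multiplies to 5,000,000 series.
	fmt.Println(seriesCount(1, 500, 10000))
}
```

This is why dropping the random-valued tags from the series where they don't apply brings memory use back under control.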