TL;DR: I’m trying to bulk-insert billions of data points into InfluxDB using the -import flag. After fewer than 0.5 billion points have been ingested, the InfluxDB process is killed because the system runs out of memory (InfluxDB is using all the RAM). Is there any way to prevent that?
I have a data set of about 500 billion points and I’m considering using InfluxDB to store them.
I’m trying to insert about 1% of the data as a POC to see how it’ll behave. Unfortunately, I only get to about 0.1% before the process is killed by the kernel’s OOM killer.
The system I’m using for the import has 16GB of RAM, which should be plenty, and an SSD.
By the time the process gets killed, I have about 200 series in the database (so it’s not an issue with series cardinality).
The data is stored in a single measurement, with one tag and one field.
In the first few minutes, InfluxDB inserts data at around 300,000 points per second. After about 10 minutes (once it starts using most of the RAM), that drops to about 200,000. Twenty minutes later we’re under 100,000 points per second, and the process gets killed.
What can I do to prevent this from happening? I have a few ideas, but I’m not sure which one to go with:
- Wait n seconds between each import to let InfluxDB catch up
- Wait n seconds every m imports
- Force and wait for the WAL to be processed (if that’s even possible) every m imports
- Tweak the default settings somehow
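To make the first two ideas concrete, here is a rough sketch of what a throttled import could look like: split the line-protocol file into chunks, run the real influx -import CLI on each chunk, and sleep between chunks. The chunk size and pause length are made-up values, and it assumes a DML-only import file (the DDL has already been run, and the leading comment lines like "# DML" / "# CONTEXT-DATABASE: mydb" are copied into each chunk so every chunk is a valid import file on its own).

```python
import subprocess
import time
from itertools import islice

CHUNK_LINES = 5_000_000   # hypothetical chunk size; tune for your hardware
PAUSE_SECONDS = 30        # hypothetical pause; meant to let the cache/WAL drain

def split_import_file(path, chunk_lines=CHUNK_LINES):
    """Split a DML-only line-protocol import file into chunk files.

    The leading comment lines ("# DML", "# CONTEXT-DATABASE: ...") are
    repeated at the top of every chunk so each one is importable by itself.
    Returns the list of chunk file paths.
    """
    chunks = []
    with open(path) as src:
        # Collect the header: all leading lines that start with "#".
        header = []
        while True:
            pos = src.tell()
            line = src.readline()
            if not line or not line.startswith("#"):
                src.seek(pos)  # rewind to the first data point
                break
            header.append(line)
        i = 0
        while True:
            body = list(islice(src, chunk_lines))
            if not body:
                break
            chunk_path = f"{path}.part{i:04d}"
            with open(chunk_path, "w") as dst:
                dst.writelines(header + body)
            chunks.append(chunk_path)
            i += 1
    return chunks

def throttled_import(path):
    """Import each chunk with `influx -import`, pausing between chunks."""
    for chunk in split_import_file(path):
        subprocess.run(["influx", "-import", f"-path={chunk}"], check=True)
        time.sleep(PAUSE_SECONDS)  # idea 1/2: give InfluxDB time to catch up
```

This only spaces out the write load; if the real problem is the in-memory cache growing faster than it is snapshotted to TSM, only a config change (idea 4) would address the root cause.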