I am migrating from InfluxDB v2 to InfluxDB 3 Core v3.7.0 on RHEL 9.6 using file-based storage.
I imported 25 line protocol files, around 96K records in total (no file exceeds 10K records), but after the import I noticed about 12 GB of RAM in use (~80% of the server's 15 GB).
We still have 6 to 7 million records to import. Memory keeps growing and is never released. No Parquet files have been created yet even after waiting more than 6 hours; I can only see WAL files.
What is the recommended batch size (records per write) for imports to optimize memory?
Should there be a time interval/delay between file imports?
Will parquet files eventually be created to free memory?
Any guidance on import strategy for large migrations would be appreciated. Thanks.
My gut feeling, after having observed InfluxDB 3's behaviour for a few months now, is the following:
we ingest data as fast as we can and store it in memory to not skip any data
memory grows while data still comes in
we hope that the data stream will ease off at some stage so we can process the data and write it to file
This to me means:
You do indeed need to give the system more time to not just ingest the data but also digest it and write it out.
If that does not happen, the system would rather crash than skip data ingestion, a strategy that seems weirdly flawed.
So yes, you need to give the system time to digest during import batches.
How much? I don't know. Maybe you need some sort of RAM/CPU monitoring that decides when it is safe to continue feeding data.
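Something like the sketch below is what I have in mind; it is untested against your setup and makes several assumptions you would need to adjust: InfluxDB 3 Core on localhost:8181, a database named "mydb", the v3 line-protocol write endpoint /api/v3/write_lp, and line protocol files in ./lp_files/. It sends small batches, pauses after each one to give the server time to digest, and backs off whenever the host's available memory drops too low.

```python
#!/usr/bin/env python3
"""Throttled line-protocol import sketch (assumptions: localhost:8181,
database "mydb", /api/v3/write_lp endpoint, files in ./lp_files/)."""
import glob
import time
import urllib.request

WRITE_URL = "http://localhost:8181/api/v3/write_lp?db=mydb"  # assumption: adjust host/db/endpoint
BATCH_LINES = 5_000         # lines per HTTP write; tune down if memory keeps climbing
PAUSE_BETWEEN_BATCHES = 2   # seconds of "digest" time after each write
MIN_AVAILABLE_MB = 3_000    # back off if the host has less free memory than this


def available_memory_mb() -> int:
    """Read MemAvailable from /proc/meminfo (Linux only), in MB."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemAvailable:"):
                return int(line.split()[1]) // 1024
    return 0


def write_batch(lines: list[str]) -> None:
    """POST one batch of line protocol to the server."""
    body = "\n".join(lines).encode()
    req = urllib.request.Request(WRITE_URL, data=body, method="POST")
    with urllib.request.urlopen(req) as resp:
        resp.read()  # non-2xx responses raise an HTTPError


for path in sorted(glob.glob("lp_files/*.lp")):
    with open(path) as f:
        lines = [ln.rstrip("\n") for ln in f if ln.strip()]
    for i in range(0, len(lines), BATCH_LINES):
        # Wait until the host has breathing room before sending more data.
        while available_memory_mb() < MIN_AVAILABLE_MB:
            print("low memory, waiting 30s for the server to flush...")
            time.sleep(30)
        write_batch(lines[i:i + BATCH_LINES])
        time.sleep(PAUSE_BETWEEN_BATCHES)
    print(f"imported {path} ({len(lines)} lines)")
```

The thresholds (batch size, pause, memory floor) are guesses, not recommendations; the point is simply to stop pushing data when the server has not yet flushed what it already holds.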