InfluxDB v3 Core - Memory Management for Large Imports

Hi,

I am migrating from InfluxDB v2 to v3.7.0 Core on RHEL 9.6 using file-based storage.

I imported 25 line protocol files, around 96K records in total (no file exceeds 10K records), but after the import I noticed that 12 GB of RAM was in use (80% of the server's 15 GB).

We still have 6 to 7 million records to import. Memory grows and is not released. No Parquet files have been created yet, even after waiting more than 6 hours (I can only see WAL files being created).

Setup:

InfluxDB: v3.7.0 Core
Storage: File-based (--object-store file)
Config: --force-snapshot-mem-threshold 70%, --wal-snapshot-size 100

What is the recommended batch size (records per write) for imports to optimize memory?
Should there be a time interval/delay between file imports?
Will parquet files eventually be created to free memory?

Any guidance on import strategy for large migrations would be appreciated. Thanks.

My funny feeling after having observed InfluxDB3 behaviour now for a few months is the following:

  • we ingest data as fast as we can and store it in memory to not skip any data
  • memory grows while data still comes in
  • we hope that the data stream will ease at some stage so we can process and write to file

This to me means:
You do need to give the system more time, not just to ingest but also to digest and write the data.
If that does not happen, the system will rather crash than skip data ingestion, a strategy that seems weirdly flawed.

So yes, you need to give the system time to digest during import batches.
How much? I don’t know. Maybe it needs some sort of monitoring of RAM/CPU and then continue to feed data.
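
The "feed, then wait until memory drops" idea could be sketched roughly like this. It is only a sketch under assumptions: `send` is a hypothetical callable that you supply to post one line protocol payload to the server, and the batch size, the 60% ceiling and the 30 second poll are illustrative values, not recommendations. The memory reading comes from Linux `/proc/meminfo`, so it reflects the whole machine rather than InfluxDB alone.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batches(lines: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` non-empty line protocol lines."""
    batch: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        batch.append(line)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


def used_memory_fraction(meminfo_path: str = "/proc/meminfo") -> float:
    """Fraction of system RAM in use, from MemTotal and MemAvailable (kB)."""
    values = {}
    with open(meminfo_path) as fh:
        for row in fh:
            key, _, rest = row.partition(":")
            parts = rest.split()
            if parts and parts[0].isdigit():
                values[key.strip()] = int(parts[0])
    return 1.0 - values["MemAvailable"] / values["MemTotal"]


def throttled_import(
    lines: Iterable[str],
    send: Callable[[str], None],       # hypothetical: posts one payload
    batch_size: int = 5000,            # illustrative value
    max_used: float = 0.6,             # illustrative value
    poll_seconds: float = 30.0,        # illustrative value
    probe: Callable[[], float] = used_memory_fraction,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send batches one by one, pausing while memory use is above max_used."""
    sent = 0
    for batch in batches(lines, batch_size):
        # Wait for the server to "digest" before feeding more data.
        # Note: this loops forever if memory never drops below max_used.
        while probe() > max_used:
            sleep(poll_seconds)
        send("\n".join(batch))
        sent += len(batch)
    return sent
```

You would call it once per file, e.g. `throttled_import(open(path), send)`. If memory never comes back down, the loop just keeps waiting, which at least avoids pushing the server over the edge.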

Hi @binosheen97, you should also explicitly tune the memory settings, especially --exec-mem-pool-bytes and --force-snapshot-mem-threshold, so that Parquet persistence can run (every ~10 minutes by default) and move older data out of memory into files. See more here: "Troubleshoot issues writing data to InfluxDB" and "InfluxDB 3 Core performance tuning and optimization" in the InfluxDB 3 Core Documentation.

Hi, these are my influxdb3 settings.

/opt/influxdb3/influxdb3 serve \
  --object-store file \
  --data-dir /data/influxdb3 \
  --node-id node0 \
  --http-bind 0.0.0.0:8181 \
  --exec-mem-pool-bytes 60% \
  --force-snapshot-mem-threshold 70% \
  --wal-flush-interval 10s \
  --wal-snapshot-size 100 \
  --parquet-mem-cache-size 10% \
  --datafusion-num-threads 6

Is the best way to tweak those parameter values (especially the memory ones) and see whether things improve?
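
Before tweaking, it may help to see what the percentage settings above come to in absolute terms on the 15 GB server. This small calculation assumes, and it is only an assumption, that each percentage is taken against total system RAM; the documentation should be checked for what each flag is actually measured against.

```python
def budget_gb(total_gb: float, setting: str) -> float:
    """Convert a percentage setting such as '70%' into GB of a total."""
    percent = int(setting.rstrip("%"))
    return total_gb * percent / 100


TOTAL_GB = 15  # server RAM from the original post

# Percentage-valued flags from the command above.
SETTINGS = {
    "--exec-mem-pool-bytes": "60%",
    "--force-snapshot-mem-threshold": "70%",
    "--parquet-mem-cache-size": "10%",
}

for flag, value in SETTINGS.items():
    print(f"{flag} {value} -> {budget_gb(TOTAL_GB, value):.1f} GB")
```

Under that assumption the three values come to 9.0 GB, 10.5 GB and 1.5 GB. The 12 GB reported earlier is above the 10.5 GB figure, although whether the snapshot threshold applies to total process memory or only to the write buffer is not something this thread establishes.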