Significant memory usage issues with Influxd.exe on a Windows environment

I’m experiencing significant memory usage issues with Influxd.exe on a Windows environment and could use some guidance. Here are the details:

Problem
The Influxd.exe process is consuming over 120GB of memory on our Windows server, leading to performance concerns. This problem has occurred multiple times, and we’ve implemented various steps to mitigate it.

Steps Taken So Far

  1. Memory Limitation Testing: We attempted to limit memory usage, but on Windows, InfluxDB doesn’t seem to support this as it does on Linux.

  2. Detailed Logging: We enabled detailed logging to track the memory spike events, although it requires an InfluxDB restart to fully take effect.

  3. Threshold Monitoring: SMS alerts are set to notify us when memory usage reaches 40% and 50%.

Configuration Details

We’ve configured the InfluxDB instance according to the configuration our customer provided. Despite this, we’re unable to prevent the high memory usage, likely because of the limited memory controls available on Windows.

Questions

  1. Are there any known optimizations or workarounds within the InfluxDB configuration that might help reduce memory usage in a Windows environment?

  2. Would adjusting cache or WAL file settings be effective in controlling memory usage?

  3. Is there any recommended approach for managing memory usage on Windows without switching to a Linux environment?

Hello @KevinHuh,
Welcome! Unfortunately, InfluxDB v2 will consume as much memory as it can. Operator control over memory is a feature provided with v3 (especially Clustered). With v2, your options are limited.

The WAL (Write-Ahead Log) can consume significant memory if not configured properly.

You could use smaller WAL segments, which force InfluxDB to flush data more frequently and so reduce memory usage:

[data]
wal-segment-size = "5MB"

You could also flush the cache to disk more often by lowering cache-snapshot-memory-size and cache-snapshot-write-cold-duration:

[data]
cache-snapshot-memory-size = "64MB"
cache-snapshot-write-cold-duration = "5m"
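
To make the effect of these two settings concrete, here is a minimal Python sketch of the decision they control: the cache is written out when it grows past the size threshold, or when the shard has been idle for longer than the cold duration. This is a toy model, not InfluxDB's actual code; the names SnapshotPolicy and should_snapshot are hypothetical, and treating "64MB" as 64 * 1024 * 1024 bytes is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotPolicy:
    # Mirrors cache-snapshot-memory-size = "64MB" (assumed to mean MiB).
    memory_size_bytes: int = 64 * 1024 * 1024
    # Mirrors cache-snapshot-write-cold-duration = "5m".
    write_cold_seconds: float = 5 * 60


def should_snapshot(cache_bytes, seconds_since_last_write, policy=SnapshotPolicy()):
    """Return True if the toy model would flush the cache to disk now."""
    if cache_bytes <= 0:
        # Nothing in the cache, so there is nothing to flush.
        return False
    if cache_bytes >= policy.memory_size_bytes:
        # Size trigger: the cache has reached the snapshot threshold.
        return True
    # Idle trigger: no writes have arrived for the cold duration.
    return seconds_since_last_write >= policy.write_cold_seconds
```

Lowering either value makes one of the two triggers fire sooner, which is why smaller settings keep less data resident in memory between flushes.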

What is your series cardinality like?
If it's too high, this could be contributing to the memory usage.
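
If you are not sure what counts toward cardinality, the rough Python sketch below estimates it from a sample of line protocol by counting distinct combinations of measurement, tag set, and field key. It is only an illustration: series_keys is a hypothetical helper, the sample data is made up, and the parser assumes no escaped spaces or commas and no string field values containing them.

```python
def series_keys(lines):
    """Return the set of distinct (measurement, tag set, field key) tuples."""
    keys = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Expected shape: "measurement,tag=a,tag=b field=1,other=2 timestamp"
        parts = line.split(" ")
        head, fields = parts[0], parts[1]
        measurement, *tags = head.split(",")
        tag_set = ",".join(sorted(tags))  # tag order does not change the series
        for field in fields.split(","):
            field_key = field.split("=", 1)[0]
            keys.add((measurement, tag_set, field_key))
    return keys


sample = [
    "cpu,host=a,region=eu usage=1.0,idle=2.0 1",
    "cpu,region=eu,host=a usage=3.0 2",
    "cpu,host=b,region=eu usage=1.0 3",
]
cardinality = len(series_keys(sample))
```

Tags with many distinct values (IDs, timestamps, random strings) multiply this count quickly, so they are the first thing to look for.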

Also, you could look at shard management, as it can impact memory usage significantly. You might want to try reducing the shard duration if your workload involves frequent queries on recent data and that’s what is causing the spike.