OOM on InfluxDB or InfluxDB stopped working

Our InfluxDB pod constantly crashes with OOM. The memory request is currently set to 32GB and the limit to 64GB. Since setting the request to 32GB it no longer crashes, but it hangs until restarted: memory climbed to 10GB over a week and then it stopped responding. I removed all customization in the data section of the ConfigMap so the defaults would take effect. We have tried too many parameters without resolving the issue. We have had more than 10 incidents in the last 30-40 days and we need a permanent solution now. Version number is 1.8.0.
Databases: 40+ databases, each with 364-day retention
Shards: excessive shard count due to 30-day shard duration x 364-day retention
Memory usage: default cache settings
Replication: replicationN = 1 for all databases (single pod)
Writes: 845 requests/hour, 40-45k per day
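To put a number on the shard count, here is a sketch using the influx 1.x CLI run inside the pod; the database name "mydb" and the retention policy name "autogen" are assumptions, substitute your own:

```shell
# Count shards across all databases (influx 1.x CLI, run inside the pod):
influx -execute 'SHOW SHARDS' | wc -l

# With 364-day retention and 30-day shard groups, each database holds
# roughly 364/30 ~= 12-13 shards; across 40+ databases that is 500+
# shards, each carrying its own index and cache overhead in memory.

# Widening the shard group duration reduces that count, e.g. ~4 per DB:
# ("autogen" and "mydb" are placeholders; check SHOW RETENTION POLICIES)
influx -execute 'ALTER RETENTION POLICY "autogen" ON "mydb" DURATION 364d SHARD DURATION 90d'
```

Note that ALTER RETENTION POLICY only affects newly created shard groups; existing shards keep their original duration until they expire.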

We have another environment on v1.8.3 with a similar configuration, but running standalone on a server with a much higher write volume, i.e. 500k/hour, and it has been working fine with no issues.

Any insight into determining the root cause, or ways to isolate the issue, would be appreciated.
I have used Google and AI to troubleshoot and gather details, with no success.

Adding @davidby-influx here for input. If you haven't seen the recent announcement, I suggest you upgrade to the latest 1.x release (Release Announcement: InfluxDB OSS 1.12.3 and InfluxDB Enterprise 1.12.3), as it has quite a few performance improvements that might help fix this issue.

I recommend upgrading to a much more recent version, 1.12.2 or later. The memory footprint is improved in 1.12.3, and we expect 1.12.4 out early next week, which would be the best bet.

You can use Go profiling to see what's consuming your memory, but I wouldn't bother until you upgrade.
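If you do want to profile before upgrading, InfluxDB 1.x exposes the standard Go pprof endpoints over its HTTP port when pprof is enabled (it is by default). A sketch, assuming the server listens on localhost:8086:

```shell
# Grab a heap profile from the built-in Go pprof endpoint:
curl -s -o heap.pb.gz http://localhost:8086/debug/pprof/heap

# Inspect the top allocators (requires a Go toolchain on the machine):
go tool pprof -top heap.pb.gz

# Or collect a full bundle (heap, goroutines, 30s of CPU) in one archive,
# which is handy to attach to a support ticket or forum post:
curl -s -o profiles.tar.gz "http://localhost:8086/debug/pprof/all?cpu=true"
```

The heap profile will usually make it obvious whether the memory is going to the TSM cache, the in-memory index, or query execution.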