InfluxDB v2.7.11
Red Hat Enterprise Linux release 9.5
InfluxDB becomes unresponsive every 3 to 6 days.
Sequence
Grafana reports an execution error when connecting to InfluxDB and is unable to read from it.
We log on to the InfluxDB server and check the InfluxDB logs.
Often there are no logs around the time the error starts. We have currently set the log level to error only, in case we were being rate-limited on the logs.
But even before that change we were only seeing normal logs.
We use a script to run a local write to the DB via the API. This still works, so InfluxDB is definitely up.
After a restart, once InfluxDB has finished starting up (approx. 5 to 6 minutes), the RMQ eventually de-spools and we are back to full metrics.
We are looking for pointers on how to identify the key issue here. We are wondering if it's an ingestion limit.
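Since the local write still succeeds while Grafana cannot read, it might help to time the write path, the query path and the health endpoint separately the next time this happens. Below is a minimal sketch of such a probe, not our actual script. The base URL, token, org and bucket are placeholders, the `probe` measurement name is made up, and it assumes the server certificate is trusted by the local Python install.

```python
import time
import urllib.error
import urllib.parse
import urllib.request

# All four values are placeholders (assumptions), not taken from our setup.
BASE_URL = "https://localhost:8086"
TOKEN = "REPLACE_ME"
ORG = "example-org"
BUCKET = "example-bucket"


def build_health_request():
    # /health needs no token in InfluxDB 2.x.
    return urllib.request.Request(BASE_URL + "/health", method="GET")


def build_write_request():
    # Write one line-protocol point; "probe" is a hypothetical measurement.
    params = urllib.parse.urlencode(
        {"org": ORG, "bucket": BUCKET, "precision": "s"})
    body = f"probe value=1 {int(time.time())}".encode()
    return urllib.request.Request(
        f"{BASE_URL}/api/v2/write?{params}", data=body, method="POST",
        headers={"Authorization": f"Token {TOKEN}",
                 "Content-Type": "text/plain; charset=utf-8"})


def build_query_request():
    # A deliberately small Flux query so slowness points at the read path.
    flux = f'from(bucket: "{BUCKET}") |> range(start: -1m) |> limit(n: 1)'
    params = urllib.parse.urlencode({"org": ORG})
    return urllib.request.Request(
        f"{BASE_URL}/api/v2/query?{params}", data=flux.encode(), method="POST",
        headers={"Authorization": f"Token {TOKEN}",
                 "Content-Type": "application/vnd.flux",
                 "Accept": "application/csv"})


def timed_request(req, timeout=10.0, opener=urllib.request.urlopen):
    """Send req and report HTTP status, error (if any) and elapsed seconds."""
    start = time.monotonic()
    status, error = None, None
    try:
        with opener(req, timeout=timeout) as resp:
            resp.read()
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code          # server answered, but with an error code
    except Exception as exc:       # timeout, connection reset, TLS failure
        error = repr(exc)
    ok = status is not None and 200 <= status < 300
    return {"ok": ok, "status": status, "error": error,
            "seconds": round(time.monotonic() - start, 3)}


def run_probes():
    """Probe health, write and query in turn and return the three results."""
    return {
        "health": timed_request(build_health_request()),
        "write": timed_request(build_write_request()),
        "query": timed_request(build_query_request()),
    }
```

Calling `run_probes()` from cron every minute and logging the result would show whether the query path alone stalls while writes and `/health` stay fast, which would point away from a pure ingestion limit.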
Config.json
"bolt-path": "/data/influxdb/bolt/influxd.bolt",
"engine-path": "/data/influxdb/engine",
"flux-log-enabled": false,
"hardening-enabled": true,
"http-bind-address": ":8086",
"http-idle-timeout": "15m0s",
"http-read-header-timeout": "5s",
"http-read-timeout": "15s",
"http-write-timeout": "15s",
"influxql-max-select-buckets": 0,
"influxql-max-select-point": 0,
"influxql-max-select-series": 1000000,
"instance-id": ":8086",
"log-level": "error",
"metrics-disabled": false,
"no-tasks": false,
"pprof-disabled": false,
"reporting-disabled": true,
"secret-store": "bolt",
"session-length": 60,
"session-renew-disabled": false,
"secret-store": "vault",
"sqlite-path": "/data/influxdb/sqlite/influxd.sqlite",
"session-length": 120,
"store": "disk",
"testing-always-allow-setup": false,
"tls-cert": "/data/influxdb/secure/peer.crt",
"tls-key": "/data/influxdb/secure/peer.key",
"tls-min-version": 1.2,
"tls-strict-ciphers": false,
"tracing-type": "log",
"ui-disabled": false
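Given that "metrics-disabled" and "pprof-disabled" are both false above, one thing we could try is saving a snapshot of `/metrics` and a goroutine dump from `/debug/pprof/goroutine` while the instance is hung, before restarting it. The following is only a sketch under assumptions: the base URL is a placeholder, the certificate is assumed to be trusted locally, and the endpoints may need authentication depending on the setup.

```python
import pathlib
import time
import urllib.request

BASE_URL = "https://localhost:8086"  # placeholder, adjust to the real host

# Output filename -> endpoint path on the influxd HTTP port.
ENDPOINTS = {
    "metrics.txt": "/metrics",
    "goroutines.txt": "/debug/pprof/goroutine?debug=2",
}


def http_get(url, timeout=30.0):
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def metric_value(metrics_text, name):
    """Return the first sample of an unlabelled Prometheus metric, or None."""
    for line in metrics_text.splitlines():
        if line.startswith("#"):
            continue               # skip HELP/TYPE comment lines
        parts = line.split()
        if len(parts) >= 2 and parts[0] == name:
            return float(parts[1])
    return None


def save_snapshot(out_dir, fetch=http_get, base_url=BASE_URL):
    """Write each endpoint's body into a timestamped directory."""
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = pathlib.Path(out_dir) / stamp
    target.mkdir(parents=True, exist_ok=True)
    saved = {}
    for filename, path in ENDPOINTS.items():
        try:
            body = fetch(base_url + path)
        except Exception as exc:   # keep a record even if the fetch hangs
            body = f"fetch failed: {exc!r}".encode()
        (target / filename).write_bytes(body)
        saved[filename] = body
    return target, saved
```

Comparing `metric_value(text, "go_goroutines")` between a healthy snapshot and a hung one might show whether goroutines are piling up, for example behind stuck queries, though that is a guess and not a diagnosis.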