Using InfluxDB 1.7.10 with the tsi1 index, deployed on Kubernetes via the official influxdata/helm-charts repository (GitHub - influxdata/helm-charts: Official Helm Chart Repository for InfluxData Applications).
During normal daily operations (writing data and running continuous queries), with CPU at ~50% of 1 core and ~4GB of the 8GB total memory in use,
the health check (which hits /ping) is often timing out.
Is this expected? There’s nothing in the logs revealing any problems, and CPU and memory look fine. What could be the problem?
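One thing worth checking while debugging: how tight the probe settings are. As a sketch, assuming the chart exposes probe overrides in values.yaml (field names and all numbers below are illustrative assumptions, not recommendations):

```yaml
# Hypothetical livenessProbe override for the chart's values.yaml.
# A loaded node can easily miss the default 1s timeout on /ping.
livenessProbe:
  httpGet:
    path: /ping
    port: 8086
  initialDelaySeconds: 30
  timeoutSeconds: 10      # give /ping more room under load
  failureThreshold: 6     # tolerate transient stalls before a restart
```

This doesn’t fix the underlying stall, but it keeps Kubernetes from restarting the pod on every transient spike.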
Gentle bump. We’ve since upgraded to 4 cores, 16GB RAM, and 1TB of AWS EBS disk (EBS doesn’t throttle when you provision >= 1TB), but we’re still getting ping timeouts under high load, and thus Kubernetes keeps shutting Influx down.
It looks like the root cause is too many concurrent reads, writes, and compactions: memory spikes, /ping times out, and Kubernetes kills the instance.
Setting INFLUXDB_DATA_COMPACT_THROUGHPUT and INFLUXDB_DATA_COMPACT_THROUGHPUT_BURST to 5 works well for AWS EBS disk,
and setting INFLUXDB_HTTP_MAX_CONCURRENT_WRITE_LIMIT to 3 has also helped our backfill jobs.
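For anyone else hitting this, the settings above can be passed to the container as environment variables. A sketch, assuming the chart forwards an `env` list into the pod spec (values copied from this thread, not verified tuning advice):

```yaml
# Hypothetical values.yaml fragment; values are the ones used above.
env:
  - name: INFLUXDB_DATA_COMPACT_THROUGHPUT
    value: "5"
  - name: INFLUXDB_DATA_COMPACT_THROUGHPUT_BURST
    value: "5"
  - name: INFLUXDB_HTTP_MAX_CONCURRENT_WRITE_LIMIT
    value: "3"
```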
But there’s seemingly no way to queue READS: setting INFLUXDB_COORDINATOR_MAX_CONCURRENT_QUERIES makes Influx return an error to the client instead of queueing. That’s workable, but I was hoping we could queue inside Influx itself.
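Since Influx 1.x won’t queue reads internally, one workaround is to queue them on the client side. A minimal sketch, assuming you wrap whatever query function you use (e.g. InfluxDBClient.query); the gate and its limit are our own application-level choices, not Influx settings:

```python
import threading

# Client-side read gate: bound how many queries run at once, and make
# callers wait (queue) instead of erroring out like
# INFLUXDB_COORDINATOR_MAX_CONCURRENT_QUERIES does.
MAX_CONCURRENT_READS = 3  # our own choice, not an Influx setting
_read_gate = threading.BoundedSemaphore(MAX_CONCURRENT_READS)

def gated_query(run_query, *args, **kwargs):
    """Block until a read slot is free, then run the query.

    `run_query` would be e.g. InfluxDBClient.query in a real setup;
    here it is any callable, so the pattern stays library-agnostic.
    """
    with _read_gate:  # releases the slot even if the query raises
        return run_query(*args, **kwargs)
```

Excess callers simply block on the semaphore until a slot frees up, which smooths read bursts the same way the write limit smooths writes.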