I deployed an InfluxDB 1.8 StatefulSet in our Kubernetes cluster (EKS) using Helm. InfluxDB's data store is backed by an AWS Elastic File System (EFS) volume.
The entire setup was working fine until a team member decided to stress test it. Now the InfluxDB pod refuses to start, with both the liveness and readiness probes reporting failures. I increased the timeout and initial delay on both probes, but the pod still refuses to come up.
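For reference, this is roughly the kind of probe override I tried (the exact values here are illustrative, not my actual settings; the `/ping` endpoint on port 8086 is InfluxDB 1.8's standard health check, which returns 204 when the database is up):

```yaml
livenessProbe:
  httpGet:
    path: /ping        # InfluxDB 1.8 health endpoint
    port: 8086
  initialDelaySeconds: 300   # illustrative: generous delay to allow WAL replay on startup
  timeoutSeconds: 30
  failureThreshold: 10
readinessProbe:
  httpGet:
    path: /ping
    port: 8086
  initialDelaySeconds: 300
  timeoutSeconds: 30
  failureThreshold: 10
```

Even with delays this generous, the pod is killed and restarted before it ever becomes ready.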
Redeploying gives the same result: the pod tries to start, fails, and ends up in an endless CrashLoopBackOff restart cycle. My last resort is to delete the data and wal directories on the EFS before redeploying. But I'd like to know whether I can bring the pod up with the data intact.
The pod memory request is set to 8 GB and the limit is set to 16 GB.
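In Helm values terms, the resource configuration looks like this (a sketch matching the numbers above, assuming the chart exposes a standard `resources` block):

```yaml
resources:
  requests:
    memory: 8Gi    # guaranteed allocation for the pod
  limits:
    memory: 16Gi   # pod is OOM-killed if it exceeds this
```

One thing I'm unsure about is whether the pod is being killed by the probes or OOM-killed while replaying the WAL, since WAL replay after an unclean shutdown can be memory-hungry.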