Failed to load tag keys after 2-3 hrs in running state


We recently upgraded to 2.1.1 and, after about a week, ran into a weird issue. InfluxDB stopped responding to queries: Data Explorer shows “Failed to load tag keys”, and Grafana reports query timeouts.

We turned on debug logging. According to the logs, InfluxDB is still accepting write requests with no errors; it just stops responding to queries after 2-3 hours in the running state.

We are running on GCP Kubernetes with a single node. The machine spec is:
Machine type: e2-standard-4
Boot disk type: Standard persistent disk
Boot disk size (per node): 100 GB

We tried increasing the query concurrency to 100, which seemed to resolve the issue, but after 12 hours the pod hit an OOM.
So we lowered it to “8” (yes, lower than the default), because with anything higher the pod keeps failing due to very high CPU and memory usage.
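
For context, here is a rough back-of-envelope sketch of why a higher concurrency setting might end in an OOM on this node. The per-query memory figure is purely an assumption for illustration (we have not measured it), and the function name is made up; the only known number is that an e2-standard-4 has 16 GB of memory.

```python
def worst_case_query_memory_gib(concurrency: int, per_query_gib: float) -> float:
    """Upper bound on memory held by queries if every concurrency slot is busy."""
    return concurrency * per_query_gib


NODE_MEMORY_GIB = 16   # e2-standard-4: 4 vCPUs, 16 GB memory
PER_QUERY_GIB = 0.5    # ASSUMPTION for illustration only, not a measured value

# Compare the two concurrency settings we tried.
for concurrency in (8, 100):
    needed = worst_case_query_memory_gib(concurrency, PER_QUERY_GIB)
    fits = needed <= NODE_MEMORY_GIB
    print(f"concurrency={concurrency}: up to {needed} GiB, fits in node memory: {fits}")
```

If the real per-query footprint is anywhere near that assumed value, 100 concurrent queries could exceed the node's memory, while 8 would leave headroom. That would match what we observed, but it does not explain why queries stop responding in the first place.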

I would appreciate it if anyone could help me with this issue.