We’re using Prometheus in OpenShift to log infrastructure and application metrics, and sending that data out to our existing, external InfluxDB cluster. All good, and we can create some awesome dashboards, but we’re struggling with cardinality. I’ve set up Prometheus relabelling to drop some of our tags with particularly high uniqueness, and I’m looking at taking a few measurements out altogether, but we’re still struggling. We turned on kube-state-metrics about 2 weeks ago, and since then cardinality has been steadily climbing.
We’ve recently altered the retention policy from 30 days to 7 days - intended as a temporary measure to get InfluxDB working again - but that doesn’t seem to have changed anything, and the cardinality is still increasing.
Can anyone give me any hints on how to manage this? It’s gone up from 9 million to 12.3 million in the last seven days. We’ve increased the memory to 64GB on each of the three hosts in the cluster, but performance of InfluxDB is being seriously compromised. What can we do?
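To put that growth in numbers, here is a quick back-of-the-envelope sketch using the two figures above. It assumes the growth over the week was roughly linear, which I haven't verified, and the variable names are just for illustration.

```python
# Series cardinality figures quoted in the post.
start_series = 9_000_000   # cardinality seven days ago
end_series = 12_300_000    # cardinality now
days = 7

# Average number of new series per day, assuming roughly linear growth.
growth_per_day = (end_series - start_series) / days

# Relative increase over the whole period, as a percentage.
pct_increase = (end_series - start_series) / start_series * 100

print(f"{growth_per_day:,.0f} new series per day")
print(f"{pct_increase:.1f}% increase over {days} days")
```

That works out to roughly 470,000 new series a day, or about a 37% increase in a week.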