Limit kube_pod_info queries to specified cluster to prevent timeouts #1135
Description:
This PR improves dashboard performance and prevents query timeouts by scoping kube_pod_info time series queries to the selected cluster (via clusterLabel), instead of scanning all 150 Kubernetes clusters.
Problem
Currently, dashboard queries using kube_pod_info fetch pod metadata across all clusters in our single large Mimir (Prometheus) instance, regardless of the configured clusterLabel. This leads to:
Timeouts on short time ranges (e.g., 5 minutes): about 136,684 time series are fetched (see the screenshot below).
Backend failures on longer ranges (e.g., 2 hours): queries exceed the maximum series limit and frequently slow down the Prometheus backend.
Solution
Align kube_pod_info filtering with other network metrics by restricting queries to the cluster specified in the clusterLabel dashboard variable.
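The PR text does not include the changed query, so what follows is only a minimal Python sketch of the idea. It assumes the cluster label is named `cluster` and the dashboard variable is `$cluster`, and the helper `scoped_selector` is hypothetical. The point it illustrates is that every `kube_pod_info` selector carries a cluster matcher, so Mimir only has to match series from one cluster instead of all of them.

```python
def scoped_selector(metric: str, cluster_label: str = "cluster",
                    cluster_var: str = "$cluster", **matchers: str) -> str:
    """Build a PromQL selector that is always scoped to a single cluster.

    Hypothetical helper: the label name and variable name are assumptions,
    not taken from the PR. Values are inserted verbatim (no escaping), which
    is fine for Grafana variables like "$cluster" but not for arbitrary input.
    """
    # The cluster matcher goes first; any extra matchers (e.g. namespace) follow.
    labels = {cluster_label: cluster_var, **matchers}
    body = ",".join(f'{name}="{value}"' for name, value in labels.items())
    return f"{metric}{{{body}}}"


# Unscoped (matches pods in every cluster):  kube_pod_info
# Scoped to the dashboard's selected cluster:
print(scoped_selector("kube_pod_info"))
print(scoped_selector("kube_pod_info", namespace="$namespace"))
```

Because Grafana substitutes `$cluster` before the query is sent, the backend receives an equality matcher on a single label value, which is what keeps the series count bounded on both short and long time ranges.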
Impact