From 9f6b208080da97a44165603bb6cfb034389aa690 Mon Sep 17 00:00:00 2001
From: Rishab87
Date: Sun, 3 Aug 2025 16:13:49 +0530
Subject: [PATCH 1/3] added docs for high availability

---
 README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index 7cb09617a..476024e6f 100644
--- a/README.md
+++ b/README.md
@@ -53,6 +53,7 @@ are deleted they are no longer visible on the `/metrics` endpoint.
 * [Horizontal sharding](#horizontal-sharding)
   * [Automated sharding](#automated-sharding)
 * [Daemonset sharding for pod metrics](#daemonset-sharding-for-pod-metrics)
+* [High Availability](#high-availability)
 * [Setup](#setup)
   * [Building the Docker container](#building-the-docker-container)
 * [Usage](#usage)
@@ -304,6 +305,12 @@ spec:
 
 Other metrics can be sharded via [Horizontal sharding](#horizontal-sharding).
 
+### High Availability
+
+For high availability, run multiple kube-state-metrics replicas with anti-affinity rules to prevent single points of failure. Configure 2 replicas, anti-affinity rules on hostname, rolling update strategy with `maxUnavailable: 1`, and a PodDisruptionBudget with `minAvailable: 1`.
+
+When using multiple replicas, Prometheus will scrape all instances resulting in duplicate metrics with different instance labels. Handle deduplication in queries using `avg without(instance) (metric_name)`. Brief inconsistencies may occur during state transitions but resolve quickly as replicas sync with the API server.
+
 ### Setup
 
 Install this project to your `$GOPATH` using `go get`:

From 1d0189286e280216568720743f3d57fd54964b3e Mon Sep 17 00:00:00 2001
From: Rishab87
Date: Mon, 4 Aug 2025 20:05:45 +0530
Subject: [PATCH 2/3] fixing CI error

---
 README.md.tpl | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/README.md.tpl b/README.md.tpl
index 4f6941380..143377680 100644
--- a/README.md.tpl
+++ b/README.md.tpl
@@ -53,6 +53,7 @@ are deleted they are no longer visible on the `/metrics` endpoint.
 * [Horizontal sharding](#horizontal-sharding)
   * [Automated sharding](#automated-sharding)
 * [Daemonset sharding for pod metrics](#daemonset-sharding-for-pod-metrics)
+* [High Availability](#high-availability)
 * [Setup](#setup)
   * [Building the Docker container](#building-the-docker-container)
 * [Usage](#usage)
@@ -305,6 +306,12 @@ spec:
 
 Other metrics can be sharded via [Horizontal sharding](#horizontal-sharding).
 
+### High Availability
+
+For high availability, run multiple kube-state-metrics replicas with anti-affinity rules to prevent single points of failure. Configure 2 replicas, anti-affinity rules on hostname, rolling update strategy with `maxUnavailable: 1`, and a PodDisruptionBudget with `minAvailable: 1`.
+
+When using multiple replicas, Prometheus will scrape all instances resulting in duplicate metrics with different instance labels. Handle deduplication in queries using `avg without(instance) (metric_name)`. Brief inconsistencies may occur during state transitions but resolve quickly as replicas sync with the API server.
+
 ### Setup
 
 Install this project to your `$GOPATH` using `go get`:

From 4d263a1754b85f7b0abc20d408c23482ee5a41a4 Mon Sep 17 00:00:00 2001
From: Rishab87
Date: Fri, 8 Aug 2025 23:30:35 +0530
Subject: [PATCH 3/3] addressing reviews

---
 README.md     | 4 ++--
 README.md.tpl | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 476024e6f..347f3bc91 100644
--- a/README.md
+++ b/README.md
@@ -307,9 +307,9 @@ Other metrics can be sharded via [Horizontal sharding](#horizontal-sharding).
 
 ### High Availability
 
-For high availability, run multiple kube-state-metrics replicas with anti-affinity rules to prevent single points of failure. Configure 2 replicas, anti-affinity rules on hostname, rolling update strategy with `maxUnavailable: 1`, and a PodDisruptionBudget with `minAvailable: 1`.
+For high availability, run multiple kube-state-metrics replicas to prevent a single point of failure. A standard setup uses at least 2 replicas, pod anti-affinity rules to ensure they run on different nodes, and a PodDisruptionBudget (PDB) with `minAvailable: 1` to protect against voluntary disruptions.
 
-When using multiple replicas, Prometheus will scrape all instances resulting in duplicate metrics with different instance labels. Handle deduplication in queries using `avg without(instance) (metric_name)`. Brief inconsistencies may occur during state transitions but resolve quickly as replicas sync with the API server.
+When scraping the individual pods directly in an HA setup, Prometheus will ingest duplicate metrics distinguished only by the instance label. This requires you to deduplicate the data in your queries, for example, by using `max without(instance) (your_metric)`. The correct aggregation function (max, sum, avg, etc.) is important and depends on the metric type, as using the wrong one can produce incorrect values for timestamps or during brief state transitions.
 
 ### Setup
 
diff --git a/README.md.tpl b/README.md.tpl
index 143377680..933dccf60 100644
--- a/README.md.tpl
+++ b/README.md.tpl
@@ -308,9 +308,9 @@ Other metrics can be sharded via [Horizontal sharding](#horizontal-sharding).
 
 ### High Availability
 
-For high availability, run multiple kube-state-metrics replicas with anti-affinity rules to prevent single points of failure. Configure 2 replicas, anti-affinity rules on hostname, rolling update strategy with `maxUnavailable: 1`, and a PodDisruptionBudget with `minAvailable: 1`.
+For high availability, run multiple kube-state-metrics replicas to prevent a single point of failure. A standard setup uses at least 2 replicas, pod anti-affinity rules to ensure they run on different nodes, and a PodDisruptionBudget (PDB) with `minAvailable: 1` to protect against voluntary disruptions.
 
-When using multiple replicas, Prometheus will scrape all instances resulting in duplicate metrics with different instance labels. Handle deduplication in queries using `avg without(instance) (metric_name)`. Brief inconsistencies may occur during state transitions but resolve quickly as replicas sync with the API server.
+When scraping the individual pods directly in an HA setup, Prometheus will ingest duplicate metrics distinguished only by the instance label. This requires you to deduplicate the data in your queries, for example, by using `max without(instance) (your_metric)`. The correct aggregation function (max, sum, avg, etc.) is important and depends on the metric type, as using the wrong one can produce incorrect values for timestamps or during brief state transitions.
 
 ### Setup
 
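
For reviewers who want a concrete picture of the setup the new section describes, a minimal sketch of the replicas/anti-affinity/PDB configuration is shown below. This is not part of the patch and not taken from this repository's manifests; the resource names, labels, namespace, and image tag are illustrative assumptions.

```yaml
# Sketch only: a Deployment fragment plus PodDisruptionBudget matching the HA
# guidance in the added README section. Names, labels, and the image tag are
# assumptions for illustration, not manifests shipped with this repository.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  template:
    metadata:
      labels:
        app.kubernetes.io/name: kube-state-metrics
    spec:
      # Keep the two replicas on different nodes so a single node failure
      # cannot take down both instances.
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: kube-state-metrics
              topologyKey: kubernetes.io/hostname
      containers:
        - name: kube-state-metrics
          image: registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.13.0
          ports:
            - containerPort: 8080
              name: http-metrics
---
# Allow at most one replica to be evicted by voluntary disruptions
# (e.g. node drains), so one instance is always available.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
```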
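
The deduplication query from the second paragraph can also be baked into a Prometheus recording rule so dashboards and alerts consume a single, instance-free series. The sketch below assumes the documented `max without(instance) (...)` pattern; the group name, record name, and choice of `kube_pod_status_phase` as the example metric are illustrative, not prescribed by the patch.

```yaml
# Sketch only: a Prometheus rule file that collapses the duplicate series
# produced by scraping both kube-state-metrics replicas, by aggregating away
# the instance label as the README section suggests.
groups:
  - name: kube-state-metrics-dedup
    rules:
      - record: kube_pod_status_phase:dedup
        expr: max without (instance) (kube_pod_status_phase)
```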