Rolling restart Admin API (#1026)

kbatuigas · JakeSCahill · commit 2e25c771fd13 · 2025-04-06T08:33:11.000+01:00
diff --git a/modules/get-started/pages/whats-new.adoc b/modules/get-started/pages/whats-new.adoc
@@ -7,6 +7,13 @@ This topic includes new content added in version {page-component-version} Beta.
 * xref:redpanda-cloud:get-started:whats-new-cloud.adoc[]
 * xref:redpanda-cloud:get-started:cloud-overview.adoc#redpanda-cloud-vs-self-managed-feature-compatibility[Redpanda Cloud vs Self-Managed feature compatibility]
 
+== New health probes for broker restarts and upgrades
+
+The Redpanda Admin API now includes health probes that help you restart and upgrade brokers safely. The xref:api:ROOT:admin-api.adoc#get-/v1/broker/pre_restart_probe[`pre_restart_probe`] endpoint identifies potential risks of restarting a broker, and the xref:api:ROOT:admin-api.adoc#get-/v1/broker/post_restart_probe[`post_restart_probe`] endpoint reports how much of its workload a broker has reclaimed after a restart. See also:
+
+* xref:manage:cluster-maintenance/rolling-restart.adoc[]
+* xref:upgrade:rolling-upgrade.adoc[]
+
 == Redpanda Console v3.0.0 (beta)
 
 The Redpanda Console v3.0.0 beta release includes the following updates:
diff --git a/modules/upgrade/partials/rolling-upgrades/enable-maintenance-mode.adoc b/modules/upgrade/partials/rolling-upgrades/enable-maintenance-mode.adoc
@@ -10,7 +10,7 @@ rpk cluster health
 .Example output:
 [%collapsible]
 ====
-[.no-copy]
+[,bash,role=no-copy]
 ----
 CLUSTER HEALTH OVERVIEW
 =======================
@@ -19,12 +19,40 @@ Controller ID:               0
 All nodes:                   [0 1 2] <2>
 Nodes down:                  [] <3>
 Leaderless partitions:       [] <3>
-Under-replicated partitions: [] <3>
+Under-replicated partitions: [1] <3>
 ----
 <1> The cluster is either healthy (`true`) or unhealthy (`false`).
 <2> The node IDs of all brokers in the cluster.
 <3> If the cluster is unhealthy, these fields will contain data.
-====
+====
+
+. Optional: Use the Admin API (default port: 9644) to check for potential risks of restarting a specific broker:
++
+[,bash]
+----
+curl -s -X GET "http://<broker-address>:<admin-api-port>/v1/broker/pre_restart_probe" | jq .
+----
++
+.Example output:
+[,json,role=no-copy]
+----
+// Lists, for each risk type, the partitions (in the format {namespace}/{topic_name}/{partition_id}) affected by restarting the broker.
+
+{
+  "risks": {
+    "rf1_offline": [
+      "kafka/topic_a/0"
+    ],
+    "full_acks_produce_unavailable": [],
+    "unavailable": [],
+    "acks1_data_loss": []
+  }
+}
+----
++
+In this example, the probe indicates that partition `kafka/topic_a/0`, which has a replication factor of 1, is at risk of going offline if the broker is restarted.
++
+See the xref:api:ROOT:admin-api.adoc#get-/v1/broker/pre_restart_probe[Admin API reference] for more details on the `pre_restart_probe` endpoint.
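++
+The following sketch shows one way to script this check. It assumes `jq` is installed, and `report_restart_risks` is a hypothetical helper, not part of `rpk` or the Admin API. It prints each risk category that lists at least one partition and returns a non-zero exit status if the probe reports any risk, so a script can stop before restarting the broker:
++
+[,bash]
+----
+# Hypothetical helper (not part of rpk or the Admin API). Requires jq.
+# Reads a pre_restart_probe JSON response on stdin, prints each non-empty
+# risk category with its partitions, and returns 1 if any risk is reported.
+report_restart_risks() {
+  local response
+  response="$(cat)"
+  # One line per risk category that lists at least one partition.
+  printf '%s\n' "$response" | jq -r '
+    .risks | to_entries[]
+    | select(.value | length > 0)
+    | "\(.key): \(.value | join(", "))"'
+  # jq -e exits with 0 when the expression is true (no partitions at risk)
+  # and 1 when it is false; that becomes the function's return status.
+  printf '%s\n' "$response" | jq -e '[.risks[] | length] | (add // 0) == 0' > /dev/null
+}
+
+# Example usage (replace the placeholders with your broker's Admin API address):
+# curl -s "http://<broker-address>:<admin-api-port>/v1/broker/pre_restart_probe" \
+#   | report_restart_risks || echo "Review the risks above before restarting."
+----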
 
 ifdef::rolling-upgrade[. Select a broker that has not been upgraded yet and place it into maintenance mode:]
 ifdef::rolling-restart[. Select a broker and place it into maintenance mode:]
diff --git a/modules/upgrade/partials/rolling-upgrades/post-upgrade-tasks.adoc b/modules/upgrade/partials/rolling-upgrades/post-upgrade-tasks.adoc
@@ -11,4 +11,19 @@ To view additional information about your brokers, run:
 
 ```bash
 rpk redpanda admin brokers list
-```
+```
+
+You can also use the xref:api:ROOT:admin-api.adoc#get-/v1/broker/post_restart_probe[Admin API] to check how much of its workload each broker has reclaimed after restarting:
+
+```bash
+curl -s -X GET "http://<broker-address>:<admin-api-port>/v1/broker/post_restart_probe" | jq .
+```
+
+.Example output:
+[,json,role=no-copy]
+----
+// Returns the load already reclaimed by the broker, as a percentage of in-sync replicas
+{
+  "load_reclaimed_pc": 66
+}
+----
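+
+To wait until a broker reports that it has reclaimed its workload, you can poll this endpoint. The following is a minimal sketch, assuming `curl` and `jq` are installed and that a value of `100` means the broker has fully reclaimed its load. The function names and the default target, attempt count, and interval are hypothetical:
+
+```bash
+# Hypothetical helper: prints load_reclaimed_pc from a post_restart_probe
+# JSON response read on stdin. Requires jq.
+load_reclaimed_pc() {
+  jq -r '.load_reclaimed_pc'
+}
+
+# Hypothetical helper: polls a broker's post_restart_probe until it reports at
+# least the target percentage. Returns 0 on success, 1 if attempts run out.
+# Arguments: base URL, target percentage (default 100),
+#            number of attempts (default 60), seconds between attempts (default 5).
+wait_for_reclaim() {
+  local base_url="$1" target="${2:-100}" attempts="${3:-60}" interval="${4:-5}"
+  local pc i
+  for ((i = 1; i <= attempts; i++)); do
+    pc="$(curl -s --max-time 5 "${base_url}/v1/broker/post_restart_probe" | load_reclaimed_pc)"
+    echo "Attempt ${i}: load reclaimed = ${pc:-unknown}%"
+    # Compare only when the response contained a whole-number percentage.
+    if [[ "$pc" =~ ^[0-9]+$ ]] && (( pc >= target )); then
+      return 0
+    fi
+    # Skip the final sleep so the function returns promptly on failure.
+    if (( i < attempts )); then
+      sleep "$interval"
+    fi
+  done
+  return 1
+}
+
+# Example usage (replace the placeholders with your broker's Admin API address):
+# wait_for_reclaim "http://<broker-address>:<admin-api-port>" 100 60 5
+```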