# PodCliqueSet In-Place Restart Guide

This guide uses the Kubernetes 1.35+ **RestartAllContainers** feature to trigger in-place restarts of all Pods in a Grove **PodCliqueSet** via a single **ConfigMap** field, `restartGeneration`. Pod names, UIDs, and IPs stay the same (no rescheduling).

## Use case: restart without rescheduling

Sometimes we want all Pods of a **PodCliqueSet** to restart **without going through rescheduling**—for example when **upgrading container image versions**, or when we need a clean re-run of init containers and main containers while keeping the same Pod identity and placement.

Deleting and recreating Pods is costly: it involves the scheduler, node allocation, and re-initialization of networking and storage. Kubernetes 1.35’s [Restart All Containers](https://kubernetes.io/blog/2026/01/02/kubernetes-v1-35-restart-all-containers/) feature provides an **in-place** restart instead: the kubelet restarts all containers in the Pod while preserving the Pod’s UID, IP address, volumes, and node assignment. Init containers run again in order, then all main containers start with a fresh state—so an image update or configuration change can take effect without any rescheduling. This guide shows how to trigger that in-place restart for an entire Grove PodCliqueSet at once via a ConfigMap.

### Limitations

This guide applies only to **restarting PodCliqueSets** (in-place restart of all Pods belonging to a Grove PodCliqueSet). It does not cover other workload types or cluster-wide restart scenarios.

## Idea

- Each Pod runs a **restart-watcher** sidecar (Go). It uses in-cluster config to poll the ConfigMap `grove-restart-control` in the same namespace for the key `restartGeneration`.
- When it sees `restartGeneration` **increase**, the watcher exits with a configured exit code (default 88), which triggers **RestartAllContainers** for that Pod.
- To trigger a batch in-place restart, `kubectl patch` the ConfigMap to increment `restartGeneration`; all Pods with the watcher will see the new value on the next poll and restart in place.
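
The core decision in the second bullet, "react only to an increase", can be sketched in Go. This is an illustrative sketch, not the actual `src/main.go`; the helper name `shouldTrigger` is hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
)

// shouldTrigger reports whether a newly observed restartGeneration value
// warrants an in-place restart. The watcher reacts only to an increase,
// so re-applying the same value (or lowering it) is a no-op.
func shouldTrigger(last, observed string) bool {
	lastN, err1 := strconv.Atoi(last)
	obsN, err2 := strconv.Atoi(observed)
	if err1 != nil || err2 != nil {
		return false // ignore malformed values rather than restart spuriously
	}
	return obsN > lastN
}

func main() {
	fmt.Println(shouldTrigger("3", "4")) // increase: restart
	fmt.Println(shouldTrigger("4", "4")) // unchanged: no-op
	fmt.Println(shouldTrigger("4", "3")) // decrease: no-op
}
```

Treating malformed values as a no-op means a typo in the ConfigMap cannot restart the whole fleet by accident.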

## Directory layout

```
04_restart-all-containers-for-podcliqueset/
├── src/
│   ├── main.go           # restart-watcher sidecar source
│   └── go.mod
├── Dockerfile            # build watcher image
├── Makefile              # build and push image
├── manifests/
│   ├── namespace.yaml
│   ├── rbac.yaml         # SA + Role + RoleBinding (read ConfigMap)
│   ├── configmap.yaml
│   └── podcliqueset.yaml
└── README.md
```

## Prerequisites

1. **Cluster**: Kubernetes **1.35+** with the **RestartAllContainersOnContainerExits** and **NodeDeclaredFeatures** feature gates enabled on **both** the API server and the kubelet. **RestartAllContainersOnContainerExits** depends on **NodeDeclaredFeatures**, so enable them together. See your cluster or distribution docs for how to set feature gates.
2. **Grove**: CRD and Operator installed, **v0.1.0-alpha.4 or later**.
3. **Registry**: A Docker registry you can push the `restart-watcher` image to and that cluster nodes can pull from.

## Steps

### 1. Build and push the restart-watcher image

From this guide's directory:

```bash
# Set your registry (required)
export REGISTRY=your-registry.io/your-user
export IMAGE_TAG=latest

make push
```

Note the resulting image name, e.g. `$(REGISTRY)/restart-watcher:$(IMAGE_TAG)`.

### 2. Set the watcher image in the PodCliqueSet

Edit `manifests/podcliqueset.yaml` and replace both occurrences of `WATCHER_IMAGE` with the image you pushed, e.g.:

```bash
sed -i "s|WATCHER_IMAGE|${REGISTRY}/restart-watcher:${IMAGE_TAG}|g" manifests/podcliqueset.yaml
```

Or change `image: WATCHER_IMAGE` to e.g. `image: your-registry.io/your-user/restart-watcher:latest` by hand.

### 3. Deploy the PodCliqueSet and ConfigMap

```bash
kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/rbac.yaml
kubectl apply -f manifests/configmap.yaml
kubectl apply -f manifests/podcliqueset.yaml
```

The [example PodCliqueSet](manifests/podcliqueset.yaml) has two PodCliques, **pca** and **pcb**, and both include the **restart-watcher** sidecar so that incrementing `restartGeneration` restarts all 6 Pods. If you only want to restart one PodClique, add the restart-watcher sidecar only to that clique’s `podSpec` in the manifest; Pods without the sidecar will not react to the ConfigMap.

### 4. Wait for Pods to be ready

The PodCliqueSet has **pca** (replicas=2) and **pcb** (replicas=4), 6 Pods in total:

```bash
kubectl get podcliqueset -n grove-restart-demo
kubectl get pods -n grove-restart-demo -l app.kubernetes.io/part-of=grove-restart-demo-pcs -o wide
```

Confirm all 6 Pods are `Running`.

### 5. (Optional) Record Pod names, UIDs, and IPs

To compare before and after the trigger (names, UIDs, and IPs should stay the same):

```bash
kubectl get pods -n grove-restart-demo -l app.kubernetes.io/part-of=grove-restart-demo-pcs \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\t"}{.status.podIP}{"\n"}{end}'
```

### 6. Trigger a full PodCliqueSet in-place restart

Increment the `restartGeneration` key in the ConfigMap `grove-restart-control`:

```bash
# Get current value
current=$(kubectl get configmap grove-restart-control -n grove-restart-demo -o jsonpath='{.data.restartGeneration}')
next=$((current + 1))

# Patch with the incremented value
kubectl patch configmap grove-restart-control -n grove-restart-demo \
  --type merge \
  -p "{\"data\":{\"restartGeneration\":\"${next}\"}}"
```

Within the next poll interval (default 5 seconds), every Pod running the restart-watcher will see the new value, exit with code 88, and trigger RestartAllContainers for its Pod.

### 7. Observe

After about 10–20 seconds:

```bash
kubectl get pods -n grove-restart-demo -l app.kubernetes.io/part-of=grove-restart-demo-pcs -o wide
```

**Expected**:

- Pod **names, UIDs, and IPs are unchanged** (no Pods created or deleted).
- **Restart counts** have increased (e.g. `kubectl get pod <name> -n grove-restart-demo -o jsonpath='{range .status.containerStatuses[*]}{.name} restarts={.restartCount}{"\n"}{end}'`).

To trigger again, repeat step 6 (increment `restartGeneration` again).

### 8. Cleanup

```bash
kubectl delete -f manifests/podcliqueset.yaml
kubectl delete -f manifests/configmap.yaml
kubectl delete -f manifests/rbac.yaml
kubectl delete -f manifests/namespace.yaml
```

## Environment variables (restart-watcher)

| Variable | Meaning | Default |
|----------|---------|---------|
| `CM_NAMESPACE` | ConfigMap namespace | the Pod's own namespace, injected via a `fieldRef` to `metadata.namespace` |
| `CM_NAME` | ConfigMap name | `grove-restart-control` |
| `KEY_NAME` | Key to watch | `restartGeneration` |
| `POLL_INTERVAL_SECONDS` | Poll interval in seconds | `5` |
| `TRIGGER_EXIT_CODE` | Exit code that triggers RestartAllContainers | `88` |

## References

- Kubernetes 1.35: [Restart All Containers](https://kubernetes.io/blog/2026/01/02/kubernetes-v1-35-restart-all-containers/), [KEP-5532](https://kep.k8s.io/5532)