@@ -0,0 +1,11 @@
# Build stage
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY src/ ./
RUN go mod tidy && CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o /restart-watcher .

# Runtime stage
FROM alpine:3.19
RUN apk --no-cache add ca-certificates
COPY --from=builder /restart-watcher /restart-watcher
ENTRYPOINT ["/restart-watcher"]
@@ -0,0 +1,28 @@
# restart-watcher image build and push
# Override for your registry: make push REGISTRY=myreg.io/myuser
REGISTRY ?= localhost:5000
IMAGE_NAME ?= restart-watcher
IMAGE_TAG ?= latest
FULL_IMAGE := $(REGISTRY)/$(IMAGE_NAME):$(IMAGE_TAG)

.PHONY: build image push all tidy

# Build Go binary locally (for development)
build:
	cd src && go build -o ../bin/restart-watcher .

# Build Docker image (binary built inside container)
image:
	docker build -t $(FULL_IMAGE) -f Dockerfile .
	@echo "Built $(FULL_IMAGE)"

# Push image to registry
push: image
	docker push $(FULL_IMAGE)
	@echo "Pushed $(FULL_IMAGE)"

# Tidy go.mod (run from repo root)
tidy:
	cd src && go mod tidy

all: image
154 changes: 154 additions & 0 deletions docs/user-guide/04_restart-all-containers-for-podcliqueset/README.md
@@ -0,0 +1,154 @@
# PodCliqueSet In-Place Restart Guide

This guide uses the Kubernetes 1.35+ **RestartAllContainers** feature to trigger in-place restarts of all Pods in a Grove **PodCliqueSet** via a single **ConfigMap** key, `restartGeneration`. Pod names, UIDs, and IPs stay the same (no rescheduling).

## Use case: restart without rescheduling

Sometimes we want a **PodCliqueSet** (that is, all of its Pods) to restart **without going through rescheduling**, for example when **upgrading container image versions**, or when we need a clean re-run of init containers and main containers while keeping the same Pod identity and placement.

Deleting and recreating Pods is costly: it involves the scheduler, node allocation, and re-initialization of networking and storage. Kubernetes 1.35’s [Restart All Containers](https://kubernetes.io/blog/2026/01/02/kubernetes-v1-35-restart-all-containers/) feature provides an **in-place** restart instead: the kubelet restarts all containers in the Pod while preserving the Pod’s UID, IP address, volumes, and node assignment. Init containers run again in order, then all main containers start with a fresh state—so an image update or configuration change can take effect without any rescheduling. This guide shows how to trigger that in-place restart for an entire Grove PodCliqueSet at once via a ConfigMap.

### Limitations

This guide applies only to **restarting PodCliqueSets** (in-place restart of all Pods belonging to a Grove PodCliqueSet). It does not cover other workload types or cluster-wide restart scenarios.

## Idea

- Each Pod runs a **restart-watcher** sidecar (Go). It uses in-cluster config to poll the ConfigMap `grove-restart-control` in the same namespace for the key `restartGeneration`.
- When it sees `restartGeneration` **increase**, the watcher exits with a configured exit code (default 88), which triggers **RestartAllContainers** for that Pod; a minimal sketch of this loop is shown below.
- To trigger a batch in-place restart, **kubectl patch** the ConfigMap to increment `restartGeneration`; all Pods with the watcher will see the new value on the next poll and restart in place.

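The following is a minimal sketch of what the sidecar's polling loop does, using client-go's in-cluster config and the environment variables documented at the end of this guide. The actual `src/main.go` may differ in logging and error handling; treat this as illustrative only.

```go
// restart-watcher sketch: poll a ConfigMap key and exit with a configured
// code when its value increases. Assumes the env vars from the table below.
package main

import (
	"context"
	"fmt"
	"os"
	"strconv"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func getenv(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

func main() {
	// CM_NAMESPACE is normally injected via the Downward API (fieldRef: metadata.namespace).
	namespace := getenv("CM_NAMESPACE", "default")
	cmName := getenv("CM_NAME", "grove-restart-control")
	keyName := getenv("KEY_NAME", "restartGeneration")
	interval, _ := strconv.Atoi(getenv("POLL_INTERVAL_SECONDS", "5"))
	exitCode, _ := strconv.Atoi(getenv("TRIGGER_EXIT_CODE", "88"))

	// In-cluster config uses the Pod's ServiceAccount, which the RBAC
	// manifest grants read access to the ConfigMap.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	last := -1 // generation observed at startup / last poll
	for {
		cm, err := client.CoreV1().ConfigMaps(namespace).Get(context.Background(), cmName, metav1.GetOptions{})
		if err == nil {
			if gen, convErr := strconv.Atoi(cm.Data[keyName]); convErr == nil {
				switch {
				case last < 0:
					last = gen // baseline; never restart right after startup
				case gen > last:
					fmt.Printf("%s increased from %d to %d; exiting with code %d\n", keyName, last, gen, exitCode)
					// The exit code matches restartPolicyRules in the PodCliqueSet
					// manifest and triggers RestartAllContainers for the whole Pod.
					os.Exit(exitCode)
				}
			}
		}
		time.Sleep(time.Duration(interval) * time.Second)
	}
}
```
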
## Directory layout

```
04_restart-all-containers-for-podcliqueset/
├── src/
│   ├── main.go           # restart-watcher sidecar source
│   └── go.mod
├── Dockerfile            # build watcher image
├── Makefile              # build and push image
├── manifests/
│   ├── namespace.yaml
│   ├── rbac.yaml         # SA + Role + RoleBinding (read ConfigMap)
│   ├── configmap.yaml
│   └── podcliqueset.yaml
└── README.md
```

## Prerequisites

1. **Cluster**: Kubernetes **1.35+** with the **RestartAllContainersOnContainerExits** and **NodeDeclaredFeatures** feature gates enabled on **both** the API server and the kubelet. **RestartAllContainersOnContainerExits** depends on **NodeDeclaredFeatures**, so enable them together. See your cluster or distribution docs for how to set feature gates; an illustrative snippet follows this list.
2. **Grove**: CRD and Operator installed, **v0.1.0-alpha.4 or later**.
3. **Registry**: A Docker registry you can push the `restart-watcher` image to and that cluster nodes can pull from.
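
How you set the gates depends on your cluster setup. As an illustrative example only (not specific to any distribution), a kubelet configuration file can enable both gates like this, and the API server needs the matching `--feature-gates=NodeDeclaredFeatures=true,RestartAllContainersOnContainerExits=true` flag:

```yaml
# Illustrative KubeletConfiguration fragment; merge into your existing kubelet config.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  NodeDeclaredFeatures: true
  RestartAllContainersOnContainerExits: true
```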

## Steps

### 1. Build and push the restart-watcher image

From this guide's directory:

```bash
# Set your registry (required)
export REGISTRY=your-registry.io/your-user
export IMAGE_TAG=latest

make push
```

Note the resulting image reference, `${REGISTRY}/restart-watcher:${IMAGE_TAG}`, e.g. `your-registry.io/your-user/restart-watcher:latest`.

### 2. Set the watcher image in the PodCliqueSet

Edit `manifests/podcliqueset.yaml` and replace both occurrences of `WATCHER_IMAGE` with the image you pushed, e.g.:

```bash
sed -i "s|WATCHER_IMAGE|${REGISTRY}/restart-watcher:${IMAGE_TAG}|g" manifests/podcliqueset.yaml
```

Alternatively, change the two `image: WATCHER_IMAGE` lines by hand, e.g. to `image: your-registry.io/your-user/restart-watcher:latest`.
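
To confirm that both placeholders were replaced:

```bash
grep -n "image:" manifests/podcliqueset.yaml
```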

### 3. Deploy PodCliqueSet and ConfigMap

```bash
kubectl apply -f manifests/namespace.yaml
kubectl apply -f manifests/rbac.yaml
kubectl apply -f manifests/configmap.yaml
kubectl apply -f manifests/podcliqueset.yaml
```

The [example PodCliqueSet](manifests/podcliqueset.yaml) has two PodCliques, **pca** and **pcb**, and both include the **restart-watcher** sidecar so that incrementing `restartGeneration` restarts all 6 Pods. If you only want to restart one PodClique, add the restart-watcher sidecar only to that clique’s `podSpec` in the manifest; Pods without the sidecar will not react to the ConfigMap.
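
Before checking the Pods, you can verify that the supporting objects were created:

```bash
kubectl get serviceaccount,role,rolebinding,configmap -n grove-restart-demo
```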

### 4. Wait for Pods to be ready

The PodCliqueSet has **pca** (replicas=2) and **pcb** (replicas=4), 6 Pods in total:

```bash
kubectl get podcliqueset -n grove-restart-demo
kubectl get pods -n grove-restart-demo -l app.kubernetes.io/part-of=grove-restart-demo-pcs -o wide
```

Confirm all 6 Pods are `Running`.
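
Instead of polling manually, you can also block until all Pods report ready, using the same label selector:

```bash
kubectl wait --for=condition=Ready pod \
  -l app.kubernetes.io/part-of=grove-restart-demo-pcs \
  -n grove-restart-demo --timeout=300s
```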

### 5. (Optional) Record Pod names, UIDs, and IPs

To compare before and after the trigger (names, UIDs, and IPs should stay the same):

```bash
kubectl get pods -n grove-restart-demo -l app.kubernetes.io/part-of=grove-restart-demo-pcs \
-o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\t"}{.status.podIP}{"\n"}{end}'
```
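
To make the comparison in step 7 easier, you can redirect this output to a file (any path works) and diff it against the same command's output after the restart:

```bash
kubectl get pods -n grove-restart-demo -l app.kubernetes.io/part-of=grove-restart-demo-pcs \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.uid}{"\t"}{.status.podIP}{"\n"}{end}' > /tmp/pods-before.txt
```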

### 6. Trigger an in-place restart of the whole PodCliqueSet

Increment the ConfigMap `grove-restart-control` key `restartGeneration`:

```bash
# Get current value
current=$(kubectl get configmap grove-restart-control -n grove-restart-demo -o jsonpath='{.data.restartGeneration}')
next=$((current + 1))

# Patch with next value
kubectl patch configmap grove-restart-control -n grove-restart-demo \
--type merge \
-p "{\"data\":{\"restartGeneration\":\"${next}\"}}"
```

Within the next poll interval (default 5 seconds), every restart-watcher sidecar will see the new value, exit with code 88, and trigger RestartAllContainers for its Pod.
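
To watch a single Pod react, you can follow its watcher sidecar's logs (the exact messages depend on what `main.go` prints):

```bash
kubectl logs -f <pod-name> -c restart-watcher -n grove-restart-demo
```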

### 7. Observe

After about 10–20 seconds:

```bash
kubectl get pods -n grove-restart-demo -l app.kubernetes.io/part-of=grove-restart-demo-pcs -o wide
```

**Expected**:

- Pod **names, UIDs and IPs unchanged** (no new Pods created or deleted).
- **Restart counts** increased (e.g. `kubectl get pod <name> -n grove-restart-demo -o jsonpath='{range .status.containerStatuses[*]}{.name} restarts={.restartCount}{"\n"}{end}'`).

To trigger again, repeat step 6 (increment `restartGeneration` again).

### 8. Cleanup

```bash
kubectl delete -f manifests/podcliqueset.yaml
kubectl delete -f manifests/configmap.yaml
kubectl delete -f manifests/rbac.yaml
kubectl delete -f manifests/namespace.yaml
```

## Environment variables (restart-watcher)

| Variable | Meaning | Default |
|----------|---------|--------|
| `CM_NAMESPACE` | ConfigMap namespace | set from `metadata.namespace` via the Downward API (`fieldRef`) in the example manifest |
| `CM_NAME` | ConfigMap name | `grove-restart-control` |
| `KEY_NAME` | Key name | `restartGeneration` |
| `POLL_INTERVAL_SECONDS` | Poll interval (seconds) | `5` |
| `TRIGGER_EXIT_CODE` | Exit code that triggers RestartAllContainers | `88` |
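
The sidecar reads these at startup, so overrides belong in the sidecar's `env` block in `manifests/podcliqueset.yaml`. For example, to poll every 2 seconds instead of 5:

```yaml
- name: POLL_INTERVAL_SECONDS
  value: "2"
```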

## References

- Kubernetes 1.35: [Restart All Containers](https://kubernetes.io/blog/2026/01/02/kubernetes-v1-35-restart-all-containers/), [KEP-5532](https://kep.k8s.io/5532)
@@ -0,0 +1,10 @@
# ConfigMap watched by restart-watcher sidecars.
# Increment data.restartGeneration to trigger in-place RestartAllContainers on all Pods
# that have the watcher sidecar.
apiVersion: v1
kind: ConfigMap
metadata:
  name: grove-restart-control
  namespace: grove-restart-demo
data:
  restartGeneration: "0"
@@ -0,0 +1,6 @@
apiVersion: v1
kind: Namespace
metadata:
  name: grove-restart-demo
  labels:
    app.kubernetes.io/name: grove-restart-demo
@@ -0,0 +1,96 @@
# Grove PodCliqueSet: pca (nginx:1.25, replicas=2), pcb (nginx:1.25, replicas=4).
# Each Pod has a restart-watcher sidecar that watches ConfigMap grove-restart-control.
# When restartGeneration is incremented, watchers exit with code 88 and trigger
# RestartAllContainers (K8s 1.35+). Replace WATCHER_IMAGE with your built image.
#
# Prerequisites:
# - Kubernetes 1.35+ with RestartAllContainersOnContainerExits and NodeDeclaredFeatures
# - Grove CRD + Operator installed
---
apiVersion: grove.io/v1alpha1
kind: PodCliqueSet
metadata:
  name: grove-restart-demo-pcs
  namespace: grove-restart-demo
  labels:
    app: grove-restart-demo
spec:
  replicas: 1
  template:
    terminationDelay: 1m
    cliqueStartupType: CliqueStartupTypeExplicit
    cliques:
      - name: pca
        spec:
          roleName: rolea
          replicas: 2
          podSpec:
            restartPolicy: Always
            serviceAccountName: grove-restart-demo
            containers:
              - name: main
                image: nginx:1.25
                ports:
                  - containerPort: 80
                resources:
                  requests:
                    cpu: 10m
              - name: restart-watcher
                image: WATCHER_IMAGE
                imagePullPolicy: IfNotPresent
                env:
                  - name: CM_NAMESPACE
                    valueFrom:
                      fieldRef:
                        fieldPath: metadata.namespace
                  - name: CM_NAME
                    value: "grove-restart-control"
                  - name: KEY_NAME
                    value: "restartGeneration"
                  - name: POLL_INTERVAL_SECONDS
                    value: "5"
                  - name: TRIGGER_EXIT_CODE
                    value: "88"
                restartPolicy: Always
                restartPolicyRules:
                  - action: RestartAllContainers
                    exitCodes:
                      operator: In
                      values: [88]
      - name: pcb
        spec:
          roleName: roleb
          replicas: 4
          podSpec:
            restartPolicy: Always
            serviceAccountName: grove-restart-demo
            containers:
              - name: main
                image: nginx:1.25
                ports:
                  - containerPort: 80
                resources:
                  requests:
                    cpu: 10m
              - name: restart-watcher
                image: WATCHER_IMAGE
                imagePullPolicy: IfNotPresent
                env:
                  - name: CM_NAMESPACE
                    valueFrom:
                      fieldRef:
                        fieldPath: metadata.namespace
                  - name: CM_NAME
                    value: "grove-restart-control"
                  - name: KEY_NAME
                    value: "restartGeneration"
                  - name: POLL_INTERVAL_SECONDS
                    value: "5"
                  - name: TRIGGER_EXIT_CODE
                    value: "88"
                restartPolicy: Always
                restartPolicyRules:
                  - action: RestartAllContainers
                    exitCodes:
                      operator: In
                      values: [88]
@@ -0,0 +1,33 @@
# ServiceAccount and RBAC for Pods that run the restart-watcher sidecar.
# The watcher needs to read the ConfigMap "grove-restart-control" in the same namespace.
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grove-restart-demo
  namespace: grove-restart-demo
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: grove-restart-configmap-reader
  namespace: grove-restart-demo
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    resourceNames: ["grove-restart-control"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: grove-restart-demo-read-configmap
  namespace: grove-restart-demo
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: grove-restart-configmap-reader
subjects:
  - kind: ServiceAccount
    name: grove-restart-demo
    namespace: grove-restart-demo
@@ -0,0 +1,47 @@
module github.com/daisy/restart-watcher

go 1.21

require (
k8s.io/api v0.28.4
k8s.io/apimachinery v0.28.4
k8s.io/client-go v0.28.4
)

require (
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/emicklei/go-restful/v3 v3.9.0 // indirect
github.com/go-logr/logr v1.2.4 // indirect
github.com/go-openapi/jsonpointer v0.19.6 // indirect
github.com/go-openapi/jsonreference v0.20.2 // indirect
github.com/go-openapi/swag v0.22.3 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/protobuf v1.5.3 // indirect
github.com/google/gnostic-models v0.6.8 // indirect
github.com/google/go-cmp v0.5.9 // indirect
github.com/google/gofuzz v1.2.0 // indirect
github.com/google/uuid v1.3.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
golang.org/x/net v0.17.0 // indirect
golang.org/x/oauth2 v0.8.0 // indirect
golang.org/x/sys v0.13.0 // indirect
golang.org/x/term v0.13.0 // indirect
golang.org/x/text v0.13.0 // indirect
golang.org/x/time v0.3.0 // indirect
google.golang.org/appengine v1.6.7 // indirect
google.golang.org/protobuf v1.31.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/klog/v2 v2.100.1 // indirect
k8s.io/kube-openapi v0.0.0-20230717233707-2695361300d9 // indirect
k8s.io/utils v0.0.0-20230406110748-d93618cff8a2 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.2.3 // indirect
sigs.k8s.io/yaml v1.3.0 // indirect
)