Commit bd71e78

divincode and Vinay Devadiga authored
feat(rolling-update): add hyperpod-patching subchart with RBAC for pod eviction (#91)
Co-authored-by: Vinay Devadiga <[email protected]>
1 parent 84963ab commit bd71e78

7 files changed, 37 additions and 2 deletions

helm_chart/HyperPodHelmChart/Chart.yaml

Lines changed: 5 additions & 1 deletion
@@ -74,4 +74,8 @@ dependencies:
   - name: team-role-and-bindings
     version: "0.1.0"
     repository: "file://charts/team-role-and-bindings"
-    condition: team-role-and-bindings.enabled
+    condition: team-role-and-bindings.enabled
+  - name: hyperpod-patching
+    version: "0.1.0"
+    repository: "file://charts/hyperpod-patching"
+    condition: hyperpod-patching.enabled
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+apiVersion: v2
+name: hyperpod-patching
+description: A subchart for RBAC used by HyperPod patching workflows
+version: 0.1.0
+appVersion: "1.0"
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: hyperpod-patching
+rules:
+  - apiGroups: [""]
+    resources: ["pods"]
+    verbs: ["list"]
+  - apiGroups: [""]
+    resources: ["pods/eviction"]
+    verbs: ["create"]
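
The two rules above grant a narrow set of permissions: listing pods and creating evictions, nothing else. The sketch below is a simplified model of how such rules are matched, written only to make that scope concrete. The `is_allowed` helper is hypothetical and is not the Kubernetes authorizer; the real RBAC authorizer also handles `resourceNames`, non-resource URLs, and role aggregation.

```python
# Rules copied from the ClusterRole in this commit.
RULES = [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["list"]},
    {"apiGroups": [""], "resources": ["pods/eviction"], "verbs": ["create"]},
]


def _matches(allowed, value):
    """Return True if value is listed explicitly or covered by the '*' wildcard."""
    return "*" in allowed or value in allowed


def is_allowed(rules, api_group, resource, verb):
    """Simplified RBAC check (illustrative only): a request is allowed if
    any single rule matches its API group, resource, and verb."""
    return any(
        _matches(rule["apiGroups"], api_group)
        and _matches(rule["resources"], resource)
        and _matches(rule["verbs"], verb)
        for rule in rules
    )


if __name__ == "__main__":
    print(is_allowed(RULES, "", "pods", "list"))             # granted by rule 1
    print(is_allowed(RULES, "", "pods/eviction", "create"))  # granted by rule 2
    print(is_allowed(RULES, "", "pods", "delete"))           # not granted
```

Under this model the role cannot `get`, `watch`, or `delete` pods directly; removal has to go through the eviction subresource.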
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: hyperpod-patching
+subjects:
+  - kind: User
+    name: hyperpod-service-linked-role
+    apiGroup: rbac.authorization.k8s.io
+roleRef:
+  kind: ClusterRole
+  name: hyperpod-patching
+  apiGroup: rbac.authorization.k8s.io
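
With this binding, the `hyperpod-service-linked-role` user may create `Eviction` objects on the `pods/eviction` subresource. As a rough illustration of what such a request looks like, the sketch below builds the API path and JSON body for an API-initiated eviction. The `build_eviction_request` helper, the namespace, and the pod name are hypothetical; the commit does not show how the patching workflow actually issues its requests.

```python
import json


def build_eviction_request(namespace, pod_name, grace_period_seconds=None):
    """Build the path and body for a POST to a pod's eviction subresource.

    Illustrative helper only; it constructs the request but does not send it.
    """
    path = f"/api/v1/namespaces/{namespace}/pods/{pod_name}/eviction"
    body = {
        "apiVersion": "policy/v1",
        "kind": "Eviction",
        "metadata": {"name": pod_name, "namespace": namespace},
    }
    if grace_period_seconds is not None:
        # Optional DeleteOptions passed through to the pod deletion.
        body["deleteOptions"] = {"gracePeriodSeconds": grace_period_seconds}
    return path, body


if __name__ == "__main__":
    # "default" and "training-worker-0" are placeholder values.
    path, body = build_eviction_request("default", "training-worker-0", 30)
    print("POST", path)
    print(json.dumps(body, indent=2))
```

Because the request goes through the eviction subresource rather than a direct delete, the API server can refuse it when a PodDisruptionBudget would be violated.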

helm_chart/HyperPodHelmChart/charts/hyperpod-patching/values.yaml

Whitespace-only changes.

helm_chart/HyperPodHelmChart/values.yaml

Lines changed: 3 additions & 1 deletion
@@ -259,4 +259,6 @@ health-monitoring-agent:
 deep-health-check:
   enabled: true
 job-auto-restart:
-  enabled: true
+  enabled: true
+hyperpod-patching:
+  enabled: true
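
The `hyperpod-patching.enabled` key added here is the path named by the `condition:` field of the new dependency in `Chart.yaml`. The sketch below approximates how Helm resolves such a condition: it takes the first comma-separated path that exists in the parent values and holds a boolean, and the condition has no effect when no path resolves. The `resolve_condition` function is a hypothetical stand-in written for illustration, not Helm's implementation.

```python
def resolve_condition(values, condition, default=True):
    """Approximate Helm's dependency `condition` lookup (illustrative only).

    `condition` is one or more dot-separated paths, delimited by commas.
    The first path that resolves to a boolean decides the result; if none
    does, the dependency keeps its default (enabled) state.
    """
    for path in condition.split(","):
        node = values
        found = True
        for key in path.strip().split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                found = False
                break
        if found and isinstance(node, bool):
            return node
    return default


if __name__ == "__main__":
    parent_values = {"hyperpod-patching": {"enabled": True}}
    print(resolve_condition(parent_values, "hyperpod-patching.enabled"))
```

Setting `hyperpod-patching.enabled` to `false` in the parent values (or with `--set` at install time) should therefore skip the subchart and its RBAC objects.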

helm_chart/readme.md

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ chmod 700 get_helm.sh
 | neuron-device-plugin | Deploys the AWS Neuron device plugin for Kubernetes, enabling support for AWS Inferentia chips to accelerate machine learning model inference workloads. | Yes |
 | storage | Manages persistent storage resources for Kubernetes applications, ensuring that data is retained and accessible across pod restarts and cluster upgrades. | No |
 | training-operators | Installs operators for managing various machine learning training jobs, such as TensorFlow, PyTorch, and MXNet, providing native Kubernetes support for distributed training workloads. | Yes |
+| HyperPod patching | Deploys the RBAC and controller resources needed for orchestrating rolling updates and patching workflows in SageMaker HyperPod clusters. Includes pod eviction and node monitoring. | Yes |
 
 ## 3. Test the Chart Locally
 