|
36 | 36 | brew install jq moreutils gettext
|
37 | 37 | ```
|
38 | 38 |
|
39 |
| -### Helm Prerequisites |
40 |
| - |
41 |
| -This acrticle includes instructions for using Help to deploy the GPU operator. If you plan to use Helm, you will need do the following: |
42 |
| - |
43 |
| -1. Add the MOBB chart repository to your Helm using the following command: |
44 |
| - |
45 |
| - ```bash |
46 |
| - helm repo add mobb https://rh-mobb.github.io/helm-charts/ |
47 |
| - ``` |
48 |
| - |
49 |
| -1. Update your repositories using the following command: |
50 |
| - |
51 |
| - ```bash |
52 |
| - helm repo update |
53 |
| - ``` |
54 |
| - |
55 | 39 | ## Request GPU quota
|
56 | 40 |
|
57 | 41 | All GPU quotas in Azure are 0 by default. You will need to sign in to the Azure portal and request GPU quota. Since there is a lot of competition for GPU workers, you may have to provision an ARO cluster in a region where you can actually reserve GPU.
|
@@ -91,46 +75,6 @@ Update your pull secret to make sure you can install operators and connect to [c
|
91 | 75 | > [!NOTE]
|
92 | 76 | > Skip this step if you have already recreated a full pull secret with cloud.redhat.com enabled.
|
93 | 77 |
|
94 |
| -### Using Helm |
95 |
| - |
96 |
| -1. Before deploying the Helm chart, you must adopt the existing pull secret: |
97 |
| - |
98 |
| - ```bash |
99 |
| - kubectl -n openshift-config annotate secret \ |
100 |
| - pull-secret meta.helm.sh/release-name=pull-secret |
101 |
| - kubectl -n openshift-config annotate secret \ |
102 |
| - pull-secret meta.helm.sh/release-namespace=openshift-config |
103 |
| - kubectl -n openshift-config label secret \ |
104 |
| - pull-secret app.kubernetes.io/managed-by=Helm |
105 |
| - ``` |
106 |
| - |
107 |
| -1. Download your new pull secret from **https://console.redhat.com/openshift/downloads -> Tokens -> Pull secret**. |
108 |
| - |
109 |
| -1. Use the new pull secret to update/create the pull secret in your cluster. This chart will merge the in-cluster pull secret with the new pull secret: |
110 |
| - |
111 |
| - ``` |
112 |
| - helm upgrade --install pull-secret mobb/aro-pull-secret \ |
113 |
| - -n openshift-config --set-file pullSecret=$HOME/Downloads/pull-secret.txt |
114 |
| - ``` |
115 |
| - |
116 |
| -1. Enable Operator Hub: |
117 |
| - |
118 |
| - ```bash |
119 |
| - oc patch configs.samples.operator.openshift.io cluster --type=merge \ |
120 |
| - -p='{"spec":{"managementState":"Managed"}}' |
121 |
| - oc patch operatorhub cluster --type=merge \ |
122 |
| - -p='{"spec":{"sources":[ |
123 |
| - {"name":"redhat-operators","disabled":false}, |
124 |
| - {"name":"certified-operators","disabled":false}, |
125 |
| - {"name":"community-operators","disabled":false}, |
126 |
| - {"name":"redhat-marketplace","disabled":false} |
127 |
| - ]}}' |
128 |
| - ``` |
129 |
| - |
130 |
| -1. Skip to [GPU Machine Set](#gpu-machine-set). |
131 |
| - |
132 |
| -### Manually |
133 |
| - |
134 | 78 | 1. Log into to [cloud.redhat.com](https://cloud.redhat.com/).
|
135 | 79 |
|
136 | 80 | 1. Browse to https://cloud.redhat.com/openshift/install/azure/aro-provisioned.
|
@@ -168,28 +112,7 @@ Update your pull secret to make sure you can install operators and connect to [c
|
168 | 112 |
|
169 | 113 | ## GPU Machine Set
|
170 | 114 |
|
171 |
| -ARO uses Kubernetes MachineSet to create machine sets. The procedures below explain how to export the first machine set in a cluster and use that as a template to build a single GPU machine. |
172 |
| - |
173 |
| -<!--I'm going to export the first machine set in my cluster (az 1) and use that as a template to build a single GPU machine in southcentralus region 1.--> |
174 |
| -
|
175 |
| -### Export using Helm |
176 |
| -
|
177 |
| -1. Create a new machine-set (replicas of 1). See the Chart's [values](https://github.com/rh-mobb/helm-charts/blob/main/charts/aro-gpu/values.yaml) file for configuration options. |
178 |
| - |
179 |
| - ``` |
180 |
| - helm upgrade --install -n openshift-machine-api \ |
181 |
| - gpu mobb/aro-gpu |
182 |
| - ``` |
183 |
| - |
184 |
| -1. Wait for the new GPU nodes to become available. |
185 |
| - |
186 |
| - ```bash |
187 |
| - watch oc get machines |
188 |
| - ``` |
189 |
| - |
190 |
| -1. Skip to [Install Nvidia GPU Operator](#install-nvidia-gpu-operator) |
191 |
| - |
192 |
| -### Export manually |
| 115 | +ARO uses Kubernetes MachineSet to create machine sets. The procedure below explains how to export the first machine set in a cluster and use that as a template to build a single GPU machine. |
193 | 116 |
|
194 | 117 | 1. View existing machine sets.
|
195 | 118 |
|
@@ -286,57 +209,6 @@ Use the following steps to create the new GPU machine. It may take 10-15 minutes
|
286 | 209 |
|
287 | 210 | This section explains how to create the `nvidia-gpu-operator` namespace, set up the operator group, and install the Nvidia GPU operator.
|
288 | 211 |
|
289 |
| -### Helm |
290 |
| -
|
291 |
| -1. Create namespaces. |
292 |
| -
|
293 |
| - ```bash |
294 |
| - oc create namespace openshift-nfd |
295 |
| - oc create namespace nvidia-gpu-operator |
296 |
| - ``` |
297 |
| -
|
298 |
| -1. Use the `mobb/operatorhub` chart to deploy the needed operators. |
299 |
| -
|
300 |
| - ```bash |
301 |
| - helm upgrade -n nvidia-gpu-operator nvidia-gpu-operator \ |
302 |
| - mobb/operatorhub --install \ |
303 |
| - --values https://raw.githubusercontent.com/rh-mobb/helm-charts/main/charts/nvidia-gpu/files/operatorhub.yaml |
304 |
| - ``` |
305 |
| -
|
306 |
| -1. Wait until the two operators are running. |
307 |
| -
|
308 |
| - ```bash |
309 |
| - watch kubectl get pods -n openshift-nfd |
310 |
| - ``` |
311 |
| -
|
312 |
| - ``` |
313 |
| - NAME READY STATUS RESTARTS AGE |
314 |
| - nfd-controller-manager-7b66c67bd9-rk98w 2/2 Running 0 47s |
315 |
| - ``` |
316 |
| -
|
317 |
| - ```bash |
318 |
| - watch oc get pods -n nvidia-gpu-operator |
319 |
| - ``` |
320 |
| -
|
321 |
| - ``` |
322 |
| - NAME READY STATUS RESTARTS AGE |
323 |
| - gpu-operator-5d8cb7dd5f-c4ljk 1/1 Running 0 87s |
324 |
| - ``` |
325 |
| -
|
326 |
| -1. Install the Nvidia GPU Operator chart. |
327 |
| -
|
328 |
| - ```bash |
329 |
| -
|
330 |
| - ```bash |
331 |
| - helm upgrade --install -n nvidia-gpu-operator nvidia-gpu \ |
332 |
| - mobb/nvidia-gpu --disable-openapi-validation |
333 |
| - ``` |
334 |
| -
|
335 |
| -1. Skip to [Validate GPU](#validate-gpu). |
336 |
| -
|
337 |
| -
|
338 |
| -### Manually |
339 |
| -
|
340 | 212 | 1. Create Nvidia namespace.
|
341 | 213 |
|
342 | 214 | ```yaml
|
@@ -659,7 +531,7 @@ This sections explains how to apply the Nvidia cluster config. Please read the [
|
659 | 531 |
|
660 | 532 | It may take some time for the Nvidia Operator and NFD to completely install and self-identify the machines. Run the following commands to validate that everything is running as expected:
|
661 | 533 |
|
662 |
| -1. Verify that NFD can see your GPU(s). |
| 534 | +1. Verify that NFD can see your GPU(s). |
663 | 535 |
|
664 | 536 | ```bash
|
665 | 537 | oc describe node | egrep 'Roles|pci-10de' | grep -v master
|
|