Commit 652018a

SME review edits
1 parent 8db8484 commit 652018a

1 file changed: +2 -130 lines changed

articles/openshift/howto-gpu-workloads.md

Lines changed: 2 additions & 130 deletions
@@ -36,22 +36,6 @@ MacOS
 brew install jq moreutils gettext
 ```
 
-### Helm Prerequisites
-
-This acrticle includes instructions for using Help to deploy the GPU operator. If you plan to use Helm, you will need do the following:
-
-1. Add the MOBB chart repository to your Helm using the following command:
-
-```bash
-helm repo add mobb https://rh-mobb.github.io/helm-charts/
-```
-
-1. Update your repositories using the following command:
-
-```bash
-helm repo update
-```
-
 ## Request GPU quota
 
 All GPU quotas in Azure are 0 by default. You will need to sign in to the Azure portal and request GPU quota. Since there is a lot of competition for GPU workers, you may have to provision an ARO cluster in a region where you can actually reserve GPU.
@@ -91,46 +75,6 @@ Update your pull secret to make sure you can install operators and connect to [c
 > [!NOTE]
 > Skip this step if you have already recreated a full pull secret with cloud.redhat.com enabled.
 
-### Using Helm
-
-1. Before deploying the Helm chart, you must adopt the existing pull secret:
-
-```bash
-kubectl -n openshift-config annotate secret \
-pull-secret meta.helm.sh/release-name=pull-secret
-kubectl -n openshift-config annotate secret \
-pull-secret meta.helm.sh/release-namespace=openshift-config
-kubectl -n openshift-config label secret \
-pull-secret app.kubernetes.io/managed-by=Helm
-```
-
-1. Download your new pull secret from **https://console.redhat.com/openshift/downloads -> Tokens -> Pull secret**.
-
-1. Use the new pull secret to update/create the pull secret in your cluster. This chart will merge the in-cluster pull secret with the new pull secret:
-
-```
-helm upgrade --install pull-secret mobb/aro-pull-secret \
--n openshift-config --set-file pullSecret=$HOME/Downloads/pull-secret.txt
-```
-
-1. Enable Operator Hub:
-
-```bash
-oc patch configs.samples.operator.openshift.io cluster --type=merge \
--p='{"spec":{"managementState":"Managed"}}'
-oc patch operatorhub cluster --type=merge \
--p='{"spec":{"sources":[
-{"name":"redhat-operators","disabled":false},
-{"name":"certified-operators","disabled":false},
-{"name":"community-operators","disabled":false},
-{"name":"redhat-marketplace","disabled":false}
-]}}'
-```
-
-1. Skip to [GPU Machine Set](#gpu-machine-set).
-
-### Manually
-
 1. Log into to [cloud.redhat.com](https://cloud.redhat.com/).
 
 1. Browse to https://cloud.redhat.com/openshift/install/azure/aro-provisioned.
@@ -168,28 +112,7 @@ Update your pull secret to make sure you can install operators and connect to [c
 
 ## GPU Machine Set
 
-ARO uses Kubernetes MachineSet to create machine sets. The procedures below explain how to export the first machine set in a cluster and use that as a template to build a single GPU machine.
-
-<!--I'm going to export the first machine set in my cluster (az 1) and use that as a template to build a single GPU machine in southcentralus region 1.-->
-
-### Export using Helm
-
-1. Create a new machine-set (replicas of 1). See the Chart's [values](https://github.com/rh-mobb/helm-charts/blob/main/charts/aro-gpu/values.yaml) file for configuration options.
-
-```
-helm upgrade --install -n openshift-machine-api \
-gpu mobb/aro-gpu
-```
-
-1. Wait for the new GPU nodes to become available.
-
-```bash
-watch oc get machines
-```
-
-1. Skip to [Install Nvidia GPU Operator](#install-nvidia-gpu-operator)
-
-### Export manually
+ARO uses Kubernetes MachineSet to create machine sets. The procedure below explains how to export the first machine set in a cluster and use that as a template to build a single GPU machine.
 
 1. View existing machine sets.
 
@@ -286,57 +209,6 @@ Use the following steps to create the new GPU machine. It may take 10-15 minutes
 
 This section explains how to create the `nvidia-gpu-operator` namespace, set up the operator group, and install the Nvidia GPU operator.
 
-### Helm
-
-1. Create namespaces.
-
-```bash
-oc create namespace openshift-nfd
-oc create namespace nvidia-gpu-operator
-```
-
-1. Use the `mobb/operatorhub` chart to deploy the needed operators.
-
-```bash
-helm upgrade -n nvidia-gpu-operator nvidia-gpu-operator \
-mobb/operatorhub --install \
---values https://raw.githubusercontent.com/rh-mobb/helm-charts/main/charts/nvidia-gpu/files/operatorhub.yaml
-```
-
-1. Wait until the two operators are running.
-
-```bash
-watch kubectl get pods -n openshift-nfd
-```
-
-```
-NAME READY STATUS RESTARTS AGE
-nfd-controller-manager-7b66c67bd9-rk98w 2/2 Running 0 47s
-```
-
-```bash
-watch oc get pods -n nvidia-gpu-operator
-```
-
-```
-NAME READY STATUS RESTARTS AGE
-gpu-operator-5d8cb7dd5f-c4ljk 1/1 Running 0 87s
-```
-
-1. Install the Nvidia GPU Operator chart.
-
-```bash
-
-```bash
-helm upgrade --install -n nvidia-gpu-operator nvidia-gpu \
-mobb/nvidia-gpu --disable-openapi-validation
-```
-
-1. Skip to [Validate GPU](#validate-gpu).
-
-
-### Manually
-
 1. Create Nvidia namespace.
 
 ```yaml
@@ -659,7 +531,7 @@ This sections explains how to apply the Nvidia cluster config. Please read the [
 
 It may take some time for the Nvidia Operator and NFD to completely install and self-identify the machines. Run the following commands to validate that everything is running as expected:
 
-1. Verify that NFD can see your GPU(s).
+1. Verify that NFD can see your GPU(s).
 
 ```bash
 oc describe node | egrep 'Roles|pci-10de' | grep -v master
