Commit 402d125

Added a blip about configuring MIG
1 parent 8ef9598 commit 402d125

1 file changed: 31 additions, 0 deletions

INSTALLING_ONTO_EXISTING_CLUSTER_README.md

@@ -9,6 +9,10 @@ This guide helps you install and use **OCI AI Blueprints** for the first time on
5. Deploy a sample recipe to that node.
6. Test your deployment and undeploy

If you already have the NVIDIA GPU Operator installed and want to use Multi-Instance GPUs (MIG) with H100 nodes, see the [Multi-Instance GPU Setup](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#multi-instance-gpu-setup) section at the bottom of this guide.

If you need to contact the team about setup issues, see the [Need Help?](./INSTALLING_ONTO_EXISTING_CLUSTER_README.md#need-help) section.

---

## Overview
@@ -172,6 +176,33 @@ curl -L -H "Content-Type: application/json" -d '{"model": "/models/NousResearch/
- go to Api Root -> deployment_logs
- Look for: Directive decommission -> Ingress deleted -> Deployment deleted -> Service deleted -> Directive / decommission / completed.

## Multi-Instance GPU Setup
If you already have the NVIDIA GPU Operator installed and plan to use Multi-Instance GPUs (MIG) with your H100 nodes, you need to reconfigure the operator manually by upgrading its release with Helm.

The following commands show one way to do this:
```bash
# Get the release name of the GPU Operator deployment
helm list -n gpu-operator

# Example output:
# NAME                     NAMESPACE     REVISION  UPDATED                               STATUS    CHART                 APP VERSION
# gpu-operator-1742982512  gpu-operator  1         2025-03-26 05:48:41.913183 -0400 EDT  deployed  gpu-operator-v24.9.2  v24.9.2

# Upgrade the release (substitute your own release name from the output above)
helm upgrade gpu-operator-1742982512 nvidia/gpu-operator \
  --namespace gpu-operator \
  --set mig.strategy="mixed" \
  --set migManager.enabled=true

# Example output:
# Release "gpu-operator-1742982512" has been upgraded. Happy Helming!
# NAME: gpu-operator-1742982512
# LAST DEPLOYED: Wed Mar 26 05:59:23 2025
# NAMESPACE: gpu-operator
# STATUS: deployed
# REVISION: 2
# TEST SUITE: None
```
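
The release name in the `helm list` output carries a generated suffix, so it will differ between clusters. As a minimal sketch, assuming the default table output shown above and a chart name that starts with `gpu-operator-`, the name can be picked out with `awk`. The helper name `release_name_from_list` and the sample text are illustrative only and are not part of Helm or OCI AI Blueprints.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `helm list` table output on stdin and print the
# NAME of the first release whose CHART column starts with "gpu-operator-".
# Assumes CHART is the second-to-last whitespace-separated field, which holds
# when APP VERSION is a single non-empty token.
release_name_from_list() {
  awk 'NR > 1 && $(NF - 1) ~ /^gpu-operator-/ { print $1; exit }'
}

# Demo on the sample output from this guide (no cluster needed).
sample_output='NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
gpu-operator-1742982512 gpu-operator 1 2025-03-26 05:48:41.913183 -0400 EDT deployed gpu-operator-v24.9.2 v24.9.2'

printf '%s\n' "$sample_output" | release_name_from_list
# prints: gpu-operator-1742982512
```

Against a live cluster, the same helper could be fed directly with `helm list -n gpu-operator | release_name_from_list`.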
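
The two `--set` flags can also be kept in a values file, which is easier to review and version. The sketch below assumes a hypothetical file name `mig-values.yaml` and a hypothetical wrapper function `upgrade_gpu_operator`; the keys mirror the flags used in this section, and `-f` is Helm's standard option for passing a values file. The function is only defined, not called, so nothing changes on the cluster until you invoke it with your own release name.

```bash
#!/usr/bin/env bash
# Write the MIG settings to a values file (the file name is an assumption).
cat > mig-values.yaml <<'EOF'
mig:
  strategy: mixed
migManager:
  enabled: true
EOF

# Hypothetical wrapper: upgrade the given release using the values file.
# Usage: upgrade_gpu_operator <release-name>
upgrade_gpu_operator() {
  helm upgrade "$1" nvidia/gpu-operator \
    --namespace gpu-operator \
    -f mig-values.yaml
}
```

For example, `upgrade_gpu_operator gpu-operator-1742982512` would apply the same settings as the `--set` form.
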
## Need Help?
- Check out [Known Issues & Solutions](docs/known_issues/README.md) for troubleshooting common problems.
- For questions or additional support, contact [[email protected]](mailto:[email protected]) or [[email protected]](mailto:[email protected]).
