Commit 57cb8fb

Merge pull request #11 from OguzPastirmaci/main
Add FAQ section
2 parents 119be77 + cf94acc commit 57cb8fb

1 file changed: 17 additions, 1 deletion

README.md

Lines changed: 17 additions & 1 deletion
@@ -146,7 +146,7 @@ kubectl apply -f https://raw.githubusercontent.com/oracle-quickstart/oci-hpc-oke
 kubectl apply -f https://raw.githubusercontent.com/oracle-quickstart/oci-hpc-oke/main/manifests/ip-pool.yaml
 ```
 
-### Create the topology config map
+### Create the topology ConfigMap
 This step creates a ConfigMap that can be used as the NCCL topology file when running your jobs that use NCCL as the backend.
 
 You can find the topology files in the [topology directory](../manifests/topology/) in this repo. Please make sure you use the correct topology file based on your shape when creating the ConfigMap.
@@ -273,3 +273,19 @@ Warning: Permanently added 'nccl-allreduce-job0-mpiworker-1.nccl-allreduce-job0'
 # Avg bus bandwidth : 66.4834
 #
 ```
+
+## FAQ
+#### Are there any features that are not supported when using self-managed nodes?
+Yes, some features and capabilities are not available, or not yet available, when using self-managed nodes. Please see [this link](https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengworkingwithselfmanagednodes.htm) for a list of features and capabilities that are not available for self-managed nodes.
+
+#### Can I use Ubuntu as the operating system?
+We are working on adding support for Ubuntu, but it is not available today.
+
+#### I don't see my GPU nodes in the OKE page in the console under worker pools
+This is expected. Currently, only the worker pools with the `node-pool` mode are listed. Self-managed nodes (`cluster-network` and `instance-pool` modes in worker pools) are created by you and joined to the OKE cluster, rather than created for you by OKE.
+
+#### Can I use Multi-Instance GPU (MIG)?
+Yes, you can configure GPU Operator with MIG. Please see the instructions [here](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/gpu-operator-mig.html).
+
+#### If I don't need RDMA connectivity between my H100 or A100 nodes, do I still need to follow the instructions in this repo?
+No, if you don't need RDMA connectivity between your nodes, you can deploy an OKE cluster without using any self-managed nodes. The easiest way to do it is using the web console.
