@@ -40,28 +40,25 @@ pokprod002ctrl2 Ready control-plane,master 5d15h v1.29.11+148a389
Each worker node is equipped with eight [NVIDIA
H100](https://www.nvidia.com/en-us/data-center/h100/) GPUs.
```sh
- kubectl describe node pokprod-b93r38s3
+ oc debug node/pokprod-b93r38s3 -- chroot /host lspci -d 10de:
```
```
- Name: pokprod-b93r38s3
- Roles: worker
- Labels: beta.kubernetes.io/arch=amd64
- ...
- nvidia.com/gpu.product=NVIDIA-H100-80GB-HBM3
- ...
- nvidia.com/gpu.count=8
- ...
- Capacity:
- cpu: 224
- ephemeral-storage: 1873933640Ki
- hugepages-1Gi: 0
- hugepages-2Mi: 0
- memory: 2113411308Ki
- nvidia.com/gpu: 8
- openshift.io/p0_storage_sriov_nodepolicy: 8
- pods: 250
- rdma/roce_gdr: 0
- ...
+ Starting pod/pokprod-b93r38s3-debug-4bv4j ...
+ To use host binaries, run `chroot /host`
+ 05:00.0 Bridge: NVIDIA Corporation GH100 [H100 NVSwitch] (rev a1)
+ 06:00.0 Bridge: NVIDIA Corporation GH100 [H100 NVSwitch] (rev a1)
+ 07:00.0 Bridge: NVIDIA Corporation GH100 [H100 NVSwitch] (rev a1)
+ 08:00.0 Bridge: NVIDIA Corporation GH100 [H100 NVSwitch] (rev a1)
+ 18:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
+ 2a:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
+ 3a:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
+ 5d:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
+ 9a:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
+ ab:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
+ ba:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
+ db:00.0 3D controller: NVIDIA Corporation GH100 [H100 SXM5 80GB] (rev a1)
+
+ Removing debug pod ...
```
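To confirm the device count without reading the listing by eye, the PCI functions can be tallied by class. The following is a minimal sketch and not part of the original tutorial: the helper name `count_nvidia_devices` is hypothetical, and it assumes `lspci` reports GPUs as `3D controller` and NVSwitches as `Bridge`, as in the output above.

```sh
# Hypothetical helper: tally NVIDIA PCI functions by class.
# Reads `lspci -d 10de:` output on stdin; assumes the class names shown above.
count_nvidia_devices() {
  awk '
    /3D controller: NVIDIA/ { gpus++ }
    /Bridge: NVIDIA/        { switches++ }
    END { printf "gpus=%d nvswitches=%d\n", gpus, switches }
  '
}

# Usage against the node from this tutorial (requires cluster access):
# oc debug node/pokprod-b93r38s3 -- chroot /host lspci -d 10de: | count_nvidia_devices
```

For the listing above, this would report eight GPUs and four NVSwitches.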
For this tutorial, we assume the [NVIDIA GPU
operator](https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html)