@yeazelm yeazelm commented Oct 27, 2025

Description of changes:
Add the MIG profiles for RTX PRO 6000 devices. This change requires bottlerocket-os/bottlerocket-settings-sdk#105 to allow the profile to be set.

I got the PCI device ID from `nvidia-smi`:
bash-5.1# nvidia-smi --query-gpu=pci.device_id,mig.mode.current,mig.mode.pending --format=csv,noheader
0x2BB510DE, Disabled, Disabled
0x2BB510DE, Disabled, Disabled
0x2BB510DE, Disabled, Disabled
0x2BB510DE, Disabled, Disabled
0x2BB510DE, Disabled, Disabled
0x2BB510DE, Disabled, Disabled
0x2BB510DE, Disabled, Disabled
0x2BB510DE, Disabled, Disabled
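
For readers unfamiliar with the format, the value above appears to combine two 16-bit PCI IDs. The sketch below splits it, assuming the upper half is the device ID and the lower half is the vendor ID (0x10DE is NVIDIA's PCI vendor ID). The function name `split_pci_id` is hypothetical and not part of this change.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split the combined value reported by
# `nvidia-smi --query-gpu=pci.device_id` into device and vendor IDs.
# Assumption: upper 16 bits = device ID, lower 16 bits = vendor ID.
split_pci_id() {
  local combined=$(( $1 ))   # bash arithmetic accepts 0x-prefixed hex
  printf 'device=0x%04X vendor=0x%04X\n' \
    $(( combined >> 16 )) $(( combined & 0xFFFF ))
}

split_pci_id 0x2BB510DE   # prints: device=0x2BB5 vendor=0x10DE
```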

I also listed the GPU instance profiles the device supports:

bash-5.1# nvidia-smi mig -lgip
+-------------------------------------------------------------------------------+
| GPU instance profiles:                                                        |
| GPU   Name               ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                                Free/Total   GiB              CE    JPEG  OFA  |
|===============================================================================|
|   0  MIG 1g.24gb         14     4/4        23.62      No     46     1     1   |
|                                                               1     1     0   |
+-------------------------------------------------------------------------------+
|   0  MIG 1g.24gb+me      21     1/1        23.62      No     46     1     1   |
|                                                               1     1     1   |
+-------------------------------------------------------------------------------+
|   0  MIG 1g.24gb+gfx     47     4/4        23.62      No     46     1     1   |
|                                                               1     1     0   |
+-------------------------------------------------------------------------------+
|   0  MIG 1g.24gb+me.all  65     1/1        23.62      No     46     4     4   |
|                                                               1     4     1   |
+-------------------------------------------------------------------------------+
|   0  MIG 1g.24gb-me      67     4/4        23.62      No     46     0     0   |
|                                                               1     0     0   |
+-------------------------------------------------------------------------------+
|   0  MIG 2g.48gb          5     2/2        47.38      No     94     2     2   |
|                                                               2     2     0   |
+-------------------------------------------------------------------------------+
|   0  MIG 2g.48gb+gfx     35     2/2        47.38      No     94     2     2   |
|                                                               2     2     0   |
+-------------------------------------------------------------------------------+
|   0  MIG 2g.48gb+me.all  64     1/1        47.38      No     94     4     4   |
|                                                               2     4     1   |
+-------------------------------------------------------------------------------+
|   0  MIG 2g.48gb-me      66     2/2        47.38      No     94     0     0   |
|                                                               2     0     0   |
+-------------------------------------------------------------------------------+
|   0  MIG 4g.96gb          0     1/1        95.00      No     188    4     4   |
|                                                               4     4     1   |
+-------------------------------------------------------------------------------+
|   0  MIG 4g.96gb+gfx     32     1/1        95.00      No     188    4     4   |
|                                                               4     4     1   |
+-------------------------------------------------------------------------------+
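
To compare profiles at a glance, the table above can be reduced to name, profile ID, and maximum instance count. This is a minimal sketch that assumes the column layout shown in this output; the function name `list_mig_profiles` is hypothetical, and the layout may differ across driver versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi mig -lgip` output on stdin and print
# "<profile name> <profile ID> <max instances>" for each profile row.
# Assumption: rows look like "|   0  MIG 1g.24gb   14   4/4   23.62 ...".
list_mig_profiles() {
  awk '$3 == "MIG" { split($6, n, "/"); print $4, $5, n[2] }'
}

# Usage with two rows copied from the output above:
list_mig_profiles <<'EOF'
|   0  MIG 1g.24gb         14     4/4        23.62      No     46     1     1   |
|                                                               1     1     0   |
|   0  MIG 4g.96gb          0     1/1        95.00      No     188    4     4   |
|                                                               4     4     1   |
EOF
```

On a live node the same function could be fed directly with `nvidia-smi mig -lgip | list_mig_profiles`.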

Testing done:
Built aws-k8s-1.34-nvidia and tested setting the profile:

apiclient set settings.kubelet-device-plugins.nvidia.device-partitioning-strategy="mig"

apiclient apply <<EOF
[settings.kubelet-device-plugins.nvidia.mig.profile]
"rtxpro6000.96gb"="4"
EOF

The MIG manager found all eight devices:

bash-5.1# journalctl -u nvidia-migmanager
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal systemd[1]: Starting NVIDIA MIG manager service...
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] nvidia-migmanager started
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] Fetching GPU devices data ...
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] Found NVIDIA RTX PRO 6000 Blackwell GPU.
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] Found NVIDIA RTX PRO 6000 Blackwell GPU.
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] Found NVIDIA RTX PRO 6000 Blackwell GPU.
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] Found NVIDIA RTX PRO 6000 Blackwell GPU.
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] Found NVIDIA RTX PRO 6000 Blackwell GPU.
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] Found NVIDIA RTX PRO 6000 Blackwell GPU.
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] Found NVIDIA RTX PRO 6000 Blackwell GPU.
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal nvidia-migmanager[29688]: 16:45:10 [INFO] Found NVIDIA RTX PRO 6000 Blackwell GPU.
Oct 24 16:45:10 ip-192-168-73-199.us-east-2.compute.internal systemd[1]: Finished NVIDIA MIG manager service.

Validated that the node now advertises 32 `nvidia.com/gpu` resources (8 GPUs with 4 MIG instances each) instead of 8:

Capacity:
...
  nvidia.com/gpu:     32
  pods:               110
Allocatable:
...
  nvidia.com/gpu:     32
  pods:               110
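
The expected resource count follows from simple arithmetic. The sketch below assumes that the profile value "4" set earlier means four MIG instances per GPU, which is consistent with the `1g.24gb` profile listing 4/4 instances; the function name `expected_mig_resources` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected number of nvidia.com/gpu resources on a node.
# Assumption: every GPU is split into the same number of MIG instances.
expected_mig_resources() {
  local gpus=$1 instances_per_gpu=$2
  echo $(( gpus * instances_per_gpu ))
}

expected_mig_resources 8 4   # prints: 32
```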

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@yeazelm yeazelm requested a review from piyush-jena October 27, 2025 14:46
@yeazelm yeazelm merged commit 18b5e94 into bottlerocket-os:develop Oct 29, 2025
2 checks passed
@yeazelm yeazelm deleted the mig_profiles branch October 29, 2025 22:30