Add MPS control daemon support to k8s device plugin #789
base: develop
Add support for NVIDIA Multi-Process Service (MPS) control daemon, including service configuration and device plugin updates. Signed-off-by: Matthew Yeazel <[email protected]>
    ExecStart=/usr/bin/mps-control-daemon --config-file /etc/nvidia-k8s-device-plugin/settings.yaml
    {{else}}
    ExecStart=
    ExecStart=/usr/bin/sleep infinity
As discussed. Options to remove the sleep are limited since a combination of our tooling and SystemD behaviour is preventing us from:
- leaving the service in an `exited` state since SystemD will not `try-restart` it
- letting the service fail till mps is available since the system will be in a degraded state
- having the service without an `[Install]` section since we don't have logic to conditionally `start` and `stop` a service
- having the entire service file be rendered on enabling mps since that would require a `daemon reload` which is not possible by the existing logic.
Without us writing new logic to conditionally enable and disable services, this would be the way forward to enable mps support.
    [Service]
    Type=simple
    ExecStart=/bin/true
I'd prefer /usr/bin/false here so that the start fails in a more obvious way.
    [Service]
    {{#if (eq settings.kubelet-device-plugins.nvidia.device-sharing-strategy "mps")}}
    ExecStart=
    ExecStart=/usr/bin/mps-control-daemon --config-file /etc/nvidia-k8s-device-plugin/settings.yaml
In the code, there's this log message:
klog.Info("No devices are configured for MPS sharing; Waiting indefinitely.")
Which seems like the infinite sleep we want. Can we render the config in a way that triggers this?
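To illustrate the idea in this comment, here is a minimal Go sketch of a daemon that idles on its own when nothing is configured, instead of relying on an external `sleep infinity`. This is not the device plugin's actual code: `runOrIdle`, `runWithDeadline`, and the device name used below are hypothetical, and the real daemon logs through klog rather than returning strings.

```go
package main

import (
	"context"
	"time"
)

// runOrIdle is a hypothetical stand-in for a daemon entry point.
// With no devices configured for MPS sharing it parks until the
// context is cancelled, which keeps the process (and therefore the
// systemd unit) alive without doing any work.
func runOrIdle(ctx context.Context, mpsDevices []string) string {
	if len(mpsDevices) == 0 {
		// Nothing to share: wait here until we are told to stop.
		<-ctx.Done()
		return "idle"
	}
	// A real daemon would start its per-device work here.
	return "started"
}

// runWithDeadline drives runOrIdle with a context that expires after
// ms milliseconds, so the idle branch can be observed without blocking
// forever. It exists only for demonstration.
func runWithDeadline(mpsDevices []string, ms int) string {
	ctx, cancel := context.WithTimeout(context.Background(), time.Duration(ms)*time.Millisecond)
	defer cancel()
	return runOrIdle(ctx, mpsDevices)
}
```

With this shape the unit could keep a single `ExecStart` in both cases, provided the rendered config can express "no devices configured for MPS"; whether the existing settings can do that is the open question in this thread.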
Issue number:
Related to: bottlerocket-os/bottlerocket#4673
Description of changes:
This builds the mps-control-daemon binary from the device plugin that allows MPS support. We have to patch the hardcoded paths for Bottlerocket usage since the device plugin assumes it can write to / which doesn't work with Bottlerocket.
This change also adds a new service to start this binary when settings request it. Otherwise the service runs `sleep infinity` so that systemd can `try-restart` it when the MPS settings change.
The change should be safe to take without the bottlerocket-os/bottlerocket-kernel-kit#347 change or the upcoming settings change, but the daemon will not work without the kmod update and the settings being properly set.
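For readers less familiar with the unit template, here is a minimal sketch of the same branch written with Go's `text/template`. It is not the templating engine used for the drop-in in this PR (which is a Handlebars-style template), and `unitTmpl`, `renderUnit`, and the `Strategy` field are hypothetical names. Only the two `ExecStart` commands and the `"mps"` comparison come from the diff.

```go
package main

import (
	"strings"
	"text/template"
)

// unitTmpl mirrors the conditional in the service drop-in, rewritten in
// Go template syntax purely for illustration. The empty "ExecStart="
// clears any previously defined command before the new one is set.
const unitTmpl = `[Service]
ExecStart=
{{if eq .Strategy "mps" -}}
ExecStart=/usr/bin/mps-control-daemon --config-file /etc/nvidia-k8s-device-plugin/settings.yaml
{{- else -}}
ExecStart=/usr/bin/sleep infinity
{{- end}}
`

// renderUnit renders the [Service] section for a given sharing strategy.
// Any value other than "mps" falls through to the sleep placeholder.
func renderUnit(strategy string) (string, error) {
	t, err := template.New("unit").Parse(unitTmpl)
	if err != nil {
		return "", err
	}
	var b strings.Builder
	data := struct{ Strategy string }{Strategy: strategy}
	if err := t.Execute(&b, data); err != nil {
		return "", err
	}
	return b.String(), nil
}
```

Either branch leaves the unit with a long-running process, so the service stays active and a later `try-restart` picks up whichever command the re-rendered file contains.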
Testing done:
Built images with the kernel change and the settings changes, and validated that a node comes up with MPS working if it is set in user data, and that the services are restarted and MPS can be enabled at runtime as well.
Setting in userdata for a g6.2xlarge which only has one GPU
`eksctl` config snippet for setting it at the beginning:
Results in a node reporting `nvidia.com/gpu.shared`:
Setting the MPS after boot
Start with a node with no configuration for MPS:
The node shows one GPU:
Then set MPS:
Now check the rest of the system:
And the node shows the empty nvidia.com/gpu offering but now a shared one:
This is a known edge case and is similar to how timeslicing works. In order to avoid old resources, you'd need to start with the user-data approach.
Shifting to `rename-by-default=false` (`apiclient set settings.kubelet-device-plugins.nvidia.mps.rename-by-default=false`) will have the original `nvidia.com/gpu` resource instead:
And finally, setting sharing to `none` disables MPS:
And the resource goes back down to 1.
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.