-
Notifications
You must be signed in to change notification settings - Fork 566
Add in changes to support MPS settings #4744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| /// GPU sharing in the device plugin. | ||
| fn run() -> Result<()> { | ||
| migrate(AddPrefixesMigration(vec![ | ||
| "settings.kubelet-device-plugins.nvidia.mps", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might have to add "configuration-files.nvidia-mps-control-daemon-exec-start-conf",.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if you have to add a migration for "settings.kubelet-device-plugins.nvidia.device-sharing-strategy" as well because the allowed value can also be mps now.
fn run() -> Result<()> {
migrate(RestrictListsMigration(vec![ListRestriction {
setting: "settings.kubelet-device-plugins.nvidia.device-sharing-strategy",
allowed_vals: &["time-slicing"],
}]))
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a custom function for this since we don't have a helper for Enum variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the setting - KubernetesCPUManagerPolicyOption (https://github.com/bottlerocket-os/bottlerocket-settings-sdk/pull/104/changes) in settings-sdk and this migration - https://github.com/bottlerocket-os/bottlerocket/blob/develop/sources/settings-migrations/v1.51.0/kubernetes-beta-cpu-manager-policy-options/src/main.rs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like the custom function would still allow "mps" upon downgrade.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw this after re-writing my code, this is a nice find! My implementation is a bit more targeted and specific to this one use case but that example would have worked as well. would have worked as well.
sources/settings-migrations/v1.54.0/kubelet-device-plugins-mps-settings/src/main.rs
Outdated
Show resolved
Hide resolved
Add the additional services and configuration files for the nvidia-mps-control-daemon Signed-off-by: Matthew Yeazel <yeazelm@amazon.com>
|
^ Updated with core and kernel kit releases, Bottlerocket SDK update, and comments for migrations. |
Signed-off-by: Matthew Yeazel <yeazelm@amazon.com>
|
^ Updated the code to break into two migrations. The first migration's datastore would be lost when the second one ran. This keeps the two separate and ensures both complete. |
|
Just a note, if we wanted to do 2 sets of migrations in 1, this works as well: But I think I like the 2 separate migrations since they are different trees anyways. |
Issue number:
Closes # #4673
Description of changes:
Update the shared defaults configuration files and services to accomodate the
nvidia-mps-control-daemon. Add a migration for the new settings.Testing done:
cargo makecompletes. For full testing of the feature, look at bottlerocket-os/bottlerocket-core-kit#789Migration testing passed.
From the console when downgrading:
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.