model-serving: enable rolling updates for networkTopology and gangPol… #704
base: main
[APPROVALNOTIFIER] This PR is NOT APPROVED
Welcome @aabhinavvvvvvv! It looks like this is your first PR to volcano-sh/kthena 🎉
Summary of Changes
This pull request significantly enhances the ModelServing controller's ability to manage declarative updates. Previously, modifications to …
Code Review
This pull request is a great improvement, enabling rolling updates for networkTopology and gangPolicy changes. The core logic change to hash the entire spec.template is correct and well-implemented. I appreciate the attention to backward compatibility in controller_revision.go, which ensures existing deployments are not affected. The addition of comprehensive unit tests in revision_test.go and a practical example YAML file significantly improves the quality and usability of this feature. The documentation updates are also clear and helpful.
I have a couple of suggestions to improve the code further. One is to fix a broken link in the documentation, and the other is to refactor a small piece of duplicated code for better maintainability.
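The backward-compatibility behaviour mentioned in the review can be illustrated with a minimal sketch. It assumes, purely for illustration, that an old ControllerRevision payload is a JSON array of roles and a new one is a JSON object holding the full template; `Role`, `Template`, and `decodeTemplate` are hypothetical names and not the actual kthena types or the real `GetTemplateFromControllerRevision()` implementation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Role and Template are hypothetical stand-ins, not the real kthena API types.
type Role struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
}

type Template struct {
	Roles           []Role            `json:"roles"`
	NetworkTopology map[string]string `json:"networkTopology,omitempty"`
}

// decodeTemplate accepts either payload shape (an assumption of this sketch):
// a legacy JSON array of roles, or a JSON object with the full template.
func decodeTemplate(raw []byte) (Template, error) {
	trimmed := bytes.TrimSpace(raw)
	if len(trimmed) == 0 {
		return Template{}, fmt.Errorf("empty revision payload")
	}
	if trimmed[0] == '[' {
		// Legacy format: only the roles were stored.
		var roles []Role
		if err := json.Unmarshal(trimmed, &roles); err != nil {
			return Template{}, fmt.Errorf("decode legacy roles: %w", err)
		}
		return Template{Roles: roles}, nil
	}
	// New format: the whole template was stored.
	var tmpl Template
	if err := json.Unmarshal(trimmed, &tmpl); err != nil {
		return Template{}, fmt.Errorf("decode template: %w", err)
	}
	return tmpl, nil
}

func main() {
	legacy := []byte(`[{"name":"prefill","replicas":2}]`)
	full := []byte(`{"roles":[{"name":"prefill","replicas":2}],"networkTopology":{"mode":"hard"}}`)
	for _, raw := range [][]byte{legacy, full} {
		tmpl, err := decodeTemplate(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("roles=%d networkTopology=%v\n", len(tmpl.Roles), tmpl.NetworkTopology)
	}
}
```

A roles-only wrapper in the spirit of the deprecated `GetRolesFromControllerRevision()` could then simply return the `Roles` field of the decoded template.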
3. Perform a rolling update respecting the `rolloutStrategy` configuration
4. Reschedule pods according to the new topology constraints

For a complete example, see [network-topology-rolling-update.yaml](../assets/examples/model-serving/network-topology-rolling-update.yaml).
The relative link to the example YAML file appears to be incorrect. Given the file structure, the path should be adjusted to correctly point to network-topology-rolling-update.yaml.
- For a complete example, see [network-topology-rolling-update.yaml](../assets/examples/model-serving/network-topology-rolling-update.yaml).
+ For a complete example, see [network-topology-rolling-update.yaml](../../../../examples/model-serving/network-topology-rolling-update.yaml).
…icy changes
- Modified revision calculation to hash entire Spec.Template instead of just Roles
- Added comprehensive unit tests for revision change detection
- Created example YAML demonstrating networkTopology rolling updates
- Updated documentation explaining rolling update triggers
Signed-off-by: aabhinavvvvvvv <[email protected]>
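To make the first item of the commit message concrete, here is a rough sketch of why hashing the whole template, and not just the roles, lets a `networkTopology` change produce a new revision. The types, the JSON-plus-FNV hashing scheme, and the `hashOf` helper are assumptions made for illustration; the controller's actual hashing code may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// Role and Template are hypothetical stand-ins, not the real kthena API types.
type Role struct {
	Name     string `json:"name"`
	Replicas int    `json:"replicas"`
}

type Template struct {
	Roles           []Role            `json:"roles"`
	NetworkTopology map[string]string `json:"networkTopology,omitempty"`
}

// hashOf serializes v to JSON and returns an FNV-1a hash of the bytes.
// encoding/json sorts map keys, so the output is deterministic.
func hashOf(v any) (string, error) {
	data, err := json.Marshal(v)
	if err != nil {
		return "", err
	}
	h := fnv.New32a()
	_, _ = h.Write(data) // hash.Hash writes never return an error
	return fmt.Sprintf("%x", h.Sum32()), nil
}

func main() {
	before := Template{
		Roles:           []Role{{Name: "prefill", Replicas: 2}},
		NetworkTopology: map[string]string{"mode": "hard"},
	}
	after := before
	after.NetworkTopology = map[string]string{"mode": "soft"} // only the topology changes

	oldRoles, _ := hashOf(before.Roles)
	newRoles, _ := hashOf(after.Roles)
	oldFull, _ := hashOf(before)
	newFull, _ := hashOf(after)

	// A roles-only hash cannot see the topology edit; a full-template hash can.
	fmt.Println("roles-only hash changed:", oldRoles != newRoles)
	fmt.Println("full-template hash changed:", oldFull != newFull)
}
```

With a roles-only hash the two revisions compare equal, so nothing would trigger a rollout; hashing the full template makes the topology edit visible to the revision comparison.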
12c95e5 to 8dbada1
@aabhinavvvvvvv This needs a deep discussion: whether to allow updating them, and what the behavior should be if we do.
Ok. I'll wait for updates and further guidance.
@aabhinavvvvvvv Thanks for understanding. We can discuss further in the linked issue.
/hold for now
Enable rolling updates for networkTopology and gangPolicy changes
Fixes #690
What type of PR is this?
/kind feature
What this PR does / why we need it:
This PR enables automatic rolling updates when `networkTopology` or `gangPolicy` fields are modified in a ModelServing resource, aligning with Kubernetes' declarative API philosophy.
Current Problem:
- Changing `spec.template.networkTopology` updates the PodGroup but does not reschedule existing pods
Solution:
- Revision calculation: `Spec.Template.Roles` to `Spec.Template` (entire template)
- New `GetTemplateFromControllerRevision()` function
Key Features:
Which issue(s) this PR fixes:
Fixes #690
Special notes for your reviewer:
Backward Compatibility: The new `GetTemplateFromControllerRevision()` function handles both old (roles-only) and new (full template) ControllerRevision formats. Existing deployments will continue to work without migration.
GangPolicy Inclusion: While GangPolicy is mostly immutable after being set, including it in the revision hash is correct because:
No Breaking Changes: `GetRolesFromControllerRevision()` is deprecated but still functional (calls new function internally)
Test Coverage:
Files to Review:
- `controller_revision.go`: New template extraction function (backward compatible)
- `revision_test.go`: Comprehensive test coverage
- `model-serving-rolling-update.md`: User-facing documentation
- `network-topology-rolling-update.yaml`: Practical example with detailed comments
Does this PR introduce a user-facing change?: