feat: Implement SandboxWarmPool recreate on template updates#347
shrutiyam-glitch wants to merge 13 commits into kubernetes-sigs:main
dhenkel92 left a comment
Thank you for working on this feature. It's painful to roll out changes to a warm pool right now 🙂
/retest
igooch left a comment
Overall good work automating the warmpool pod recreation via the new spec hash label.
A few minor suggestions for performance and edge cases in the inline comments.
This feature would be really helpful for our use case! Looking forward to the merge. 👍
igooch left a comment
Good work! A few comments, but the bulk of the logic looks good.
It would be good to have an e2e test for this, although that does not need to be a part of this PR.
extensions/controllers/utils.go
Outdated
Since the contents of this file are only used within the Warmpool controller I'd recommend moving them back into that file.
extensions/controllers/utils.go
Outdated
// SandboxWarmPool update strategies
RecreateStrategy = "Recreate"
OnDeleteStrategy = "OnDelete"
Recommend putting these in the api/v1alpha1 type definition, with similar naming pattern as StatefulSet, so that API consumers can use the values.
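A minimal sketch of what this suggestion could look like, assuming the naming follows `StatefulSetUpdateStrategyType` from `k8s.io/api/apps/v1`. The type name, constant names, and the trimmed-down `SandboxWarmPoolSpec` below are hypothetical and only illustrate the pattern; they are not taken from the PR.

```go
package main

import "fmt"

// SandboxWarmPoolUpdateStrategyType is a hypothetical type name, modelled on
// StatefulSetUpdateStrategyType in k8s.io/api/apps/v1.
type SandboxWarmPoolUpdateStrategyType string

const (
	// RecreateSandboxWarmPoolStrategyType (hypothetical name) replaces stale
	// pods as soon as the template changes.
	RecreateSandboxWarmPoolStrategyType SandboxWarmPoolUpdateStrategyType = "Recreate"
	// OnDeleteSandboxWarmPoolStrategyType (hypothetical name) keeps stale
	// pods until something else removes them.
	OnDeleteSandboxWarmPoolStrategyType SandboxWarmPoolUpdateStrategyType = "OnDelete"
)

// SandboxWarmPoolSpec shows only the field relevant to this sketch; the real
// spec has more fields.
type SandboxWarmPoolSpec struct {
	UpdateStrategy SandboxWarmPoolUpdateStrategyType `json:"updateStrategy,omitempty"`
}

func main() {
	// API consumers can refer to the typed constant instead of a raw string.
	spec := SandboxWarmPoolSpec{UpdateStrategy: RecreateSandboxWarmPoolStrategyType}
	fmt.Println(spec.UpdateStrategy == RecreateSandboxWarmPoolStrategyType)
}
```

Exporting a typed string from the API package lets clients and the controller share one definition of the allowed values.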
}

if isOrphan {
	// Pod has no controller - adopt it
This only happens if the strategy is OnDelete, correct? Could you update the comment to reflect this?
So, if the strategy is Recreate and if the orphaned pod is not stale, the controller will go ahead and adopt the pod.
}
}

func TestReconcilePool_TemplateUpdateRollout(t *testing.T) {
Could you update these to use the testCases := []struct{} ... for _, tc := range testCases {} pattern here?
}

// reconcilePool ensures the correct number of pods exist in the pool
func (r *SandboxWarmPoolReconciler) reconcilePool(ctx context.Context, warmPool *extensionsv1alpha1.SandboxWarmPool) error {
This function is getting really long and hard to review/maintain. Break it up into smaller helper functions. See go/small-functions
template, tmplErr := r.getTemplate(ctx, warmPool)
var currentTemplateHash string
if tmplErr == nil {
	currentTemplateHash = computeTemplateHash(template)
It's not safe to compute the template hash directly for comparison, because upstream pod schema changes will affect the value even if the template hasn't changed. That would end up deleting all warmed resources after a cluster upgrade. The hash should be computed only once and added as a label; later comparisons should read the hash from the label. Deployment uses the same pattern with its pod-template-hash label.
Per discussion, if the Warmpool can create sandboxes directly (#390), we can use sandbox spec to compare directly.
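One possible reading of the label-based suggestion in this thread, as a rough sketch: the hash is computed when a pod is created and stored as a label, and staleness checks only compare stored strings. The label key, the `PodTemplate` struct, and the helper names are hypothetical, and the use of FNV over JSON is an assumption for illustration. How the pool's current hash is persisted across controller upgrades is outside this sketch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"hash/fnv"
)

// templateHashLabel is a hypothetical label key used only in this sketch.
const templateHashLabel = "example.com/template-hash"

// PodTemplate is a stand-in for the real pod template type.
type PodTemplate struct {
	Image string   `json:"image"`
	Args  []string `json:"args,omitempty"`
}

// computeTemplateHash hashes the JSON form of the template with FNV-1a.
// The idea is to call it when a pod is created, not on every comparison.
func computeTemplateHash(t PodTemplate) (string, error) {
	raw, err := json.Marshal(t)
	if err != nil {
		return "", err
	}
	h := fnv.New32a()
	h.Write(raw)
	return fmt.Sprintf("%x", h.Sum32()), nil
}

// labelsForNewPod stamps the hash onto a pod at creation time.
func labelsForNewPod(hash string) map[string]string {
	return map[string]string{templateHashLabel: hash}
}

// isStale compares the stored label with the pool's current hash. It never
// re-hashes the live pod. Treating a missing label as stale is an assumption.
func isStale(podLabels map[string]string, currentHash string) bool {
	stored, ok := podLabels[templateHashLabel]
	return !ok || stored != currentHash
}

func main() {
	oldHash, _ := computeTemplateHash(PodTemplate{Image: "app:v1"})
	newHash, _ := computeTemplateHash(PodTemplate{Image: "app:v2"})
	pod := labelsForNewPod(oldHash)
	fmt.Println(isStale(pod, oldHash), isStale(pod, newHash))
}
```

Because the comparison only reads labels, a change in how the pod type serializes would not by itself alter the value already stored on existing pods.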
This is a relatively small nit but I'd suggest we reconsider the name
/hold
/unhold
Fixes #323
This PR implements the rollout logic for `SandboxWarmPool` when its associated `SandboxTemplate` is updated. This change adds two update strategies, Recreate and OnReplenish, allowing users to control how stale pods are handled.
Changes:
1. Update Strategy Definition
Added a new `UpdateStrategy` field to the `SandboxWarmPool` specification.
- `Recreate` (Default): Ensures the pool contains only fresh sandboxes by immediately deleting stale sandboxes when the template is updated.
- `OnReplenish`: Retains existing sandboxes even if they are stale; they are only replaced after being manually deleted or claimed from the pool.
This applies to any change in the associated `SandboxTemplate` as well as to a change of the `sandboxTemplateRef` name in the `SandboxWarmPool`.
2. Template Versioning and Tracking
- Uses the `spec.podTemplate` of the `SandboxTemplate` content to detect changes accurately.
- The `spec.podTemplate.spec` of the current template and the `spec.podTemplate.spec` of the current sandbox are compared. The boolean result of this comparison is stored with the hash as the key, so the comparison is not repeated for every sandbox in the warm pool.
3. Controller Logic Updates
- The `SandboxWarmPool` controller now watches `SandboxTemplate` resources. It uses an `EnqueueRequestsFromMapFunc` to identify and reconcile every `SandboxWarmPool` that references a modified template.
- With `Recreate`, the controller deletes identified stale pods, allowing the standard reconciliation loop to replace them with pods based on the new template.
Testing Performed:
- Updating the `SandboxTemplate` image triggers the deletion of idle sandboxes in the associated `SandboxWarmPool`.
- Sandboxes adopted by a `SandboxClaim` are NOT deleted during a template update.
- Changing the `templateRef` (with new spec) on the `SandboxWarmPool` itself triggers a full pool rotation.
- Changing the `templateRef` (with old spec) on the `SandboxWarmPool` itself triggers a full pool rotation.
Unit tests added:
Steps followed:
- Created `SandboxTemplate`, `SandboxWarmPool` with default `Recreate` strategy, and `SandboxClaim`. Sandbox `python-sdk-warmpool-wltr6` is adopted by the `sandbox-claim`.
- Updated `python-counter-template` and applied. The unclaimed warm pool sandboxes are recreated with the updated template spec.
- Updated `spec.sandboxTemplateRef` in the `SandboxWarmPool` manifest to use the sandbox template `pct`. All the unclaimed sandboxes are recreated by default.
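The controller behaviour described under "Controller Logic Updates" can be sketched roughly as follows. This is a simplified, stdlib-only illustration, not the PR's code: the `WarmPool` and `Sandbox` structs and both helper functions are hypothetical. `poolsForTemplate` only mimics the role of the map function handed to `EnqueueRequestsFromMapFunc`, and the real controller-runtime wiring is omitted.

```go
package main

import "fmt"

// UpdateStrategy mirrors the two strategies named in the description.
type UpdateStrategy string

const (
	Recreate    UpdateStrategy = "Recreate"
	OnReplenish UpdateStrategy = "OnReplenish"
)

// WarmPool and Sandbox are hypothetical, trimmed-down stand-ins for the
// real API objects.
type WarmPool struct {
	Name        string
	TemplateRef string
	Strategy    UpdateStrategy
}

type Sandbox struct {
	Name    string
	Stale   bool // created from an older template spec
	Claimed bool // already handed out to a claim
}

// poolsForTemplate returns the names of the pools that reference the
// modified template, i.e. the pools that would be enqueued for reconcile.
func poolsForTemplate(pools []WarmPool, templateName string) []string {
	var names []string
	for _, p := range pools {
		if p.TemplateRef == templateName {
			names = append(names, p.Name)
		}
	}
	return names
}

// sandboxesToDelete picks the sandboxes to remove after a template update.
// OnReplenish deletes nothing; any other value, including the empty default,
// is treated as Recreate and removes stale, unclaimed sandboxes.
func sandboxesToDelete(strategy UpdateStrategy, sandboxes []Sandbox) []string {
	if strategy == OnReplenish {
		return nil
	}
	var names []string
	for _, s := range sandboxes {
		if s.Stale && !s.Claimed {
			names = append(names, s.Name)
		}
	}
	return names
}

func main() {
	pools := []WarmPool{
		{Name: "pool-a", TemplateRef: "tmpl-1"},
		{Name: "pool-b", TemplateRef: "tmpl-2"},
	}
	sandboxes := []Sandbox{
		{Name: "sb-1", Stale: true},
		{Name: "sb-2", Stale: true, Claimed: true},
	}
	fmt.Println(poolsForTemplate(pools, "tmpl-1"))
	fmt.Println(sandboxesToDelete(Recreate, sandboxes))
}
```

Claimed sandboxes are skipped under both strategies in this sketch, matching the testing note that sandboxes held by a `SandboxClaim` are not deleted during a template update.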