-
Notifications
You must be signed in to change notification settings - Fork 17
Wait for at least last step duration #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9b75d52 to
8dec4c5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the progressive rollout logic for TemporalWorkerDeployment to ensure that ramp values only advance safely after an adequate pause duration, even when a user manually intervenes, while also updating related tests and validation rules.
- Refactored progressive rollout logic and introduced the RampLastModifiedAt field to track ramp changes.
- Updated test helpers, planner logic, webhook validations, and CRD/schema definitions to support the new behavior.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| internal/testlogr/testlogr.go | Added a logger sink for writing logs to test output. |
| internal/testhelpers/make.go | Refactored test helper functions and moved them into a dedicated package. |
| internal/temporal/worker_deployment.go | Updated deployment state to include RampLastModifiedAt. |
| internal/planner/planner.go | Updated progressive rollout logic to use RampLastModifiedAt and handle step transitions. |
| internal/planner/planner_test.go | Added new test cases for progressive rollout edge cases and over-time transitions. |
| internal/k8s/deployments_test.go | Refactored test usage to align with the new test helper functions. |
| internal/controller/state_mapper.go | Mapped RampLastModifiedAt from the temporal state to deployment status. |
| helm/temporal-worker-controller/templates/crds/temporal.io_temporalworkerdeployments.yaml | Added CRD schema for RampLastModifiedAt. |
| api/v1alpha1/worker_types.go | Introduced RampLastModifiedAt in the deployment status. |
| api/v1alpha1/temporalworker_webhook_test.go | Added webhook tests for validating progressive rollout specs with ramp and pause constraints. |
| api/v1alpha1/temporalworker_webhook.go | Updated webhook validation logic to enforce rollout step requirements and ramp progression. |
Comments suppressed due to low confidence (2)
internal/planner/planner.go:368
- Consider clarifying via an inline comment that when the target ramp percentage does not match the current step's defined ramp, the ramp is reset immediately. This explicitly documents that the logic intentionally overrides the previous ramp regardless of pause duration.
if targetRampPercentage == nil || *targetRampPercentage != currentStep.RampPercentage {
internal/testhelpers/make.go:76
- [nitpick] Consider renaming the parameter 'twd' to a more generic identifier (such as 'obj') to better reflect that this helper function is generic and may operate on various types.
func ModifyObj[T any](twd T, callback func(obj T) T) T {
carlydf
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice!
robholland
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, a few non-blocking comments about validation.
What was changed
Why?
In addition to addressing the issue described in #25, this change makes the controller's behavior when taking over the rollout after a user manually edits the ramp value much safer.
For example, this is what would happen if a user manually edits the ramp value in the middle of a rollout, pauses for a while, and then allows the controller to resume the rollout:
sequenceDiagram participant Controller as Worker Controller participant Temporal as Temporal Server actor CLI as User Note over Controller, Temporal: Progressive Rollout: Steps [1%, 5%, 25%, 50%] Controller->>Temporal: Start rollout - Set ramp to 1% Controller->>Controller: Wait for pause duration (e.g. 1m) Note over Controller: Next reconcile loop Controller->>Temporal: Check ramp status Temporal-->>Controller: Current ramp: 1%, healthy Controller->>Temporal: Advance to next step - Set ramp to 5% Controller->>Controller: Wait for pause duration (e.g. 5m) Note over Controller: Next reconcile loop Controller->>Temporal: Check ramp status Temporal-->>Controller: Current ramp: 5%, healthy Controller->>Temporal: Advance to next step - Set ramp to 25% par Controller waits Controller->>Controller: Wait for pause duration (e.g. 10m) and User intervenes Note over CLI: User notices issues<br/>and wants to reduce ramp CLI->>Temporal: Manual intervention<br/>Set ramp to 10% via CLI end Note over Controller: Next reconcile loop Controller->>Temporal: Check ramp status Temporal-->>Controller: Current ramp: 10%<br/>(manually modified) par Controller waits Note over Controller: Switch to manual rollout mode until<br/>Deployment released by user. and User intervenes CLI->>Temporal: Release Deployment ramp<br/>(Controller allowed to take over ramping again) end Note over Controller: Next reconcile loop Controller->>Temporal: Check ramp status Temporal-->>Controller: Current ramp: 10%, healthy Controller->>Temporal: Resume rollout - Set ramp to 5%<br>(this is the highest ramp value <= current) Note over Controller: Continue normal progression Controller->>Temporal: Advance to 25% Controller->>Temporal: Advance to 50% Controller->>Temporal: Set as current (100%)Checklist
Closes Guarantee that the ramps specified in progressive rollout spend at least the requested time in each step #25
How was this tested: Unit tests
Any docs updates needed?
Not immediately, but we should probably document the requirement for increasing ramp values rather than relying solely on the validator.