Merged
Conversation
9b75d52 to
8dec4c5
Compare
Contributor
There was a problem hiding this comment.
Pull Request Overview
This PR refactors the progressive rollout logic for TemporalWorkerDeployment to ensure that ramp values only advance safely after an adequate pause duration, even when a user manually intervenes, while also updating related tests and validation rules.
- Refactored progressive rollout logic and introduced the RampLastModifiedAt field to track ramp changes.
- Updated test helpers, planner logic, webhook validations, and CRD/schema definitions to support the new behavior.
Reviewed Changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| internal/testlogr/testlogr.go | Added a logger sink for writing logs to test output. |
| internal/testhelpers/make.go | Refactored test helper functions and moved them into a dedicated package. |
| internal/temporal/worker_deployment.go | Updated deployment state to include RampLastModifiedAt. |
| internal/planner/planner.go | Updated progressive rollout logic to use RampLastModifiedAt and handle step transitions. |
| internal/planner/planner_test.go | Added new test cases for progressive rollout edge cases and over-time transitions. |
| internal/k8s/deployments_test.go | Refactored test usage to align with the new test helper functions. |
| internal/controller/state_mapper.go | Mapped RampLastModifiedAt from the temporal state to deployment status. |
| helm/temporal-worker-controller/templates/crds/temporal.io_temporalworkerdeployments.yaml | Added CRD schema for RampLastModifiedAt. |
| api/v1alpha1/worker_types.go | Introduced RampLastModifiedAt in the deployment status. |
| api/v1alpha1/temporalworker_webhook_test.go | Added webhook tests for validating progressive rollout specs with ramp and pause constraints. |
| api/v1alpha1/temporalworker_webhook.go | Updated webhook validation logic to enforce rollout step requirements and ramp progression. |
Comments suppressed due to low confidence (2)
internal/planner/planner.go:368
- Consider clarifying via an inline comment that when the target ramp percentage does not match the current step's defined ramp, the ramp is reset immediately. This explicitly documents that the logic intentionally overrides the previous ramp regardless of pause duration.
if targetRampPercentage == nil || *targetRampPercentage != currentStep.RampPercentage {
internal/testhelpers/make.go:76
- [nitpick] Consider renaming the parameter 'twd' to a more generic identifier (such as 'obj') to better reflect that this helper function is generic and may operate on various types.
func ModifyObj[T any](twd T, callback func(obj T) T) T {
jlegrone
commented
Jun 30, 2025
robholland
approved these changes
Jul 1, 2025
Collaborator
robholland
left a comment
There was a problem hiding this comment.
Looks good, a few non-blocking comments about validation.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What was changed
Why?
In addition to addressing the issue described in #25, this change makes the controller's behavior when taking over the rollout after a user manually edits the ramp value much safer.
For example, this is what would happen if a user manually edits the ramp value in the middle of a rollout, pauses for a while, and then allows the controller to resume the rollout:
sequenceDiagram participant Controller as Worker Controller participant Temporal as Temporal Server actor CLI as User Note over Controller, Temporal: Progressive Rollout: Steps [1%, 5%, 25%, 50%] Controller->>Temporal: Start rollout - Set ramp to 1% Controller->>Controller: Wait for pause duration (e.g. 1m) Note over Controller: Next reconcile loop Controller->>Temporal: Check ramp status Temporal-->>Controller: Current ramp: 1%, healthy Controller->>Temporal: Advance to next step - Set ramp to 5% Controller->>Controller: Wait for pause duration (e.g. 5m) Note over Controller: Next reconcile loop Controller->>Temporal: Check ramp status Temporal-->>Controller: Current ramp: 5%, healthy Controller->>Temporal: Advance to next step - Set ramp to 25% par Controller waits Controller->>Controller: Wait for pause duration (e.g. 10m) and User intervenes Note over CLI: User notices issues<br/>and wants to reduce ramp CLI->>Temporal: Manual intervention<br/>Set ramp to 10% via CLI end Note over Controller: Next reconcile loop Controller->>Temporal: Check ramp status Temporal-->>Controller: Current ramp: 10%<br/>(manually modified) par Controller waits Note over Controller: Switch to manual rollout mode until<br/>Deployment released by user. and User intervenes CLI->>Temporal: Release Deployment ramp<br/>(Controller allowed to take over ramping again) end Note over Controller: Next reconcile loop Controller->>Temporal: Check ramp status Temporal-->>Controller: Current ramp: 10%, healthy Controller->>Temporal: Resume rollout - Set ramp to 5%<br>(this is the highest ramp value <= current) Note over Controller: Continue normal progression Controller->>Temporal: Advance to 25% Controller->>Temporal: Advance to 50% Controller->>Temporal: Set as current (100%)Checklist
Closes Guarantee that the ramps specified in progressive rollout spend at least the requested time in each step #25
How was this tested: Unit tests
Any docs updates needed?
Not immediately, but we should probably document the requirement for increasing ramp values rather than relying solely on the validator.