| 1 | +# Start/Stop UpdateRun API Implementation |
| 2 | + |
| 3 | +## Date: 2025-10-29-1500 |
| 4 | + |
| 5 | +## Requirements |
| 6 | + |
| 7 | +**Primary Objective**: Implement the ability to start and stop an UpdateRun. Currently, creating an UpdateRun initializes it and executes it immediately. Instead, a newly created UpdateRun should initialize but not execute until the user starts it. Stopping must be graceful: if resources are being propagated to a cluster when the stop is requested, propagation to that cluster completes, and propagation to all other clusters in that stage stops. When the UpdateRun is started again, it continues where it left off. |
| 8 | + |
| 9 | +## Understanding Current Implementation |
| 10 | + |
| 11 | +From the codebase analysis: |
| 12 | + |
| 13 | +1. **Current Flow**: CreateUpdateRun → Initialize → Execute immediately |
| 14 | +2. **Controller Logic**: Located in `/pkg/controllers/updaterun/controller.go` |
| 15 | +3. **Main Reconcile Loop**: Calls `initialize()` then immediately `execute()` |
| 16 | +4. **Conditions**: Uses "Initialized", "Progressing", "Succeeded" conditions |
| 17 | +5. **Stage Execution**: Tracks per-stage and per-cluster status in `StageUpdatingStatus` |
| 18 | + |
| 19 | +## Implementation Plan |
| 20 | + |
| 21 | +### Phase 1: API Design - Add Start/Stop Controls |
| 22 | +- [x] Task 1.1: Add `Started` field to UpdateRunSpec to control execution start |
| 23 | +- [x] Task 1.2: Add new condition type `StagedUpdateRunConditionStarted` |
| 24 | +- [x] Task 1.3: Update UpdateRunSpec validation to handle new field |
| 25 | +- [x] Task 1.4: Update kubebuilder comments and validation tags |
| 26 | + |
| 27 | +### Phase 2: Controller Logic Updates |
| 28 | +- [ ] Task 2.1: Modify controller.go reconcile loop to check Started field |
| 29 | +- [ ] Task 2.2: Separate initialization from execution in controller flow |
| 30 | +- [ ] Task 2.3: Add logic to mark UpdateRun as ready but not started after initialization |
| 31 | +- [ ] Task 2.4: Handle start transition - when Started changes from false/nil to true |
| 32 | +- [ ] Task 2.5: Implement stop transition - when Started changes from true to false |
| 33 | + |
| 34 | +### Phase 3: Graceful Stop Implementation |
| 35 | +- [ ] Task 3.1: Track in-progress cluster updates during stop |
| 36 | +- [ ] Task 3.2: Complete current cluster propagation before stopping |
| 37 | +- [ ] Task 3.3: Mark stopped stage/clusters appropriately in status |
| 38 | +- [ ] Task 3.4: Ensure resume from correct point when restarted |
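The graceful stop and resume behavior in Phase 3 could be sketched roughly as follows. This is a simplified simulation, not the controller code: the `stage` type, `runStage`, and the callback signatures are hypothetical names introduced for illustration. The assumption is that the stop request is checked only between clusters, so a cluster whose propagation has begun always finishes, and the stored index is enough to resume.

```go
package main

import "fmt"

// stage is a hypothetical, simplified stand-in for a stage's cluster list
// and its progress. done doubles as the resume index.
type stage struct {
	clusters []string
	done     int
}

// runStage updates clusters in order, starting at s.done. The stop request
// is checked only between clusters, so an in-progress cluster completes.
// It returns true when every cluster in the stage has been updated.
func runStage(s *stage, stopRequested func() bool, update func(cluster string)) bool {
	for s.done < len(s.clusters) {
		if stopRequested() {
			return false // stopped; s.done records where to resume
		}
		update(s.clusters[s.done])
		s.done++
	}
	return true
}

func main() {
	s := &stage{clusters: []string{"a", "b", "c"}}
	stop := false
	var updated []string
	update := func(c string) {
		updated = append(updated, c)
		if c == "a" {
			stop = true // the stop request arrives while "a" is propagating
		}
	}

	finished := runStage(s, func() bool { return stop }, update)
	fmt.Println(finished, s.done, updated) // false 1 [a]

	stop = false // the user starts the run again
	finished = runStage(s, func() bool { return stop }, update)
	fmt.Println(finished, s.done, updated) // true 3 [a b c]
}
```

In the real controller the resume point would presumably come from the per-cluster status already tracked in `StageUpdatingStatus` rather than from an in-memory counter.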
| 39 | + |
| 40 | +### Phase 4: Status and Condition Management |
| 41 | +- [ ] Task 4.1: Add Started condition management functions |
| 42 | +- [ ] Task 4.2: Update condition progression: Initialized → Started → Progressing → Succeeded |
| 43 | +- [ ] Task 4.3: Handle stop scenarios in condition updates |
| 44 | +- [ ] Task 4.4: Update metrics to track start/stop events |
| 45 | + |
| 46 | +### Phase 5: Testing |
| 47 | +- [ ] Task 5.1: Write unit tests for new spec field and conditions |
| 48 | +- [ ] Task 5.2: Write integration tests for start/stop workflow |
| 49 | +- [ ] Task 5.3: Write e2e tests for graceful stop and resume scenarios |
| 50 | +- [ ] Task 5.4: Test edge cases (stop during cluster propagation, restart scenarios) |
| 51 | + |
| 52 | +### Phase 6: Documentation and Examples |
| 53 | +- [ ] Task 6.1: Update API documentation |
| 54 | +- [ ] Task 6.2: Add example YAML files showing start/stop usage |
| 55 | +- [ ] Task 6.3: Update user guide with start/stop procedures |
| 56 | + |
| 57 | +## API Design |
| 58 | + |
| 59 | +### UpdateRunSpec Changes |
| 60 | +```go |
| 61 | +type UpdateRunSpec struct { |
| 62 | + // ... existing fields ... |
| 63 | + |
| 64 | + // Started indicates whether the update run should be started. |
| 65 | + // When false or nil, the update run will initialize but not execute. |
| 66 | + // When true, the update run will begin execution. |
| 67 | + // Changing from true to false will gracefully stop the update run. |
| 68 | + // +kubebuilder:validation:Optional |
| 69 | + Started *bool `json:"started,omitempty"` |
| 70 | +} |
| 71 | +``` |
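As a minimal sketch of how the controller might read this field, the helper below follows the field comment literally: both nil and false mean "initialize only". The `shouldExecute` name is hypothetical, and the struct here mirrors only the one field under discussion. Note that treating nil as "not started" differs from defaulting existing UpdateRuns to started, so whichever default is chosen would need to be applied consistently.

```go
package main

import "fmt"

// UpdateRunSpec mirrors only the Started field from the design above;
// the real type carries additional fields.
type UpdateRunSpec struct {
	Started *bool
}

// shouldExecute is a hypothetical helper. Per the field comment, a nil or
// false Started means the run is initialized but must not execute.
func shouldExecute(spec UpdateRunSpec) bool {
	return spec.Started != nil && *spec.Started
}

func main() {
	yes, no := true, false
	fmt.Println(shouldExecute(UpdateRunSpec{}))              // false
	fmt.Println(shouldExecute(UpdateRunSpec{Started: &yes})) // true
	fmt.Println(shouldExecute(UpdateRunSpec{Started: &no}))  // false
}
```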
| 72 | + |
| 73 | +### New Condition Type |
| 74 | +```go |
| 75 | +const ( |
| 76 | + // ... existing conditions ... |
| 77 | + |
| 78 | + // StagedUpdateRunConditionStarted indicates whether the staged update run has been started. |
| 79 | + // Its condition status can be one of the following: |
| 80 | + // - "True": The staged update run has been started and is ready to progress. |
| 81 | + // - "False": The staged update run is stopped or not yet started. |
| 82 | + StagedUpdateRunConditionStarted StagedUpdateRunConditionType = "Started" |
| 83 | +) |
| 84 | +``` |
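One way the "False" status could distinguish "stopped" from "not yet started" is through the condition reason. The sketch below uses a local `condition` struct rather than the Kubernetes `metav1.Condition` type to stay self-contained, and the reason strings and the `nextStartedCondition` function are assumptions made for illustration, not names from the codebase.

```go
package main

import "fmt"

// condition is a simplified local stand-in for a status condition.
type condition struct {
	Type   string
	Status string // "True" or "False"
	Reason string
}

// Hypothetical reason strings; the real constants may differ.
const (
	reasonStarted    = "UpdateRunStarted"
	reasonStopped    = "UpdateRunStopped"
	reasonNotStarted = "UpdateRunNotStarted"
)

// nextStartedCondition derives the Started condition from the spec field and
// the previously recorded condition (nil if none has been set yet).
func nextStartedCondition(prev *condition, started *bool) condition {
	c := condition{Type: "Started", Status: "False", Reason: reasonNotStarted}
	switch {
	case started != nil && *started:
		c.Status, c.Reason = "True", reasonStarted
	case prev != nil && (prev.Status == "True" || prev.Reason == reasonStopped):
		// The run was started at some point, so this is a stop.
		c.Reason = reasonStopped
	}
	return c
}

func main() {
	yes, no := true, false
	first := nextStartedCondition(nil, nil)
	running := nextStartedCondition(&first, &yes)
	stopped := nextStartedCondition(&running, &no)
	fmt.Println(first.Status, first.Reason)
	fmt.Println(running.Status, running.Reason)
	fmt.Println(stopped.Status, stopped.Reason)
}
```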
| 85 | + |
| 86 | +## Success Criteria |
| 87 | + |
| 88 | +1. **✅ Initialize Without Execute**: UpdateRun initializes successfully but waits for start signal |
| 89 | +2. **✅ Start Control**: Setting `Started: true` begins execution from correct stage |
| 90 | +3. **✅ Graceful Stop**: Setting `Started: false` completes current cluster and stops |
| 91 | +4. **✅ Resume Capability**: Restarting continues from exact stopping point |
| 92 | +5. **✅ Proper Conditions**: All condition transitions work correctly |
| 93 | +6. **✅ Backward Compatibility**: Existing UpdateRuns continue to work (default to started) |
| 94 | + |
| 95 | +## Current Status: Phase 1 Complete ✅ |
| 96 | + |
| 97 | +**Completed**: |
| 98 | +- Added `Started *bool` field to UpdateRunSpec with proper kubebuilder validation |
| 99 | +- Added `StagedUpdateRunConditionStarted` condition type with proper documentation |
| 100 | +- Updated kubebuilder printcolumn annotations to include Started condition in kubectl output |
| 101 | +- Updated condition documentation to include "Started" in known conditions list |
| 102 | +- Generated CRDs successfully with new API changes |
| 103 | + |
| 104 | +**Next**: Ready to proceed with Phase 2 - Controller Logic Updates |