Description
Implement a scheduler service that can execute promotion groups at their configured ScheduledAt time, orchestrating calls into the promotion execution and conflict-detection components.
The foundation PR introduces promotion groups and related model elements, including a scheduling concept (ScheduledAt). The missing piece is a scheduler service that observes scheduled promotion groups and triggers their execution at the right time.
Goals
- Provide a scheduler (or scheduling façade) that:
  - Monitors promotion groups with a ScheduledAt value in the future.
  - Triggers execution of these groups at or shortly after their scheduled time.
  - Avoids duplicate or overlapping runs for the same promotion group.
  - Integrates cleanly with the rest of the promotion infrastructure.
This issue is only about the scheduling and orchestration layer; conflict detection and ephemeral integration branches are handled in other issues and should be invoked via their public APIs.
Requirements
- Scheduling model and states
  - Define or refine the lifecycle states for a scheduled promotion group, for example: Pending → Scheduled → Running → (Succeeded | Failed | Conflicted | Cancelled).
  - Ensure that:
    - A promotion group with a ScheduledAt time is discoverable by the scheduler.
    - Once picked up for execution, its state prevents duplicate concurrent runs.
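The example lifecycle above could be captured as an explicit transition table. The following is a minimal sketch in Python, not the project's actual model: the names GroupState, ALLOWED, and can_transition are hypothetical, and allowing cancellation from Pending, Scheduled, and Running is an assumption of this sketch.

```python
from enum import Enum


class GroupState(Enum):
    PENDING = "Pending"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CONFLICTED = "Conflicted"
    CANCELLED = "Cancelled"


# Allowed transitions. Terminal states have no entry, so nothing leaves them.
# Assumption: a group may be cancelled from any non-terminal state.
ALLOWED = {
    GroupState.PENDING: {GroupState.SCHEDULED, GroupState.CANCELLED},
    GroupState.SCHEDULED: {GroupState.RUNNING, GroupState.CANCELLED},
    GroupState.RUNNING: {
        GroupState.SUCCEEDED,
        GroupState.FAILED,
        GroupState.CONFLICTED,
        GroupState.CANCELLED,
    },
}


def can_transition(current: GroupState, target: GroupState) -> bool:
    """Return True if moving from current to target is permitted."""
    return target in ALLOWED.get(current, set())
```

Because Running has no edge back to itself, a second attempt to start a group that is already running is rejected by the table, which is one way the state can prevent duplicate concurrent runs.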
- Scheduler service design
  - Implement a dedicated scheduler service (e.g., IPromotionScheduler and/or a background worker) responsible for:
    - Periodically polling for promotion groups whose ScheduledAt is due (within a configurable tolerance window).
    - Attempting to transition them into a Running state.
    - Invoking the promotion executor service (from the “ephemeral integration branch” issue) with the correct parameters.
  - The scheduler must be:
    - Safe to run in multiple instances (e.g., across multiple nodes) without causing duplicate executions, via optimistic concurrency or equivalent mechanisms available in the current persistence layer.
    - Configurable in polling frequency and batching behavior (if applicable).
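One polling pass with an optimistic-concurrency claim might look like the sketch below. It is illustrative only: PromotionGroup, InMemoryStore, try_claim, and poll_once are hypothetical names, the in-memory store stands in for whatever persistence layer the project uses, and the tolerance window is left out so that a group counts as due as soon as its scheduled time has passed.

```python
import threading
from dataclasses import dataclass
from datetime import datetime


@dataclass
class PromotionGroup:
    id: str
    scheduled_at: datetime
    state: str = "Scheduled"
    version: int = 0  # bumped on every write; used for the optimistic check


class InMemoryStore:
    """Stand-in for the persistence layer (an assumption of this sketch).

    The lock models the atomicity of a single conditional UPDATE statement.
    """

    def __init__(self, groups):
        self._groups = {g.id: g for g in groups}
        self._lock = threading.Lock()

    def find_due(self, now, limit):
        """Return (id, version) pairs for Scheduled groups whose time has come."""
        due = [g for g in self._groups.values()
               if g.state == "Scheduled" and g.scheduled_at <= now]
        due.sort(key=lambda g: g.scheduled_at)
        return [(g.id, g.version) for g in due[:limit]]

    def try_claim(self, group_id, expected_version):
        """Compare-and-swap: succeed only if the row is unchanged since it was read."""
        with self._lock:
            group = self._groups[group_id]
            if group.state != "Scheduled" or group.version != expected_version:
                return False
            group.state = "Running"
            group.version += 1
            return True


def poll_once(store, execute, now, batch_size=10):
    """Claim each due group, then hand the claimed ones to the executor."""
    started = []
    for group_id, version in store.find_due(now, batch_size):
        if store.try_claim(group_id, version):  # losers of the race skip the group
            execute(group_id)
            started.append(group_id)
    return started
```

With a relational store, try_claim would typically become a conditional update that filters on both the state and the version read earlier, and treats "zero rows affected" as a lost race.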
- Integration points
  - Use the existing domain/services introduced in PR #28, "Add promotion groups, BranchPromotionMode, and conflict-resolution policy (foundation)" (and subsequent issues), as the execution engine:
    - Call into the promotion executor service to actually perform the promotion.
    - Optionally perform pre-flight conflict detection before executing, depending on existing APIs and desired behavior.
  - Ensure promotion run results are recorded:
    - Scheduler should persist/update promotion run entities with:
      - Start/end timestamps.
      - Final status (Succeeded, Failed, Conflicted, etc.).
      - Any error or conflict summaries, as provided by the lower-level services.
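As a rough illustration of the run-recording requirement, the sketch below wraps an executor call and fills in a run record. PromotionRun and execute_and_record are hypothetical names, and the executor is assumed, purely for this example, to return a (status, summary) pair; the real executor API from the foundation PR may look quite different.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PromotionRun:
    group_id: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    status: str = "Running"
    summary: Optional[str] = None  # error or conflict summary, if any


def _utc_now():
    return datetime.now(timezone.utc)


def execute_and_record(group_id, executor, clock=_utc_now):
    """Call the executor and capture timestamps, final status, and summary.

    Assumption: executor(group_id) returns a (status, summary) tuple.
    """
    run = PromotionRun(group_id=group_id, started_at=clock())
    try:
        run.status, run.summary = executor(group_id)
    except Exception as exc:
        # An unexpected error still yields a completed, inspectable record.
        run.status, run.summary = "Failed", str(exc)
    finally:
        run.ended_at = clock()
    return run
```

Passing the clock in as a parameter keeps the timestamps deterministic in tests; persisting the returned record is left to the surrounding service.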
- Error handling and retries
  - Define a simple, explicit retry policy for transient failures (e.g., database connectivity, transient VCS problems):
    - Distinguish between:
      - Permanent failures (e.g., validation errors, persistent conflicts) → mark run as failed/conflicted.
      - Transient failures → retry with backoff or mark as “RetryPending”.
  - Make sure retries do not produce multiple overlapping runs for the same promotion group.
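One simple shape for such a policy is a bounded retry loop with exponential backoff that runs inside a single, already-claimed run, so retries never start a second overlapping run. This is a sketch under assumptions: TransientError, PermanentError, and run_with_retry are hypothetical names, and how real failures get classified into the two groups is left open.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. a dropped connection)."""


class PermanentError(Exception):
    """Hypothetical marker for failures retrying cannot fix (e.g. validation errors)."""


def run_with_retry(action, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run action within one claimed run and return (status, attempts_used)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            action()
            return "Succeeded", attempt
        except PermanentError:
            return "Failed", attempt  # no point in retrying
        except TransientError:
            if attempt == max_attempts:
                return "Failed", attempt  # retry budget exhausted
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A variant that marks the run as “RetryPending” and lets the next polling pass pick it up would avoid blocking the worker during the backoff delay, at the cost of one more state to manage.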
- Observability
  - Instrument the scheduler with logging/events for:
    - Discovery of due promotion groups.
    - Attempted state transitions (Pending → Running, etc.).
    - Execution outcomes and durations.
    - Retries and terminal failures.
  - Provide a minimal query surface (if appropriate) so other parts of the system can:
    - Query scheduled/last-run information for promotion groups.
    - Determine why a group did or did not run at a given time.
- Configuration
  - Provide configuration hooks for:
    - Poll interval (e.g., every N seconds).
    - Maximum number of promotion groups to process in a batch.
    - Optional “scheduling horizon” (e.g., only consider ScheduledAt within the next N hours/minutes).
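The three hooks above could be grouped into one validated options object. The sketch below is only one possible shape: SchedulerOptions and its field names are hypothetical, and the default values are placeholders rather than recommendations.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class SchedulerOptions:
    poll_interval: timedelta = timedelta(seconds=30)  # placeholder default
    max_batch_size: int = 10                          # placeholder default
    scheduling_horizon: Optional[timedelta] = None    # None means no horizon limit

    def __post_init__(self):
        # Fail fast on values that would make the polling loop misbehave.
        if self.poll_interval <= timedelta(0):
            raise ValueError("poll_interval must be positive")
        if self.max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        if self.scheduling_horizon is not None and self.scheduling_horizon <= timedelta(0):
            raise ValueError("scheduling_horizon must be positive when set")
```

Validating at construction time means a bad configuration surfaces at startup instead of during the first polling pass.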
- Testing
  - Add tests that verify:
    - Promotion groups scheduled in the near future eventually get executed.
    - A promotion group is not executed more than once for the same scheduled time, under normal and concurrent scenarios.
    - State transitions are correct and idempotent.
    - Transient failures are retried according to policy, and permanent failures are surfaced correctly.
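The non-duplication check under concurrency can be exercised by letting several workers race for the same claim and asserting that exactly one wins. The sketch below uses threads and an in-memory claim table as a stand-in for multiple scheduler instances sharing a database; ClaimTable and race_for_claim are hypothetical names.

```python
import threading


class ClaimTable:
    """In-memory stand-in for an atomic 'claim this run' operation."""

    def __init__(self):
        self._claimed = set()
        self._lock = threading.Lock()

    def try_claim(self, key):
        """Return True for the first caller with this key, False for all others."""
        with self._lock:
            if key in self._claimed:
                return False
            self._claimed.add(key)
            return True


def race_for_claim(table, key, workers=8):
    """Start all workers at once and return the ids of those that won the claim."""
    winners = []
    barrier = threading.Barrier(workers)  # release every worker simultaneously

    def worker(worker_id):
        barrier.wait()
        if table.try_claim(key):
            winners.append(worker_id)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

A test would then assert that `len(race_for_claim(ClaimTable(), ("group-1", "2024-01-01T12:00Z")))` equals 1. Keying the claim on both the group and its scheduled time matches the "not more than once for the same scheduled time" requirement.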
Deliverables
- A scheduler service (and any required background infrastructure) that:
  - Finds due promotion groups by ScheduledAt.
  - Triggers their execution via the existing promotion executor APIs.
  - Manages state transitions and avoids duplicate runs.
- Tests covering:
  - On-time execution of scheduled groups.
  - Non-duplication under concurrent scheduler instances.
  - Correct handling of transient vs. permanent failures.