Implement scheduler service for running promotion groups at `ScheduledAt` time #33

Implement a scheduler service that can execute promotion groups at their configured `ScheduledAt` time, orchestrating calls into the promotion execution and conflict-detection components.

The foundation PR introduces promotion groups and related model elements, including a scheduling concept (`ScheduledAt`). The missing piece is a **scheduler service** that observes scheduled promotion groups and triggers their execution once their `ScheduledAt` time is reached.

#### Goals

- Provide a scheduler (or scheduling façade) that:
  - Monitors promotion groups with a `ScheduledAt` value in the future.
  - Triggers execution of these groups at or shortly after their scheduled time.
  - Avoids duplicate or overlapping runs for the same promotion group.
  - Integrates cleanly with the rest of the promotion infrastructure.

This issue covers only the scheduling and orchestration layer. Conflict detection and ephemeral integration branches are handled in other issues; the scheduler should invoke those components only through their public APIs.

#### Requirements

1. **Scheduling model and states**
   - Define or refine the lifecycle states for a scheduled promotion group, for example:
     - `Pending` → `Scheduled` → `Running` → (`Succeeded` | `Failed` | `Conflicted` | `Cancelled`).
   - Ensure that:
     - A promotion group with a `ScheduledAt` time is discoverable by the scheduler.
     - Once picked up for execution, its state prevents duplicate concurrent runs.

2. **Scheduler service design**
   - Implement a dedicated scheduler service (e.g., `IPromotionScheduler` and/or a background worker) responsible for:
     - Periodically polling for promotion groups whose `ScheduledAt` is due (within a configurable tolerance window).
     - Attempting to transition them into a `Running` state.
     - Invoking the promotion executor service (from the “ephemeral integration branch” issue) with the correct parameters.
   - The scheduler must be:
     - Safe to run in multiple instances (e.g., across multiple nodes) without causing duplicate executions, via optimistic concurrency or equivalent mechanisms available in the current persistence layer.
     - Configurable in polling frequency and batching behavior (if applicable).

3. **Integration points**
   - Use the existing domain/services introduced in PR #28 (and subsequent issues) as the execution engine:
     - Call into the promotion executor service to actually perform the promotion.
     - Optionally perform pre-flight conflict detection before executing, depending on existing APIs and desired behavior.
   - Ensure promotion run results are recorded:
     - The scheduler should persist or update promotion run entities with:
       - Start/end timestamps.
       - Final status (`Succeeded`, `Failed`, `Conflicted`, etc.).
       - Any error or conflict summaries, as provided by the lower-level services.

4. **Error handling and retries**
   - Define a simple, explicit retry policy for transient failures (e.g., database connectivity, transient VCS problems):
     - Distinguish between:
       - Permanent failures (e.g., validation errors, persistent conflicts) → mark the run as `Failed` or `Conflicted`.
       - Transient failures → retry with backoff, or mark the run as `RetryPending` so that a later poll can pick it up again.
   - Make sure retries do not produce multiple overlapping runs for the same promotion group.

5. **Observability**
   - Instrument the scheduler with logging/events for:
     - Discovery of due promotion groups.
     - Attempted state transitions (`Scheduled` → `Running`, `Running` → `Succeeded`, etc.).
     - Execution outcomes and durations.
     - Retries and terminal failures.
   - Provide a minimal query surface (if appropriate) so other parts of the system can:
     - Query scheduled/last-run information for promotion groups.
     - Determine why a group did or did not run at a given time.

6. **Configuration**
   - Provide configuration hooks for:
     - Poll interval (e.g., every N seconds).
     - Maximum number of promotion groups to process in a batch.
     - An optional “scheduling horizon” (e.g., only consider groups whose `ScheduledAt` falls within the next N minutes or hours).

7. **Testing**
   - Add tests that verify:
     - Promotion groups scheduled in the near future eventually get executed.
     - A promotion group is not executed more than once for the same scheduled time, under normal and concurrent scenarios.
     - State transitions are correct and idempotent.
     - Transient failures are retried according to policy, and permanent failures are surfaced correctly.
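For the lifecycle in requirement 1, the example states can be encoded as an explicit transition table so that invalid moves are rejected in one place. This is a sketch in Python purely for illustration; `PromotionState`, `ALLOWED`, `transition` and `InvalidTransition` are hypothetical names, and it assumes that cancellation is allowed from any non-terminal state and that re-recording the same terminal state is a harmless no-op (one way to make terminal updates idempotent).

```python
from enum import Enum


class PromotionState(Enum):
    PENDING = "Pending"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CONFLICTED = "Conflicted"
    CANCELLED = "Cancelled"


TERMINAL = frozenset({PromotionState.SUCCEEDED, PromotionState.FAILED,
                      PromotionState.CONFLICTED, PromotionState.CANCELLED})

# Pending -> Scheduled -> Running -> exactly one terminal state.
# Assumption: Cancelled is reachable from every non-terminal state.
ALLOWED = {
    PromotionState.PENDING: {PromotionState.SCHEDULED, PromotionState.CANCELLED},
    PromotionState.SCHEDULED: {PromotionState.RUNNING, PromotionState.CANCELLED},
    PromotionState.RUNNING: {PromotionState.SUCCEEDED, PromotionState.FAILED,
                             PromotionState.CONFLICTED, PromotionState.CANCELLED},
}


class InvalidTransition(Exception):
    pass


def transition(current: PromotionState, target: PromotionState) -> PromotionState:
    """Return the state to store, or raise if the move is not in the lifecycle."""
    if current == target and current in TERMINAL:
        return current  # re-recording the same outcome is a no-op (idempotent)
    if target not in ALLOWED.get(current, set()):
        raise InvalidTransition(f"{current.value} -> {target.value}")
    return target
```

The table only validates moves. Preventing two instances from both entering `Running` still depends on an atomic claim in the persistence layer, as requirement 2 describes, which is why `Running` → `Running` is rejected rather than treated as a no-op.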
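For the multi-instance safety in requirement 2, one option is a compare-and-set claim: each poll reads due groups together with a version, and an instance executes a group only if it wins the `Scheduled` → `Running` update for that version. This minimal Python sketch assumes an optimistic-concurrency `version` column; `PromotionGroup`, `InMemoryGroupStore`, `find_due`, `try_claim`, `poll_once` and `run_scheduler` are hypothetical, and the lock merely emulates an atomic `UPDATE ... WHERE version = ?`. It also assumes the tolerance window lets a poll pick up a group slightly early (`scheduled_at <= now + tolerance`) and that timestamps are timezone-aware UTC.

```python
import threading
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List


@dataclass
class PromotionGroup:
    id: str
    scheduled_at: datetime
    state: str = "Scheduled"
    version: int = 0  # optimistic-concurrency token, bumped on every write


class InMemoryGroupStore:
    """Stand-in for the persistence layer; a real store would use the database."""

    def __init__(self, groups: List[PromotionGroup]):
        self._groups: Dict[str, PromotionGroup] = {g.id: g for g in groups}
        self._lock = threading.Lock()  # emulates the atomicity of a single UPDATE

    def find_due(self, now: datetime, tolerance: timedelta, limit: int) -> List[PromotionGroup]:
        """Return snapshots (copies) of up to `limit` due groups, oldest first."""
        with self._lock:
            due = [replace(g) for g in self._groups.values()
                   if g.state == "Scheduled" and g.scheduled_at <= now + tolerance]
        due.sort(key=lambda g: g.scheduled_at)
        return due[:limit]

    def try_claim(self, group_id: str, expected_version: int) -> bool:
        """Compare-and-set, equivalent to: UPDATE groups SET state='Running',
        version=version+1 WHERE id=? AND version=? AND state='Scheduled'."""
        with self._lock:
            g = self._groups[group_id]
            if g.state != "Scheduled" or g.version != expected_version:
                return False
            g.state, g.version = "Running", g.version + 1
            return True


def poll_once(store: InMemoryGroupStore, execute: Callable[[str], None], now: datetime,
              tolerance: timedelta = timedelta(0), batch_size: int = 10) -> List[str]:
    """One polling pass: claim each due group, execute only the ones this instance won."""
    started = []
    for snapshot in store.find_due(now, tolerance, batch_size):
        if store.try_claim(snapshot.id, snapshot.version):
            execute(snapshot.id)
            started.append(snapshot.id)
    return started


def run_scheduler(store: InMemoryGroupStore, execute: Callable[[str], None],
                  stop: threading.Event, poll_interval_s: float = 5.0) -> None:
    """Background loop; Event.wait doubles as an interruptible sleep between polls."""
    while not stop.is_set():
        poll_once(store, execute, datetime.now(timezone.utc))
        stop.wait(poll_interval_s)
```

An instance that loses the compare-and-set simply skips the group, so running several schedulers concurrently costs at most some redundant reads rather than duplicate executions.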
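For the run records in requirement 3, a simple pattern is to record the run with its start time before calling the executor, then fill in the end time, final status and summary afterwards. This Python sketch assumes the executor returns a hypothetical `ExecutionOutcome(status, summary)`; `PromotionRun` and `execute_and_record` are likewise illustrative names, an in-memory list stands in for the repository, and the clock is injectable so that timestamps are testable.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class ExecutionOutcome:
    """Hypothetical result shape reported by the promotion executor."""
    status: str        # "Succeeded" | "Failed" | "Conflicted"
    summary: str = ""  # error or conflict summary from the lower-level services


@dataclass
class PromotionRun:
    group_id: str
    started_at: datetime
    finished_at: Optional[datetime] = None
    status: str = "Running"
    summary: str = ""


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


def execute_and_record(group_id: str,
                       executor: Callable[[str], ExecutionOutcome],
                       runs: List[PromotionRun],
                       clock: Callable[[], datetime] = utc_now) -> PromotionRun:
    """Record the run before executing, then fill in the outcome and end time."""
    run = PromotionRun(group_id=group_id, started_at=clock())
    runs.append(run)  # a real repository would persist this first, so a crash leaves a Running record
    try:
        outcome = executor(group_id)
        run.status, run.summary = outcome.status, outcome.summary
    except Exception as exc:  # unexpected executor error: surface it as Failed
        run.status, run.summary = "Failed", f"{type(exc).__name__}: {exc}"
    finally:
        run.finished_at = clock()
    return run
```

Durations for the observability events in requirement 5 then fall out as `finished_at - started_at`.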
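For requirement 4, the sketch below classifies failures by exception type and retries transient ones with capped exponential backoff. The retries run inside the single execution that already holds the `Running` claim, so they cannot overlap with another run of the same group; when attempts are exhausted it returns `RetryPending`, and the caller would then persist that state so that a later poll can pick the group up again. `TransientError`, `PermanentError`, `backoff_delay` and `run_with_retries` are hypothetical names, and the 2 s base delay and 60 s cap are placeholder values.

```python
import time
from typing import Callable


class TransientError(Exception):
    """Hypothetical marker for retryable failures (DB connectivity, flaky VCS)."""


class PermanentError(Exception):
    """Hypothetical marker for non-retryable failures (validation, persistent conflict)."""


def backoff_delay(attempt: int, base_s: float = 2.0, cap_s: float = 60.0) -> float:
    # attempt 1 -> base, 2 -> 2*base, 3 -> 4*base, ... never more than cap_s
    return min(cap_s, base_s * (2 ** (attempt - 1)))


def run_with_retries(action: Callable[[], str], max_attempts: int = 3,
                     sleep: Callable[[float], None] = time.sleep) -> str:
    """Run `action` while the group is already claimed; return the final status."""
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except PermanentError:
            return "Failed"
        except TransientError:
            if attempt == max_attempts:
                return "RetryPending"  # hand back to a later poll instead of spinning
            sleep(backoff_delay(attempt))
    return "Failed"  # only reached when max_attempts < 1
```

Injecting `sleep` keeps the policy testable without real delays.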
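For requirement 5, structured log fields plus a small query over persisted state could cover both the event trail and the “why did or didn't this group run?” question. In this Python sketch the logger name, the `extra` field names, `GroupStatus`, `log_transition` and `explain` are all hypothetical; it relies only on the standard `logging` module, whose `extra` mapping attaches the fields to each `LogRecord` for structured handlers to emit.

```python
import logging
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

log = logging.getLogger("promotion.scheduler")


def log_transition(group_id: str, from_state: str, to_state: str, succeeded: bool) -> None:
    """Emit one event per attempted transition; `extra` fields land on the LogRecord."""
    log.info("transition %s -> %s for %s (%s)", from_state, to_state, group_id,
             "applied" if succeeded else "rejected",
             extra={"group_id": group_id, "from_state": from_state,
                    "to_state": to_state, "applied": succeeded})


@dataclass
class GroupStatus:
    group_id: str
    state: str
    scheduled_at: Optional[datetime]
    last_run_status: Optional[str] = None
    last_run_finished_at: Optional[datetime] = None


def explain(status: GroupStatus, now: datetime) -> str:
    """Answer 'why did (or didn't) this group run?' from persisted state only."""
    if status.scheduled_at is None:
        return "not scheduled: no ScheduledAt set"
    if status.state == "Cancelled":
        return "cancelled"
    if status.state == "Running":
        return "currently running"
    if status.scheduled_at > now:
        return f"not due yet: scheduled for {status.scheduled_at.isoformat()}"
    if status.last_run_status is not None:
        when = status.last_run_finished_at.isoformat() if status.last_run_finished_at else "unknown time"
        return f"last run {status.last_run_status} at {when}"
    return "due but not yet picked up by a scheduler poll"
```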
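For requirement 6, the three configuration hooks could live in one validated, immutable options object. `SchedulerOptions` is a hypothetical name, and the defaults (30 s poll interval, batch of 20, no horizon) as well as the `SCHEDULER_*` keys read by `from_mapping` are placeholders rather than existing settings; the project's own configuration system would replace them.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Mapping, Optional


@dataclass(frozen=True)
class SchedulerOptions:
    poll_interval: timedelta = timedelta(seconds=30)
    max_batch_size: int = 20
    horizon: Optional[timedelta] = None  # None means no look-ahead limit

    def __post_init__(self) -> None:
        if self.poll_interval <= timedelta(0):
            raise ValueError("poll_interval must be positive")
        if self.max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        if self.horizon is not None and self.horizon < timedelta(0):
            raise ValueError("horizon must not be negative")

    def in_horizon(self, scheduled_at: datetime, now: datetime) -> bool:
        """True if scheduled_at falls within the configured look-ahead window."""
        return self.horizon is None or scheduled_at <= now + self.horizon

    @classmethod
    def from_mapping(cls, cfg: Mapping[str, str]) -> "SchedulerOptions":
        # Hypothetical keys; adapt to whatever configuration source the project uses.
        horizon = cfg.get("SCHEDULER_HORIZON_MINUTES")
        return cls(
            poll_interval=timedelta(seconds=float(cfg.get("SCHEDULER_POLL_SECONDS", "30"))),
            max_batch_size=int(cfg.get("SCHEDULER_MAX_BATCH", "20")),
            horizon=timedelta(minutes=float(horizon)) if horizon else None,
        )
```

Validating in `__post_init__` makes a misconfigured scheduler fail at startup instead of silently polling with a zero interval or an empty batch.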
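For the concurrent non-duplication case in requirement 7, a test can line several simulated scheduler instances up on a barrier after they have all read the same version, then let them race for the claim; exactly one should execute. This is a self-contained Python sketch with a hypothetical `FakeStore`; a real test would target the actual persistence layer, so that its optimistic-concurrency check is what gets exercised.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeStore:
    """Minimal in-memory stand-in with a compare-and-set claim (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.state, self.version = "Scheduled", 0

    def read(self):
        with self._lock:
            return self.state, self.version

    def try_claim(self, expected_version: int) -> bool:
        with self._lock:
            if self.state != "Scheduled" or self.version != expected_version:
                return False
            self.state, self.version = "Running", self.version + 1
            return True


def test_group_runs_once_with_concurrent_schedulers(instances: int = 8) -> None:
    store = FakeStore()
    executions = []
    barrier = threading.Barrier(instances, timeout=5)  # all read first, then race

    def scheduler_instance() -> None:
        state, version = store.read()
        barrier.wait()
        if state == "Scheduled" and store.try_claim(version):
            executions.append("run")

    with ThreadPoolExecutor(max_workers=instances) as pool:
        for f in [pool.submit(scheduler_instance) for _ in range(instances)]:
            f.result()  # re-raise any exception from a worker thread

    assert executions == ["run"]
    assert store.read() == ("Running", 1)
```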

#### Deliverables

- A scheduler service (and any required background infrastructure) that:
  - Finds due promotion groups by `ScheduledAt`.
  - Triggers their execution via the existing promotion executor APIs.
  - Manages state transitions and avoids duplicate runs.
- Tests covering:
  - On-time execution of scheduled groups.
  - Non-duplication under concurrent scheduler instances.
  - Correct handling of transient vs. permanent failures.

