Implement scheduler service for running promotion groups at ScheduledAt time #33

@ScottArbeit

Description

Implement a scheduler service that can execute promotion groups at their configured ScheduledAt time, orchestrating calls into the promotion execution and conflict-detection components.

The foundation PR introduces promotion groups and related model elements, including a scheduling concept (ScheduledAt). The missing piece is a scheduler service that observes scheduled promotion groups and triggers their execution at the right time.

Goals

  • Provide a scheduler (or scheduling façade) that:
    • Monitors promotion groups with a ScheduledAt value in the future.
    • Triggers execution of these groups at or shortly after their scheduled time.
    • Avoids duplicate or overlapping runs for the same promotion group.
    • Integrates cleanly with the rest of the promotion infrastructure.

This issue is only about the scheduling and orchestration layer; conflict detection and ephemeral integration branches are handled in other issues and should be invoked via their public APIs.

Requirements

  1. Scheduling model and states

    • Define or refine the lifecycle states for a scheduled promotion group, for example:
      • Pending → Scheduled → Running → (Succeeded | Failed | Conflicted | Cancelled).
    • Ensure that:
      • A promotion group with a ScheduledAt time is discoverable by the scheduler.
      • Once picked up for execution, its state prevents duplicate concurrent runs.
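
The lifecycle above can be sketched as a small transition table. This is only an illustration in Python, reading the example lifecycle as Pending → Scheduled → Running → terminal state; the names `GroupState`, `ALLOWED`, and `can_transition` are hypothetical and not part of the existing codebase.

```python
from enum import Enum


class GroupState(Enum):
    PENDING = "Pending"
    SCHEDULED = "Scheduled"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CONFLICTED = "Conflicted"
    CANCELLED = "Cancelled"


# Allowed transitions, taken literally from the example lifecycle.
# Terminal states have no entry, so nothing can leave them.
ALLOWED = {
    GroupState.PENDING: {GroupState.SCHEDULED},
    GroupState.SCHEDULED: {GroupState.RUNNING},
    GroupState.RUNNING: {
        GroupState.SUCCEEDED,
        GroupState.FAILED,
        GroupState.CONFLICTED,
        GroupState.CANCELLED,
    },
}


def can_transition(current: GroupState, target: GroupState) -> bool:
    """Return True if moving from `current` to `target` is allowed."""
    return target in ALLOWED.get(current, set())
```

Because Running has no edge back to itself, a group that is already Running cannot be picked up a second time, which is the property the duplicate-run requirement depends on.
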
  2. Scheduler service design

    • Implement a dedicated scheduler service (e.g., IPromotionScheduler and/or a background worker) responsible for:
      • Periodically polling for promotion groups whose ScheduledAt is due (within a configurable tolerance window).
      • Attempting to transition them into a Running state.
      • Invoking the promotion executor service (from the “ephemeral integration branch” issue) with the correct parameters.
    • The scheduler must be:
      • Safe to run in multiple instances (e.g., across multiple nodes) without causing duplicate executions, via optimistic concurrency or equivalent mechanisms available in the current persistence layer.
      • Configurable in polling frequency and batching behavior (if applicable).
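
One possible shape for the poll-and-claim step is sketched below in Python, assuming the persistence layer can do a conditional write keyed on a version number. `GroupRecord`, `InMemoryStore`, `try_claim`, and `poll_once` are hypothetical names; the in-memory store and its lock only stand in for whatever atomic compare-and-set the real store offers. The tolerance window is left out to keep the sketch short.

```python
import threading
from dataclasses import dataclass
from datetime import datetime


@dataclass
class GroupRecord:
    group_id: str
    scheduled_at: datetime
    state: str = "Scheduled"
    version: int = 0


class InMemoryStore:
    """Stand-in for the persistence layer; the lock models an atomic conditional write."""

    def __init__(self, records):
        self._records = {r.group_id: r for r in records}
        self._lock = threading.Lock()

    def find_due(self, now, limit):
        """Return (group_id, version) pairs for Scheduled groups that are due, oldest first."""
        with self._lock:
            due = [r for r in self._records.values()
                   if r.state == "Scheduled" and r.scheduled_at <= now]
            due.sort(key=lambda r: r.scheduled_at)
            return [(r.group_id, r.version) for r in due[:limit]]

    def try_claim(self, group_id, expected_version):
        """Move Scheduled -> Running only if the record is unchanged since it was read."""
        with self._lock:
            record = self._records[group_id]
            if record.state != "Scheduled" or record.version != expected_version:
                return False
            record.state = "Running"
            record.version += 1
            return True


def poll_once(store, execute, now, batch_size=10):
    """One polling pass: claim each due group, then hand it to the executor."""
    started = []
    for group_id, version in store.find_due(now, batch_size):
        if store.try_claim(group_id, version):
            execute(group_id)
            started.append(group_id)
    return started
```

If two scheduler instances read the same due group, only the one whose `try_claim` succeeds goes on to call the executor; the other sees a changed state or version and skips it.
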
  3. Integration points

    • Use the existing domain/services introduced in PR #28, "Add promotion groups, BranchPromotionMode, and conflict-resolution policy (foundation)" (and subsequent issues), as the execution engine:
      • Call into the promotion executor service to actually perform the promotion.
      • Optionally perform pre-flight conflict detection before executing, depending on existing APIs and desired behavior.
    • Ensure promotion run results are recorded:
      • Scheduler should persist/update promotion run entities with:
        • Start/end timestamps.
        • Final status (Succeeded, Failed, Conflicted, etc.).
        • Any error or conflict summaries, as provided by the lower-level services.
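
As a rough illustration of the run-recording requirement, the Python sketch below wraps an executor call and persists a run record before and after it. `PromotionRun`, `ExecutionResult`, and `run_and_record` are hypothetical; the real promotion run entity and executor result types come from the foundation PR and may look quite different.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PromotionRun:
    group_id: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    status: str = "Running"
    summary: Optional[str] = None


@dataclass
class ExecutionResult:
    status: str                    # e.g. "Succeeded", "Failed", "Conflicted"
    summary: Optional[str] = None  # error or conflict summary from lower layers


def run_and_record(group_id, execute, save,
                   clock=lambda: datetime.now(timezone.utc)):
    """Execute one group and persist start/end timestamps, status, and summary."""
    run = PromotionRun(group_id=group_id, started_at=clock())
    save(run)  # record the start so a crash mid-run is still visible
    try:
        result = execute(group_id)
        run.status, run.summary = result.status, result.summary
    except Exception as exc:
        # Unexpected errors become a Failed run carrying the error text.
        run.status = "Failed"
        run.summary = f"{type(exc).__name__}: {exc}"
    finally:
        run.ended_at = clock()
        save(run)  # record the final outcome
    return run
```

Saving once at the start and once at the end means an interrupted run leaves behind a record with a start timestamp and no end timestamp, which can help when explaining why a group did not finish.
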
  4. Error handling and retries

    • Define a simple, explicit retry policy for transient failures (e.g., database connectivity, transient VCS problems):
      • Distinguish between:
        • Permanent failures (e.g., validation errors, persistent conflicts) → mark run as failed/conflicted.
        • Transient failures → retry with backoff or mark as “RetryPending”.
    • Make sure retries do not produce multiple overlapping runs for the same promotion group.
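
A minimal retry policy along these lines might look like the Python sketch below. The exception types `TransientError` and `PermanentError`, the attempt limit, and the exponential backoff values are all assumptions made for illustration; the issue does not prescribe them.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g. database connectivity)."""


class PermanentError(Exception):
    """Hypothetical marker for failures that retrying cannot fix (e.g. validation)."""


def backoff_delays(max_attempts, base_seconds=1.0, cap_seconds=60.0):
    """Delays between attempts: base, 2*base, 4*base, ... capped at cap_seconds."""
    return [min(cap_seconds, base_seconds * 2 ** i) for i in range(max_attempts - 1)]


def execute_with_retry(execute, max_attempts=3, base_seconds=1.0, sleep=time.sleep):
    """Run `execute`, retrying transient failures; return (status, attempts_used)."""
    delays = backoff_delays(max_attempts, base_seconds)
    for attempt in range(1, max_attempts + 1):
        try:
            execute()
            return "Succeeded", attempt
        except PermanentError:
            return "Failed", attempt      # never retried
        except TransientError:
            if attempt == max_attempts:
                return "Failed", attempt  # retry budget exhausted
            sleep(delays[attempt - 1])    # wait before the next attempt
```

Retries here happen sequentially inside a single claimed run, so they cannot overlap with each other. A persisted "RetryPending" state would be the alternative if retries need to survive a scheduler restart.
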
  5. Observability

    • Instrument the scheduler with logging/events for:
      • Discovery of due promotion groups.
      • Attempted state transitions (Pending → Running, etc.).
      • Execution outcomes and durations.
      • Retries and terminal failures.
    • Provide a minimal query surface (if appropriate) so other parts of the system can:
      • Query scheduled/last-run information for promotion groups.
      • Determine why a group did or did not run at a given time.
  6. Configuration

    • Provide configuration hooks for:
      • Poll interval (e.g., every N seconds).
      • Maximum number of promotion groups to process in a batch.
      • Optional “scheduling horizon” (e.g., only consider ScheduledAt within next N hours/minutes).
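
The three configuration hooks could be grouped into a single options object, sketched here in Python. `SchedulerOptions`, its field names, and the default values are placeholders chosen for the example, not proposed settings.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class SchedulerOptions:
    poll_interval: timedelta = timedelta(seconds=30)  # how often to poll
    max_batch_size: int = 25                          # groups handled per poll
    scheduling_horizon: Optional[timedelta] = None    # None means no horizon

    def __post_init__(self):
        # Fail fast on values that would make the scheduler spin or stall.
        if self.poll_interval <= timedelta(0):
            raise ValueError("poll_interval must be positive")
        if self.max_batch_size < 1:
            raise ValueError("max_batch_size must be at least 1")
        if self.scheduling_horizon is not None and self.scheduling_horizon < timedelta(0):
            raise ValueError("scheduling_horizon must not be negative")

    def in_horizon(self, scheduled_at: datetime, now: datetime) -> bool:
        """True if scheduled_at falls within the horizon, or if no horizon is set."""
        if self.scheduling_horizon is None:
            return True
        return scheduled_at <= now + self.scheduling_horizon
```

Validating in the constructor keeps a misconfigured poll interval or batch size from surfacing later as a scheduler that silently does nothing.
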
  7. Testing

    • Add tests that verify:
      • Promotion groups scheduled in the near future eventually get executed.
      • A promotion group is not executed more than once for the same scheduled time, under normal and concurrent scenarios.
      • State transitions are correct and idempotent.
      • Transient failures are retried according to policy, and permanent failures are surfaced correctly.

Deliverables

  • A scheduler service (and any required background infrastructure) that:
    • Finds due promotion groups by ScheduledAt.
    • Triggers their execution via the existing promotion executor APIs.
    • Manages state transitions and avoids duplicate runs.
  • Tests covering:
    • On-time execution of scheduled groups.
    • Non-duplication under concurrent scheduler instances.
    • Correct handling of transient vs. permanent failures.

Labels: enhancement
