#### What type of PR is this?
Enhancement
#### What this PR does / why we need it
Adds per-pool scheduling metrics to track success/failure outcomes for
each pool independently.
Currently a scheduling failure in one pool causes the entire cycle to
fail with a single error. These metrics enable:
1. Identifying which pool is failing
2. Alerting on specific pool failures
3. Tracking pool health over time
**New metrics:**
- `armada_scheduler_pool_scheduling_outcome` - counter with labels
`pool`, `outcome` (success/failure)
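
To illustrate the shape of such a metric, here is a minimal standard-library sketch of a counter keyed by `pool` and `outcome`. It is not the code in this PR: the type and function names (`PoolOutcomeCounter`, `Outcome`, `Inc`, `Value`) are hypothetical, and a real implementation would presumably register a labelled counter with the scheduler's metrics library instead of using a map.

```go
package main

import "sync"

// Outcome is a hypothetical type for the values of the `outcome` label.
type Outcome string

const (
	OutcomeSuccess Outcome = "success"
	OutcomeFailure Outcome = "failure"
)

// labelKey is one (pool, outcome) label combination.
type labelKey struct {
	pool    string
	outcome Outcome
}

// PoolOutcomeCounter is an illustrative stand-in for a counter with
// `pool` and `outcome` labels. It is an assumption for this sketch,
// not the type used by the scheduler.
type PoolOutcomeCounter struct {
	mu     sync.Mutex
	counts map[labelKey]int
}

// NewPoolOutcomeCounter returns an empty counter.
func NewPoolOutcomeCounter() *PoolOutcomeCounter {
	return &PoolOutcomeCounter{counts: make(map[labelKey]int)}
}

// Inc adds one to the series identified by pool and outcome.
func (c *PoolOutcomeCounter) Inc(pool string, outcome Outcome) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[labelKey{pool: pool, outcome: outcome}]++
}

// Value returns the current count for pool and outcome (0 if never incremented).
func (c *PoolOutcomeCounter) Value(pool string, outcome Outcome) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[labelKey{pool: pool, outcome: outcome}]
}
```

Because each pool has its own series, an alert can target a single pool's `failure` count rather than the outcome of the whole cycle.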
#### Which issue(s) this PR fixes
Fixes #
#### Special notes for your reviewer
Signed-off-by: Dejan Zele Pejchev <pejcev.dejan@gmail.com>
internal/scheduler/scheduling/scheduling_algo.go (23 additions & 6 deletions)
@@ -97,6 +97,10 @@ func NewFairSchedulingAlgo(
 // It iterates over each executor in turn (using lexicographical order) and assigns the jobs using a LegacyScheduler, before moving onto the next executor.
 // It maintains state of which executors it has considered already and may take multiple Schedule() calls to consider all executors if scheduling is slow.
 // Newly leased jobs are updated as such in the jobDb using the transaction provided and are also returned to the caller.
+//
+// This function must always return a non-nil SchedulerResult, even when returning an error.
+// The result contains PoolSchedulingOutcomes that track which pools succeeded or failed,
+// and callers depend on this for metrics reporting.
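
The contract described in the added comment can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: the field layout of `SchedulerResult` and `PoolSchedulingOutcome`, the `schedulePools` helper, and the `ErrPoolUnschedulable` sentinel are all hypothetical, and stopping at the first failing pool is assumed from the PR description's note that one pool's failure fails the whole cycle.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPoolUnschedulable is a hypothetical sentinel error used only in this sketch.
var ErrPoolUnschedulable = errors.New("pool could not be scheduled")

// PoolSchedulingOutcome is a simplified, assumed shape: one entry per attempted pool.
type PoolSchedulingOutcome struct {
	Pool    string
	Success bool
}

// SchedulerResult is a simplified, assumed shape holding only the outcomes.
type SchedulerResult struct {
	PoolSchedulingOutcomes []PoolSchedulingOutcome
}

// schedulePools tries each pool in order using the caller-supplied
// schedulePool function. It always returns a non-nil result, so the
// caller can report per-pool outcomes even when err is non-nil.
func schedulePools(pools []string, schedulePool func(pool string) error) (*SchedulerResult, error) {
	result := &SchedulerResult{}
	for _, pool := range pools {
		if err := schedulePool(pool); err != nil {
			// Record the failure before returning so it is not lost.
			result.PoolSchedulingOutcomes = append(result.PoolSchedulingOutcomes,
				PoolSchedulingOutcome{Pool: pool, Success: false})
			return result, fmt.Errorf("scheduling pool %s: %w", pool, err)
		}
		result.PoolSchedulingOutcomes = append(result.PoolSchedulingOutcomes,
			PoolSchedulingOutcome{Pool: pool, Success: true})
	}
	return result, nil
}
```

A caller following this pattern would iterate over `result.PoolSchedulingOutcomes` to update metrics first and only then handle `err`, which is why a nil result on the error path would silently drop the failure signal.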