Commit 25ac879
fix(placement): prevent thundering herd with optimistic resource accounting and shadow state
High-concurrency placement requests were causing a severe "thundering herd" problem because of stale node metrics: the orchestrator kept scheduling sandboxes onto the same seemingly "empty" node before that node could report its updated resource usage (the "invisibility gap").
This commit introduces active load prediction and optimistic resource reservation so that placement stays balanced even between metric reporting intervals.
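A minimal Go sketch of the invisibility gap and of the optimistic-reservation idea. It is not the orchestrator's code: `Node`, `Reported`, `Reserved`, `place`, `ReportMetrics`, and `simulate` are hypothetical, and `OptimisticAdd` here only borrows the name of the new nodemanager method, not its real signature. Three nodes report stale CPU usage of 10, 20, and 30, and six 8-CPU sandboxes are placed before any new report arrives.

```go
package main

import "fmt"

// Node is a hypothetical, simplified stand-in for a node tracked by the
// orchestrator. Reported is the CPU usage from the last async metrics
// report; Reserved is the optimistic "shadow" usage added at placement
// time that those metrics do not reflect yet.
type Node struct {
	ID       string
	Capacity int64
	Reported int64
	Reserved int64
}

// OptimisticAdd reserves cpu immediately after a successful placement
// (hypothetical signature).
func (n *Node) OptimisticAdd(cpu int64) { n.Reserved += cpu }

// ReportMetrics models a fresh metrics report. Assumption: the report
// already includes every earlier placement, so the shadow state is cleared.
func (n *Node) ReportMetrics(cpu int64) {
	n.Reported = cpu
	n.Reserved = 0
}

// place picks the node with the lowest expected utilization after adding
// cpu. With optimistic=false it trusts only reported metrics, which is
// what produces the thundering herd.
func place(nodes []*Node, cpu int64, optimistic bool) *Node {
	var best *Node
	var bestLoad int64
	for _, n := range nodes {
		load := n.Reported + cpu
		if optimistic {
			load += n.Reserved
		}
		// Compare load/Capacity ratios without floating point.
		if best == nil || load*best.Capacity < bestLoad*n.Capacity {
			best, bestLoad = n, load
		}
	}
	if best != nil && optimistic {
		best.OptimisticAdd(cpu)
	}
	return best
}

// simulate runs a burst of placements with no metrics report in between
// (the "invisibility gap") and counts sandboxes per node.
func simulate(optimistic bool, placements int, cpu int64) map[string]int {
	nodes := []*Node{
		{ID: "a", Capacity: 100, Reported: 10},
		{ID: "b", Capacity: 100, Reported: 20},
		{ID: "c", Capacity: 100, Reported: 30},
	}
	counts := map[string]int{}
	for i := 0; i < placements; i++ {
		counts[place(nodes, cpu, optimistic).ID]++
	}
	return counts
}

func main() {
	fmt.Println("stale metrics only:", simulate(false, 6, 8)) // map[a:6]
	fmt.Println("with OptimisticAdd:", simulate(true, 6, 8))  // map[a:3 b:2 c:1]
}
```

Trusting only reported metrics sends all six sandboxes to node `a`; adding the shadow reservation spreads them 3/2/1, leaving expected loads of 34, 36, and 38. Clearing the reservation on every report is the simplest possible reconciliation rule and assumes each report already covers all earlier placements.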
Changes:
- fix(placement): factor `InProgress` pending resources into the `BestOfK` scoring calculation to predict expected load.
- fix(nodemanager): implement `OptimisticAdd` to immediately reserve resources upon successful placement, bridging the gap before async metric updates arrive.
- test(placement): refactor `SimulatedNode` into a `NodeSimulator` interface to support diverse node behavior simulations.
- test(placement): introduce `LaggyNode` to simulate real-world scenarios with stale/delayed node metrics.
- test(placement): add `BenchmarkPlacementDistribution` to visualize load distribution and verify the elimination of the thundering herd effect under high concurrency.
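The test refactor can be pictured as a small harness in which every simulated node satisfies one interface and a laggy implementation withholds its usage between reports. The method set of `NodeSimulator`, the `InstantNode` type, the `interval` field of `LaggyNode`, and the `runBurst` and `spread` helpers are hypothetical; only the `NodeSimulator` and `LaggyNode` names come from the commit.

```go
package main

import "fmt"

// NodeSimulator is a hypothetical method set for a simulated node: the
// scheduler sees only ReportedCPU, while ActualCPU is the ground truth.
type NodeSimulator interface {
	ID() string
	ReportedCPU() int64
	ActualCPU() int64
	Place(cpu int64)
	Tick() // advance simulated time by one step
}

// InstantNode reports its usage immediately (no invisibility gap).
type InstantNode struct {
	id  string
	cpu int64
}

func (n *InstantNode) ID() string         { return n.id }
func (n *InstantNode) ReportedCPU() int64 { return n.cpu }
func (n *InstantNode) ActualCPU() int64   { return n.cpu }
func (n *InstantNode) Place(cpu int64)    { n.cpu += cpu }
func (n *InstantNode) Tick()              {}

// LaggyNode publishes its actual usage only every interval ticks
// (interval must be > 0), so its reported metrics are stale in between.
type LaggyNode struct {
	id               string
	actual, reported int64
	interval, ticks  int
}

func (n *LaggyNode) ID() string         { return n.id }
func (n *LaggyNode) ReportedCPU() int64 { return n.reported }
func (n *LaggyNode) ActualCPU() int64   { return n.actual }
func (n *LaggyNode) Place(cpu int64)    { n.actual += cpu }
func (n *LaggyNode) Tick() {
	n.ticks++
	if n.ticks%n.interval == 0 {
		n.reported = n.actual
	}
}

// runBurst places one sandbox per tick on the node with the lowest reported
// CPU (no optimistic accounting) and returns the actual per-node load.
// nodes must be non-empty.
func runBurst(nodes []NodeSimulator, placements int, cpu int64) []int64 {
	for i := 0; i < placements; i++ {
		best := nodes[0]
		for _, n := range nodes[1:] {
			if n.ReportedCPU() < best.ReportedCPU() {
				best = n
			}
		}
		best.Place(cpu)
		for _, n := range nodes {
			n.Tick()
		}
	}
	loads := make([]int64, len(nodes))
	for i, n := range nodes {
		loads[i] = n.ActualCPU()
	}
	return loads
}

// spread is max minus min load; 0 means perfectly even.
func spread(loads []int64) int64 {
	lo, hi := loads[0], loads[0]
	for _, l := range loads[1:] {
		if l < lo {
			lo = l
		}
		if l > hi {
			hi = l
		}
	}
	return hi - lo
}

func main() {
	instant := []NodeSimulator{&InstantNode{id: "a"}, &InstantNode{id: "b"}, &InstantNode{id: "c"}}
	lagging := []NodeSimulator{
		&LaggyNode{id: "a", interval: 10},
		&LaggyNode{id: "b", interval: 10},
		&LaggyNode{id: "c", interval: 10},
	}
	fmt.Println("instant:", runBurst(instant, 6, 1)) // [2 2 2]
	fmt.Println("laggy:  ", runBurst(lagging, 6, 1)) // [6 0 0]
}
```

With instant reporting, six 1-CPU placements land evenly; with metrics that refresh only every 10 ticks, all six land on the first node, which is the kind of skew a distribution benchmark would expose. Because this scheduler uses reported metrics only, the harness reproduces the problem rather than the fix.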
Signed-off-by: MorningTZH <morningtzh@yeah.net>
Parent: 1561b22
4 files changed (+422, -96 lines):
- `packages/api/internal/orchestrator/nodemanager`
- `packages/api/internal/orchestrator/placement`