Commit 1b49d73
site update Wed Nov 5 00:28:14 PST 2025
1 parent 4067829

File tree

2 files changed: +6 −6 lines changed

declarative-kubernetes-cluster-emulation-with-kemu/index.html

Lines changed: 5 additions & 5 deletions
@@ -139,7 +139,7 @@
 "keywords": ["kubernetes","emulation","kind","kwok","kemu"],

 "mainEntityOfPage": "true",
-"wordCount": "2568"
+"wordCount": "2577"
 }
 </script>
@@ -456,7 +456,7 @@ <h1 class="mb-8 mt-0 text-4xl font-extrabold text-neutral-900 dark:text-neutral"
 <p>Maximizing infrastructure efficiency through optimized scheduling and high-density bin packing is
 critical for achieving high workload throughput and resource utilization. Introducing scheduler
 modifications at production scale is risky: configuration errors can render workloads unschedulable,
-causing multi-day delays for dozens or hundreds of AI practitioners, idling capacity, wasted on-demand
+causing multi-day delays for dozens or hundreds of ML engineers, idling capacity, wasted on-demand
 resources, and delayed delivery. It is imperative that scheduler modifications with a big blast
 radius are tested and verified in a safe environment before shipping to production.</p>
 <p>This problem is not new, and virtual/emulated clusters for scheduling experimentation and workload
@@ -1103,9 +1103,9 @@ <h3 id="suboptimal-scheduling-example" class="relative group">Suboptimal schedul
 <p>From the dashboard, we can see that while all jobs are in the <code>Running</code> state, there are pods
 in the <code>Pending</code> state while unallocated capacity remains available.</p>
 <p>What we&rsquo;re observing is partial admission - a common scheduling problem in Kubernetes
-clusters. Pods created by jobs are not scheduled in batch but rather one-by-one
-resulting in fragmentation and inability to schedule all pods even if there is
-cluster capacity available.</p>
+clusters. The Kubernetes Scheduler processes pods individually without job-level awareness.
+Pods created by jobs are not scheduled in batch but rather one-by-one resulting in
+fragmentation and inability to schedule all pods even if there is cluster capacity available.</p>
 <p>For distributed training workloads that require all workers to start simultaneously
 (all-or-nothing semantics), this partial admission creates a deadlock situation.
 This simple experiment reveals a scheduling issue that would be expensive to discover
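The partial-admission deadlock described in this hunk can be sketched with a toy simulation (all names and numbers below are hypothetical, not taken from the post): two gang jobs whose pods are placed one-by-one with first-fit, with no job-level awareness, so each job ends up only partially admitted even though the cluster has enough total capacity to run either job in full.

```python
# Toy model of per-pod (non-gang) scheduling. Each pod is placed
# individually via first-fit; the scheduler has no notion of which
# job a pod belongs to, so capacity gets split across jobs.

def first_fit(free, size):
    """Return the index of the first node with enough free capacity, else None."""
    for i, cap in enumerate(free):
        if cap >= size:
            return i
    return None

def schedule(queue, free):
    """Place pods one-by-one; return how many pods of each job were admitted."""
    admitted = {}
    for job, size in queue:
        node = first_fit(free, size)
        if node is not None:
            free[node] -= size
            admitted[job] = admitted.get(job, 0) + 1
    return admitted

# Two gang jobs, each needing 4 pods of size 2, on 2 nodes of capacity 4.
# Total capacity (8) fits exactly one full job, not both.
# Pods arrive interleaved at the scheduler queue.
pods_interleaved = [("job-a", 2), ("job-b", 2)] * 4
admitted = schedule(pods_interleaved, [4, 4])
print(admitted)  # {'job-a': 2, 'job-b': 2}
```

Each job gets only 2 of its 4 pods, so with all-or-nothing semantics both stay blocked forever while the cluster sits fully allocated. A gang-aware admission step would instead test whole jobs: admitting `job-a` atomically consumes the full capacity, `job-b` waits in queue, and no capacity is wasted on a job that can never start.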

index.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
