declarative-kubernetes-cluster-emulation-with-kemu/index.html
+6 -6 (6 additions & 6 deletions)
@@ -70,7 +70,7 @@
70
70
name="description"
71
71
content="
72
72
73
-
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
73
+
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
<meta property="og:title" content="KEMU: A Declarative Approach to Emulating Kubernetes Clusters at Scale">
100
-
<meta property="og:description" content="Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.">
100
+
<meta property="og:description" content="Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.">
101
101
<meta property="og:locale" content="en">
102
102
<meta property="og:type" content="article">
103
103
<meta property="article:section" content="posts">
@@ -112,7 +112,7 @@
112
112
113
113
<meta name="twitter:card" content="summary">
114
114
<meta name="twitter:title" content="KEMU: A Declarative Approach to Emulating Kubernetes Clusters at Scale">
115
-
<meta name="twitter:description" content="Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.">
115
+
<meta name="twitter:description" content="Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.">
116
116
117
117
118
118
<script type="application/ld+json">
@@ -123,7 +123,7 @@
123
123
"name": "KEMU: A Declarative Approach to Emulating Kubernetes Clusters at Scale",
124
124
"headline": "KEMU: A Declarative Approach to Emulating Kubernetes Clusters at Scale",
125
125
126
-
"abstract": "Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.",
126
+
"abstract": "Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.",
<h2 id="requirements" class="relative group">Requirements <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100"><a class="group-hover:text-primary-300 dark:group-hover:text-neutral-700" style="text-decoration-line: none !important;" href="#requirements" aria-label="Anchor">#</a></span></h2><p>Let’s consider the following cluster setup to provide background for the functionality of the emulated cluster:</p>
468
468
<ul>
469
469
<li>A Kubernetes cluster with 1,000+ GPU nodes of different types;</li>
470
-
<li>The nodes are spread across several data centers/availability zones;</li>
470
+
<li>The nodes are spread across multiple topology domains (availability zones, racks);</li>
471
471
<li>Specialized scheduling and training operators are running on the cluster;</li>
472
472
<li>Observability is provided via the Prometheus stack.</li>
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
495
+
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
<description>Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.</description>
17
+
<description>Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.</description>
18
18
</item>
19
19
<item>
20
20
<title>Secure Kubeflow Ingress and Authentication with Istio External Auth, Dex, and OAuth2 Proxy</title>
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
484
+
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
<description>Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.</description>
17
+
<description>Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.</description>
18
18
</item>
19
19
<item>
20
20
<title>Secure Kubeflow Ingress and Authentication with Istio External Auth, Dex, and OAuth2 Proxy</title>
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
408
+
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
<description>Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.</description>
17
+
<description>Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.</description>
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
408
+
Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.
<description>Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky — configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.</description>
17
+
<description>Optimizing AI workload scheduling requires extensive experimentation and observation, but testing scheduler modifications in production is risky - configuration errors can cause multi-day delays and wasted capacity. This post introduces KEMU, a declarative Kubernetes Emulator Utility that replaces fragmented multi-tool cluster setups with a single configuration specification, enabling safe experimentation with large-scale GPU clusters on minimal resources.</description>