KEP-766: DisaggregatedSet implementation by hasB4K · Pull Request #773 · kubernetes-sigs/lws

hasB4K · 2026-03-09T12:22:54Z

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it

This PR implements KEP-766 by introducing the DisaggregatedSet controller, a Kubernetes operator for managing disaggregated inference deployments. Disaggregated serving separates the prefill and decode phases of LLM inference onto different infrastructure, and this controller orchestrates multiple LeaderWorkerSets with coordinated lifecycle management.

Key features:

Unified Management: Manage prefill and decode LeaderWorkerSets as a single resource
Two-Dimensional Rolling Updates: Linear interpolation algorithm updates both sides in lockstep while preserving the prefill-to-decode ratio
Service Orchestration: Automatically create Services when both sides are ready
Stateless Operator: Safe to restart at any point during operations

Which issue(s) this PR fixes

Fixes #766

Special notes for your reviewer

The two-dimensional rollout algorithm uses linear interpolation to maintain the prefill/decode ratio throughout updates. A plan-steps CLI tool is included to visualize rollout plans:

$ go run ./cmd/plan-steps --source '{"prefill": 10, "decode": 2}' --target '{"prefill": 6, "decode": 8}' --surge '{"prefill": 2, "decode": 2}'
Phases: [decode prefill]
Source: decode=2, prefill=10
Target: decode=8, prefill=6
Config: decode(surge=2, unavail=0), prefill(surge=2, unavail=0)

┌──────┬────────────┬─────────────┬────────────┬─────────────┬───────┬───────────────────────────────┐
│ STEP │ OLD DECODE │ OLD PREFILL │ NEW DECODE │ NEW PREFILL │ TOTAL │            ACTION             │
├──────┼────────────┼─────────────┼────────────┼─────────────┼───────┼───────────────────────────────┤
│ 0    │ 2          │ 10          │ 0          │ 0           │ 12    │ initial                       │
│ 1    │ 2          │ 10          │ 2          │ 2           │ 16    │ new decode +2, new prefill +2 │
│ 2    │ 2          │ 8           │ 2          │ 2           │ 14    │ old prefill -2                │
│ 3    │ 2          │ 8           │ 4          │ 3           │ 17    │ new decode +2, new prefill +1 │
│ 4    │ 2          │ 8           │ 5          │ 4           │ 19    │ new decode +1, new prefill +1 │
│ 5    │ 2          │ 6           │ 5          │ 4           │ 17    │ old prefill -2                │
│ 6    │ 2          │ 6           │ 7          │ 5           │ 20    │ new decode +2, new prefill +1 │
│ 7    │ 2          │ 6           │ 8          │ 6           │ 22    │ new decode +1, new prefill +1 │
│ 8    │ 0          │ 0           │ 8          │ 6           │ 14    │ old decode -2, old prefill -6 │
└──────┴────────────┴─────────────┴────────────┴─────────────┴───────┴───────────────────────────────┘
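To make the interpolation idea concrete, here is a minimal sketch. It is not the planner from this PR: it ignores surge and maxUnavailable, and the helper names (`newAt`, `oldAt`) and the fixed step count are assumptions made for illustration. It only shows how both phases can be moved along the same progress fraction so that the prefill/decode ratio stays close to the interpolated one.

```go
package main

import "fmt"

// ceilDiv returns ceil(a/b) for a >= 0 and b > 0.
func ceilDiv(a, b int) int { return (a + b - 1) / b }

// newAt interpolates the new-revision replica count at a given step,
// rounding up so that new capacity never lags behind the rollout progress.
func newAt(target, step, steps int) int { return ceilDiv(target*step, steps) }

// oldAt interpolates the old-revision replica count at a given step,
// rounding the drained amount down so that old capacity is released late.
func oldAt(source, step, steps int) int { return source - (source*step)/steps }

func main() {
	const steps = 4 // hypothetical number of interpolation steps
	for s := 0; s <= steps; s++ {
		fmt.Printf("step %d: old decode=%d, old prefill=%d, new decode=%d, new prefill=%d\n",
			s, oldAt(2, s, steps), oldAt(10, s, steps), newAt(8, s, steps), newAt(6, s, steps))
	}
}
```

The real planner additionally has to respect the per-phase surge and unavailability budgets, which is why its steps in the table above differ from this plain interpolation.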

Does this PR introduce a user-facing change?

Introduce DisaggregatedSet API for managing disaggregated LLM inference deployments with coordinated prefill/decode rolling updates.

Current TODO:

List based instead of map based phases in the YAML config
Support multiple phases (i.e. more than two), for example to support an encode phase. (This was optional, but it turned out to be easy once the previous point was done.)
Remove the Service from the YAML API. We now have a headless Service that exposes an EndpointSlice (for the llm-d EndpointPicker in the future).
Massive refactor of the e2e tests.
Propagate the annotations to the LWS level to support Kueue topology
See if I can use LeaderWorkerSetSpec in the YAML instead of LeaderWorkerTemplate as I do now.
Add the CI
Update the KEP and the doc.

k8s-ci-robot · 2026-03-09T12:23:01Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: hasB4K
Once this PR has been reviewed and has the lgtm label, please assign ahg-g for approval. For more information see the Code Review Process.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

k8s-ci-robot · 2026-03-09T12:23:04Z

Hi @hasB4K. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

netlify · 2026-03-09T12:23:26Z

✅ Deploy Preview for kubernetes-sigs-lws canceled.

Name	Link
🔨 Latest commit	`e1be1ea`
🔍 Latest deploy log	https://app.netlify.com/projects/kubernetes-sigs-lws/deploys/69bb47ed8969d20009384365

yankay · 2026-03-09T12:59:18Z

/ok-to-test

disaggregatedset/internal/controller/workload_manager.go

ahg-g

A few comments on the api

disaggregatedset/api/v1alpha1/disaggregatedset_types.go

ahg-g · 2026-03-11T12:26:30Z

disaggregatedset/api/v1alpha1/disaggregatedset_types.go

+	// +kubebuilder:validation:Minimum=0
+	// +kubebuilder:default=1
+	// +optional
+	Replicas *int32 `json:"replicas,omitempty"`


I am sure you thought about this, but what is the reason for not using a LeaderWorkerSetSpec instead of explicitly listing the individual parameters?

@ahg-g, @Edwinhr716 :

I remember why I took the decision of not using LeaderWorkerSetSpec.
DisaggregatedPhaseSpec does not include rolloutStrategy from LeaderWorkerSetSpec because the DisaggregatedSet operator handles rolling updates itself:

Rollout type: We don't want users to pick a rollout strategy (RollingUpdate, OnDelete, etc.). The operator has its own logic to roll all phases together.
Partition: LWS uses partition to roll pods one by one, but this only works inside a single LWS. We can't sync partition across prefill and decode, so pods would end up on different versions. (this I didn't know)

Instead, the operator creates new LWS resources for each revision and scales them up/down together. Services only point to revisions where all phases are ready.
I think the API should be opinionated, and this is typically one of those cases where I don't want to expose types (like the rollout type or partition) that don't make sense in the DisaggregatedSet operator.

Here is one option: we can use the LeaderWorkerSetSpec and validate that only the supported fields are set. The advantage is that this way the apis are always in sync (we don't need to copy every field), but at the same time we control which fields the new api should support. What do you think?
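A rough sketch of what this option could look like. The types below are simplified, hypothetical stand-ins for the real LWS API types (only the fields needed for the illustration are present), and `validatePhase` is an assumed helper name, not code from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// RolloutStrategy and LeaderWorkerSetSpec are hypothetical, trimmed-down
// stand-ins for the LWS API types; the real types have more fields.
type RolloutStrategy struct {
	Type      string
	Partition *int32
}

type LeaderWorkerSetSpec struct {
	Replicas        *int32
	RolloutStrategy *RolloutStrategy
}

// DisaggregatedPhaseSpec embeds the full LWS spec so the two APIs stay in sync.
type DisaggregatedPhaseSpec struct {
	Name string
	LeaderWorkerSetSpec
}

// validatePhase rejects fields that the operator manages itself.
func validatePhase(p DisaggregatedPhaseSpec) error {
	var errs []error
	if p.RolloutStrategy != nil {
		errs = append(errs, fmt.Errorf("phase %q: rolloutStrategy is managed by the operator and must not be set", p.Name))
	}
	if p.Replicas != nil && *p.Replicas < 0 {
		errs = append(errs, fmt.Errorf("phase %q: replicas must be >= 0", p.Name))
	}
	return errors.Join(errs...) // nil when no error was collected
}

func main() {
	replicas := int32(2)
	phase := DisaggregatedPhaseSpec{
		Name: "decode",
		LeaderWorkerSetSpec: LeaderWorkerSetSpec{
			Replicas:        &replicas,
			RolloutStrategy: &RolloutStrategy{Type: "RollingUpdate"},
		},
	}
	fmt.Println(validatePhase(phase))
}
```

In practice this kind of check would more likely live in a validating webhook or a CEL rule; the plain function is only meant to show the shape of the check.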

Okay fair. I can try to do that, I'll let you know.

disaggregatedset/api/v1alpha1/disaggregatedset_types.go

hasB4K · 2026-03-16T14:52:30Z

Here is what has been achieved since our last sync:

List based instead of map based phases in the YAML config
Support multiple phases (i.e. more than two), for example to support an encode phase. (This was optional, but it turned out to be easy once the previous point was done.)
Remove the Service from the YAML API. We now have a headless Service that exposes an EndpointSlice (for the llm-d EndpointPicker in the future).
Massive refactor of the e2e tests.
Patched a bug in the planner.

Here is what I still need to do:

Propagate the annotations to the LWS level to support Kueue topology
See if I can use LeaderWorkerSetSpec in the YAML instead of LeaderWorkerTemplate as I do now.
Add the CI
Update the KEP and the doc.

(I added a global TODO in the head message)

cc @ahg-g, @Edwinhr716

This adds the reference implementation for KEP-766 DisaggregatedSet. Signed-off-by: Mathis Felardos <mathis@mistral.ai>

…Service Signed-off-by: Mathis Felardos <mathis@mistral.ai>

…ss portless Services

This change simplifies the DisaggregatedSet API by removing the user-facing ServiceTemplate field and automatically creating headless portless Services for pod discovery via EndpointSlices.

API Changes:
- Remove ServiceTemplate from DisaggregatedSet spec
- Services are now created automatically without user configuration

Service Behavior:
- Headless Services (clusterIP: None) are created for each side (prefill/decode)
- Services are portless to enable EndpointSlice-based discovery
- Services are only created when both sides have ReadyReplicas >= 1
- Old services are cleaned up only after workloads are fully drained

Testing:
- Add e2e test verifying automatic Service and EndpointSlice creation
- Add Kind cluster cleanup in AfterSuite to prevent image caching issues
- Fix .dockerignore to use explicit exclusions

Also fixes typo in Makefile helm target path.

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
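The readiness gate described in this commit message ("Services are only created when both sides have ReadyReplicas >= 1") can be sketched as a small predicate. The function name and the map-based input are assumptions for illustration, not the controller's actual code.

```go
package main

import "fmt"

// allPhasesReady reports whether every phase of a revision has at least one
// ready replica. An empty input is treated as not ready.
func allPhasesReady(readyReplicas map[string]int32) bool {
	if len(readyReplicas) == 0 {
		return false
	}
	for _, n := range readyReplicas {
		if n < 1 {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allPhasesReady(map[string]int32{"prefill": 2, "decode": 0})) // false
	fmt.Println(allPhasesReady(map[string]int32{"prefill": 2, "decode": 1})) // true
}
```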

…ut codebase

This is a breaking change that standardizes terminology:
- Label: disaggregatedset.x-k8s.io/side → disaggregatedset.x-k8s.io/phase
- Type: DisaggSideConfig → DisaggregatedPhaseSpec
- Constants: SidePrefill/SideDecode → PhasePrefill/PhaseDecode
- Struct: SideReplicaState → PhaseReplicaState
- Map field: Sides → Phases

YAML field names (spec.prefill, spec.decode) remain unchanged.

Updated files:
- API types and CRD manifests
- All controller code and tests
- E2E tests with updated label selectors
- CLI tool (plan-steps)
- README documentation
- Helm chart CRD

Signed-off-by: Mathis Felardos <mathis@mistral.ai>

… flexible phases array

This commit introduces a flexible N-phase architecture for DisaggregatedSet, replacing the hardcoded prefill/decode fields with a dynamic phases array.

API Changes:
- Replace Prefill/Decode pointer fields with Phases []DisaggregatedPhaseSpec
- Add CEL validation requiring >= 2 phases with unique names
- Update PhaseReplicaState to use dynamic slices instead of fixed arrays

Controller Changes:
- Refactor executor to use dynamic slices for N-phase support
- Update planner to work with arbitrary phase counts
- Update service manager for N-phase workloads

CLI Changes (plan-steps):
- Accept JSON maps for phase config: --source '{"prefill": 6, "decode": 2}'
- Dynamically generate table headers from phase names
- Fully N-phase ready (planner still has 2-phase limitation)

Testing:
- Add 3-phase rolling update e2e test
- Add phase rename e2e test
- Update all existing tests for new API

This is a breaking API change for v1alpha1.

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
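For readers unfamiliar with CEL, the rule mentioned in this commit message (at least 2 phases, unique names) is equivalent to the following plain Go check. This is only an illustration of the rule's semantics; `validatePhaseNames` is a hypothetical helper, and the PR enforces the rule through CRD validation rather than this function.

```go
package main

import "fmt"

// validatePhaseNames checks that there are at least two phases and that no
// phase name is used twice.
func validatePhaseNames(names []string) error {
	if len(names) < 2 {
		return fmt.Errorf("at least 2 phases are required, got %d", len(names))
	}
	seen := make(map[string]struct{}, len(names))
	for _, n := range names {
		if _, dup := seen[n]; dup {
			return fmt.Errorf("duplicate phase name %q", n)
		}
		seen[n] = struct{}{}
	}
	return nil
}

func main() {
	fmt.Println(validatePhaseNames([]string{"prefill", "decode", "encode"}))
	fmt.Println(validatePhaseNames([]string{"prefill"}))
	fmt.Println(validatePhaseNames([]string{"prefill", "prefill"}))
}
```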

… for phase changes

Introduces a phasePolicy field (Strict/Flexible) to control how DisaggregatedSet handles phase additions, removals, and renames during updates.

API Changes:
- Add PhasePolicy type with Strict (default) and Flexible values
- Strict: rejects adding, removing, or renaming phases
- Flexible: allows phase changes with progressive rollout

Controller Changes:
- Detect phase changes by comparing spec vs old workload phases
- Pass removed phases to Planner with target=0 for progressive drain
- Compute allPhaseNames as union of spec phases + old workload phases
- Clean up ALL phases belonging to drained revisions, not just spec phases
- Refactor executor.go into logical sections with focused helper functions
- Move verbose investigation logs to V(1) debug level

Testing:
- Add unit tests for phasePolicy enforcement (Strict blocks, Flexible allows)
- Update e2e tests for progressive drain behavior
- Add 3-phase sample manifests

File reduced from 658 lines to 507 lines while maintaining all functionality.

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
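The "union of spec phases + old workload phases" with "target=0" for removed phases can be pictured with a short sketch. `plannerTargets` and its inputs are hypothetical names chosen for this example; the controller's real data structures are not shown in this conversation.

```go
package main

import "fmt"

// plannerTargets builds the per-phase target replica counts handed to the
// planner. Phases that exist only on the old workload get a target of 0 so
// that they are drained progressively; phases in the spec keep their count.
func plannerTargets(spec map[string]int, oldPhases []string) map[string]int {
	targets := make(map[string]int, len(spec)+len(oldPhases))
	for _, name := range oldPhases {
		targets[name] = 0
	}
	for name, replicas := range spec {
		targets[name] = replicas
	}
	return targets
}

func main() {
	spec := map[string]int{"prefill": 6, "decode": 8}
	old := []string{"prefill", "decode", "encode"} // "encode" was removed from the spec
	fmt.Println(plannerTargets(spec, old))
}
```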

…r and fixtures packages

Introduces reusable test utilities to simplify e2e test code and improve maintainability.

New Packages:
- test/utils/kubectl: Fluent builder API for kubectl commands
  - kubectl.go: Chainable methods (Get, Delete, Apply, Label, JSONPath, etc.)
  - queries.go: Higher-level helpers (LWSByPhase, CountPods, GetTotalReplicas)
  - waiters.go: Eventually-based wait helpers (ForPodCount, ForRevisionDrained)
- test/utils/fixtures: YAML builders for DisaggregatedSet manifests
  - fixtures.Phase and fixtures.Config for flexible YAML generation
  - fixtures.PrefillDecode helper for common 2-phase configs

Test Refactoring:
- Replace inline helpers with kubectl package imports
- Use fluent builder for all kubectl operations
- Consolidate YAML builders into fixtures package
- Add TrackProgressiveRollout helper for rollout tests

Code Reduction:
- executor.go: 507 → 456 lines
- e2e_test.go: 1669 → 1004 lines (~40% reduction)

Signed-off-by: Mathis Felardos <mathis@mistral.ai>

… and maxUnavailable floor

Port fixes from commit 7c66541 to the N-phase planner:

1. Source-aware surge baseline:
- Change surge constraint from old + new <= target + surge to old + new <= max(source, target) + surge
- Ensures scale-down scenarios don't block scale-up since the system already runs source replicas

2. maxUnavailable floor enforcement:
- Add minOld constraint: old + new >= target - maxUnavailable
- Only enforce when source >= target (scale-down scenario)
- Applied to both proportional drain and fallback drain paths

Test updates:
- Update expected sequences for scale-down and mixed-scale scenarios
- Add asymmetric_5_3_surge2 test case

All tests pass with 90.2% coverage.

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
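The two constraints in this commit message can be written down directly. The sketch below is an illustration under assumptions: `withinSurge` and `minOld` are hypothetical helper names, and the real planner applies these checks per phase inside a larger step computation.

```go
package main

import "fmt"

// maxInt returns the larger of a and b.
func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// withinSurge checks the source-aware surge constraint:
// old + new <= max(source, target) + surge.
func withinSurge(oldR, newR, source, target, surge int) bool {
	return oldR+newR <= maxInt(source, target)+surge
}

// minOld returns the smallest old replica count that keeps
// old + new >= target - maxUnavailable. The floor is only enforced when
// source >= target (scale-down); otherwise it is 0.
func minOld(newR, source, target, maxUnavailable int) int {
	if source < target {
		return 0
	}
	return maxInt(target-maxUnavailable-newR, 0)
}

func main() {
	// Prefill from the example table: source=10, target=6, surge=2.
	fmt.Println(withinSurge(10, 2, 10, 6, 2)) // true: 12 <= 12
	fmt.Println(withinSurge(10, 3, 10, 6, 2)) // false: 13 > 12
	fmt.Println(minOld(2, 10, 6, 0))          // 4
}
```

With the previous baseline (target + surge = 8), the first check would already fail for a prefill side that starts at 10 replicas, which is the blocked scale-up the commit message describes.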

disaggregatedset/api/v1alpha1/disaggregatedset_types.go

ahg-g · 2026-03-16T16:18:08Z

disaggregatedset/hack/plan-steps/main.go

+//	  --surge '{"prefill": 2, "decode": 2}'
+//
+// This helps understand what will happen during a specific rollout in advance.
+// Supports arbitrary phase names and will support N phases when the planner does.


So this is a tool that one runs locally, not a controller?

cmd/plan-steps/main.go is just a utility script that I used for debugging. It can also be used to see what will happen before a deployment. But the rest of the code in disaggregatedset/internal/ is the controller/operator. 😉

I moved this to hack/plan-steps/ - I think it's clearer that way.

The existing test only verified that labels and annotations were set on the LWS workerTemplate. This extends the test to also verify that they are actually propagated to the running pods by LWS. Adds PodsByPhase helper to kubectl queries for querying pods by phase. Signed-off-by: Mathis Felardos <mathis@mistral.ai>

…gy support

Adds a metadata field to DisaggregatedPhaseSpec that allows users to set labels and annotations on the LWS CR's ObjectMeta. This enables:
- Kueue queue assignment (kueue.x-k8s.io/queue-name label)
- LWS exclusive-topology scheduling

User-provided labels/annotations are merged onto the LWS CR, with system labels taking precedence over user labels.

Includes e2e test verifying metadata propagation to LWS CR.

Signed-off-by: Mathis Felardos <mathis@mistral.ai>
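The merge rule in this commit message ("system labels taking precedence over user labels") amounts to copying user labels first and system labels second. A minimal sketch, with `mergeLabels` as a hypothetical helper name and example label values chosen for illustration:

```go
package main

import "fmt"

// mergeLabels returns a new map containing the user labels overlaid with the
// system labels, so a system label always wins on a key conflict.
func mergeLabels(user, system map[string]string) map[string]string {
	out := make(map[string]string, len(user)+len(system))
	for k, v := range user {
		out[k] = v
	}
	for k, v := range system {
		out[k] = v
	}
	return out
}

func main() {
	user := map[string]string{
		"kueue.x-k8s.io/queue-name":       "team-a",     // example value
		"disaggregatedset.x-k8s.io/phase": "overridden", // conflicts with a system label
	}
	system := map[string]string{"disaggregatedset.x-k8s.io/phase": "prefill"}
	fmt.Println(mergeLabels(user, system))
}
```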

…est-e2e targets for prow CI Signed-off-by: Mathis Felardos <mathis@mistral.ai>

BREAKING CHANGE: The phasePolicy field has been removed from DisaggregatedSetSpec. The controller now always uses Flexible behavior, allowing phase additions, removals, and renames during rollouts.
- Remove PhasePolicy type, constants, and spec field
- Remove getPhasePolicy() and rejectPhaseChanges() controller logic
- Remove all PhasePolicy unit tests
- Update fixtures and sample YAMLs
- Remove PhasePolicy from e2e tests

Phase changes are now always allowed (Flexible behavior is the default).

Signed-off-by: Mathis Felardos <mathis@mistral.ai>

- Add KIND ?= kind default in disaggregatedset Makefile
- Pass KIND=$(KIND) from parent Makefile to disaggregatedset test-e2e

This fixes the "Kind is not installed" error when running disaggregatedset e2e tests from the parent LWS Makefile.

Signed-off-by: Mathis Felardos <mathis@mistral.ai>

Add a dedicated e2e test script that mirrors the LWS pattern for running e2e tests in Prow CI. The script handles the full e2e lifecycle:
- Kind cluster creation/deletion with cleanup trap
- Docker image building and loading to Kind
- LWS controller installation from release manifests
- DisaggregatedSet operator deployment via kustomize
- Running ginkgo tests with junit output
- Log collection on cleanup for debugging

Update Makefile to call the hack script with all required environment variables (KIND, KUBECTL, KUSTOMIZE, GINKGO, etc.) and add ginkgo tool.

Update e2e tests to skip redundant operations when run via hack script (detected via LWS_INSTALL_SKIP=true), allowing tests to work both standalone and in Prow CI.

…seSpec

…eNextStep
- Change RollingUpdateConfig from slice fields to per-phase structs ([]RollingUpdateConfig instead of RollingUpdateConfig with []int fields)
- Simplify extractRollingUpdateConfig by removing unused specPhaseSet param
- Split ComputeNextStep into focused helpers: isComplete, isNewAtTarget, canScaleUp, computeMinOld, tryScaleUp, tryProportionalDrain, tryForceDrain
- Remove config() test helper in favor of direct struct literals

ahg-g

The API looks good to me. I will review the code when it is merged with the lws binary (Can we create an issue to track this btw?). I will leave the final approval to Edwin.

/lgtm

Edwinhr716 · 2026-03-20T00:52:18Z

I took a look at the code, nothing major sticks out. Will approve once the last comment is addressed.

/lgtm

Edwinhr716 · 2026-03-20T00:32:23Z

disaggregatedset/internal/controller/disaggregatedset_controller.go

+
+	log.Info("Reconciling DisaggregatedSet", "name", disaggregatedSet.Name, "namespace", disaggregatedSet.Namespace)
+
+	// Validate phases are configured (API validation ensures at least 2)


Can we move all validation to the disaggregatedset_webhook?

k8s-ci-robot added the kind/feature (categorizes issue or PR as related to a new feature) and kind/api-change (categorizes issue or PR as related to adding, removing, or otherwise changing an API) labels Mar 9, 2026

k8s-ci-robot requested review from Edwinhr716 and ardaguclu March 9, 2026 12:23

k8s-ci-robot added the cncf-cla: yes (indicates the PR's author has signed the CNCF CLA) and needs-ok-to-test (indicates a PR that requires an org member to verify it is safe to test) labels Mar 9, 2026

k8s-ci-robot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 9, 2026

hasB4K force-pushed the mistral-ai/disaggregatedset branch from df7d25d to 7620d60 Compare March 9, 2026 12:24

hasB4K mentioned this pull request Mar 9, 2026

KEP-766: Add DisaggregatedSet proposal #767

Open

k8s-ci-robot added the ok-to-test label (indicates a non-member PR verified by an org member that is safe to test) and removed the needs-ok-to-test label (indicates a PR that requires an org member to verify it is safe to test) Mar 9, 2026

hasB4K changed the title ~~KEP-766: Add DisaggDeployment proposal~~ KEP-766: Add DisaggregatedSet proposal Mar 9, 2026

hasB4K changed the title ~~KEP-766: Add DisaggregatedSet proposal~~ KEP-766: DisaggregatedSet implementation Mar 9, 2026

Edwinhr716 reviewed Mar 10, 2026

disaggregatedset/internal/controller/workload_manager.go (outdated)

ahg-g reviewed Mar 11, 2026

hasB4K force-pushed the mistral-ai/disaggregatedset branch from 7c66541 to ddb0070 Compare March 16, 2026 14:46

hasB4K force-pushed the mistral-ai/disaggregatedset branch from ddb0070 to 358e48c Compare March 16, 2026 14:58

hasB4K added 8 commits March 16, 2026 16:25

KEP-766: Add DisaggregatedSet controller implementation

73ce834

This adds the reference implementation for KEP-766 DisaggregatedSet. Signed-off-by: Mathis Felardos <mathis@mistral.ai>

feat(disaggregatedset): propagate user labels/annotations to LWS and …

33e4098

…Service Signed-off-by: Mathis Felardos <mathis@mistral.ai>

hasB4K force-pushed the mistral-ai/disaggregatedset branch from 358e48c to 233b6ad Compare March 16, 2026 15:25

hasB4K requested review from Edwinhr716 and ahg-g March 16, 2026 19:03

ahg-g reviewed Mar 16, 2026

hasB4K added 5 commits March 17, 2026 10:52

feat(disaggregatedset): Add disaggregatedset tests to make test and t…

b81f2e5

…est-e2e targets for prow CI Signed-off-by: Mathis Felardos <mathis@mistral.ai>

hasB4K force-pushed the mistral-ai/disaggregatedset branch 2 times, most recently from 7bc4172 to 394f4b8 Compare March 17, 2026 10:56

hasB4K force-pushed the mistral-ai/disaggregatedset branch from 394f4b8 to 89cc3e9 Compare March 17, 2026 11:13

hasB4K requested a review from ahg-g March 17, 2026 13:18

hasB4K added 3 commits March 19, 2026 01:47

feat(disaggregatedset): Embed LeaderWorkerSetSpec in DisaggregatedPha…

54bc9e7

…seSpec

chore(disaggregatedset): Move plan-steps utility to hack/

e1be1ea

hasB4K force-pushed the mistral-ai/disaggregatedset branch from 5a2367a to e1be1ea Compare March 19, 2026 00:48

ahg-g reviewed Mar 19, 2026

k8s-ci-robot assigned ahg-g Mar 19, 2026

k8s-ci-robot added the lgtm label ("Looks good to me", indicates that a PR is ready to be merged) Mar 19, 2026

k8s-ci-robot assigned Edwinhr716 Mar 20, 2026

Edwinhr716 reviewed Mar 20, 2026
