KEP-766: Add DisaggregatedSet proposal #767
hasB4K wants to merge 5 commits into kubernetes-sigs:main
Conversation
/ok-to-test |
Can you run this script https://github.com/kubernetes-sigs/lws/blob/main/hack/update-toc.sh to fix the formatting in the KEP? |
This KEP proposes adding DisaggDeployment as a higher-level API for managing disaggregated inference workloads. DisaggDeployment orchestrates multiple LeaderWorkerSets (prefill and decode) with coordinated lifecycle management, including two-dimensional rolling updates and service orchestration. Signed-off-by: Mathis Felardos <mathis@mistral.ai>
And please sign the CLA.
Update KEP metadata with actual reviewers and approvers, and remove the feature gate from Alpha requirements since DisaggDeployment doesn't require a feature gate. Signed-off-by: Mathis Felardos <mathis@mistral.ai>
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
> ### Non-Goals
>
> 1. **Auto-scaling support**: HPA/VPA integration or automatic scaling based on inference load metrics is out of scope.
You can still autoscale at the prefill/decode level, right?
Yes, I think it should work. I haven't tested it in my implementation, but I don't see any reason why it couldn't work. I put it as Non-Goal since it's not the focus of the operator for now, and I actually think that scaling Prefill and Decode independently makes more sense than scaling the entire DisaggDeployment directly.
> I actually think that scaling Prefill and Decode independently

Yeah, that's the one I was referring to: still being able to scale independently while treating it as a unit.
> ```go
> // LeaderWorkerTemplate defines the template for leader/worker pods.
> // +required
> LeaderWorkerTemplate LeaderWorkerTemplate `json:"leaderWorkerTemplate"`
> ```
Is there a way to pass annotations that will be injected into the LWS object? The exclusive-placement feature relies on that to work: https://lws.sigs.k8s.io/docs/concepts/#exclusive-lws-to-topology-placement
An idea would be to add a field here called ExclusivePlacement and if set, have the controller inject the annotation. Though would need some sort of config to set subgroup-exclusive-placement as well.
If it doesn't work out of the box in my implementation, we could add it. I'll check tomorrow morning whether we can pass annotations through without changing anything.
Yeah it is a future work type of thing, just thinking out loud. We can help with the implementation as well.
> **Example 2: Asymmetric Replicas** (9 prefill, 3 decode → 9 prefill, 3 decode, maxSurge=1):
>
> When prefill and decode have different replica counts, the algorithm computes steps based on the larger dimension (9 in this case). The smaller side (decode) progresses proportionally slower:
If I'm understanding correctly, this means that during updates we are keeping track of four different LWS instances, correct?
Can you expand on why you opted for this implementation instead of, for example, using the partition field at the LWS level, similar to how the partition field is used in StatefulSet? Is the only gap #710?
You can add it to the alternatives considered section, and we can discuss later if there are any improvements we can make at the LWS level to simplify this algorithm, or if it is worth it at all.
Okay, I was not aware of the partition field. I just read about it, and maybe I'm wrong here, but I don't think it would work:

1. **Revision-aware traffic routing**: DisaggDeployment is designed for disaggregated inference, where a load balancer (e.g., a modified LLM-d Endpoint Picker) must route prefill requests to backends whose decode counterparts are on the same revision. With a separate LWS (and Service) per revision, each pod's revision is explicit via labels (`disaggdeployment.x-k8s.io/revision`). The LB can count backends per revision across both pools and distribute traffic proportionally. With partition-based updates, pods within the same LWS have different templates based on ordinal, making revision-aware routing significantly more complex. The goal is to avoid having a prefill talk to an incompatible decode; both sides must be treated as totally incompatible. This special routing requires work on the LLM-d side.
2. **LWS as a read-only resource**: Treating LWS as a read-only resource (like a ReplicaSet) makes more sense, since you want to update prefill and decode at different paces depending on the step you are at. It's a tied update. I don't think you can achieve this with partition.
3. **Ops observability**: It's simpler for ops observability. You can see directly at which stage your update is, since you can see the version right away during updates (see screenshot).
Thanks for adding the section in the KEP. A few follow up questions that I have:
> With partition-based updates, pods within the same LWS have different templates based on ordinal, making revision-aware routing significantly more complex
I'm assuming that the labels are injected at the LWS level, but can't we achieve the same logic if the disagg revision label is injected into the leader pod instead? Then those above the partition will have the new revision label and those below will have the old one, so the router can still do the mapping between old and new versions. This might depend on the router's implementation, but I don't think it makes a difference at what level the label is injected.
> as you want to update Prefill and Decode at different paces depending on the step you are at. It's a tied update. I don't think you can achieve this with partition.
Why not? I don't see the difference between scaling prefill & decode vs changing the partition value independently.
> Ops observability
This, I think, would be the main benefit. But I wonder how much it matters for this to be easily readable; neither StatefulSet nor LWS has it.
Just to make it clear, this is not a blocker for merging the KEP or the code implementation. I just think it is worth discussing for the future whether there's a simpler path here. The algorithm needed for rolling updates is already complex, and it would be easier to understand if we stuck to existing Kubernetes concepts like partition instead of maintaining four LWS instances.
> 4. **Configuration drift**: Without a unified resource, prefill and decode configurations can drift apart, leading to subtle incompatibilities.
>
> ### Goals
Have there been any thoughts on how this would work with Kueue?
What I am wondering is: if this DisaggDeployment is quota-limited, do you want to avoid creating the LWSs?
In my understanding, Kueue works via annotations, right? I would say it works out of the box, and if adding annotations doesn't work directly, we could add support.
See:
#767 (comment)
Well, Kueue works with annotations for LWS because it has an internal controller for it: https://github.com/kubernetes-sigs/kueue/tree/main/pkg/controller/jobs/leaderworkerset
But I guess I'm mostly curious about things like Kueue's TopologyAwareScheduling (i.e. #665) with disaggregated serving.
Well, I'm not sure 🙂 - I guess we will need to experiment
Address reviewer feedback explaining why we use multiple LWS per revision instead of the LWS partition field: - Revision-aware traffic routing for LLM-d Endpoint Picker - LWS as read-only resource (like Deployment/ReplicaSet pattern) - Simpler ops observability during rolling updates
> ```go
> // +kubebuilder:default=1
> Replicas *int32 `json:"replicas,omitempty"`
>
> // LeaderWorkerTemplate defines the template for leader/worker pods.
> ```
Have you tested if setting size = 1 for both the prefill and the decode server works?
You mean replicas==1 for both sides? Yes I did, and it works.
> DisaggDeployment simplifies the deployment of disaggregated LLM inference by:
> - Managing two LeaderWorkerSets (prefill and decode) as a single logical unit
> - Providing coordinated two-dimensional rolling updates across both sides
> - Automatically creating Services when both sides are ready
We have generally avoided bundling/instantiating easily composable resources. For example, in upstream, you don't see the workload apis (like Deployment, StatefulSet etc.) automatically create a Service.
The service is optional, and I thought it was okay to be able to create one since LWS already does this.
But anyway, we can discuss this. It was easier for handling our scheduler on our side, but I understand your concerns.
LWS only creates a headless service to trigger programming DNS entries for the individual workers. This is useful to standardize discovery among the leader-workers of a group (e.g., typically workers need to know the network address of their leader; for other frameworks it could be the opposite, with the leader needing to know the network addresses of its workers).
> 4. **Traffic management / routing**: Integration with service meshes or ingress controllers for traffic splitting during rollouts is out of scope.
>
> 5. **Generic group support**: Supporting arbitrary groups beyond prefill/decode (e.g., `decode-long-context`, `encode`) is deferred to a future enhancement.
I think we should at least make the API future-proof to support that.
I think it's already future-proof, since:

```yaml
apiVersion: disaggregatedset.x-k8s.io/v1alpha1
kind: DisaggregatedSet
metadata:
  name: my-llm
spec:
  prefill:
    replicas: 3                # Number of LWS groups
    rolloutStrategy:
      maxSurge: 1              # Extra replicas during update
      maxUnavailable: 0        # Don't remove until new is ready
    serviceTemplate:           # Optional: create Service when ready
      spec:
        ports:
        - port: 8080
    leaderWorkerTemplate:
      size: 2                  # 1 leader + 1 worker per group
      workerTemplate:
        spec:
          containers:
          - name: prefill
            image: vllm:latest
            resources:
              limits:
                nvidia.com/gpu: 1
  decode:
    replicas: 6                # Often more decode than prefill
    rolloutStrategy:
      maxSurge: 2
      maxUnavailable: 1
    serviceTemplate:
      spec:
        ports:
        - port: 8080
    leaderWorkerTemplate:
      size: 1                  # Single pod per group
      workerTemplate:
        spec:
          containers:
          - name: decode
            image: vllm:latest
            resources:
              limits:
                nvidia.com/gpu: 1
  decode-long-context:         # <== adding a special group should work out of the box in the config.
    [ ... ]
```

What I mean is that prefill and decode are currently hardcoded, but they could simply be treated as a standalone "side" with a name (either prefill or decode).
So unless I'm missing something (which is totally possible), I think it's already future-proof?
This is still hardcoding prefill and decode as part of the API; we could instead have a list with named entries to make the API more generic. But we can follow up on that possibility when we merge it to run with the LWS controller.
I guess you are suggesting this solution:

```go
type DisaggregatedSetSpec struct {
    Sides []DisaggSideConfig `json:"sides"`
}

type DisaggSideConfig struct {
    Name string `json:"name"` // "prefill", "decode", etc.
    // ... rest of fields
}
```

YAML:

```yaml
spec:
  sides:
  - name: prefill
    replicas: 10
    ...
  - name: decode
    replicas: 2
    ...
  - name: custom_stage
    replicas: 5
```

I have to say that this is an easy change, and I could do it (while only supporting prefill and decode for now).

What I had in mind, however, was this:

**Option 2: Pure Map (Breaking Change)**

```go
type DisaggregatedSetSpec struct {
    Sides map[string]*DisaggSideConfig `json:"sides"`
}
```

YAML:

```yaml
spec:
  sides:
    prefill: <lws1>
    decode: <lws2>
    custom_stage: <lws3>
```

This would work with older YAML configs too.
> we could instead have a list with named entries to make the api more generic
If we go this route, we should make it clear somewhere that we only support synced rolling updates between the first two items in the list, not all of them.
> #### Story 3: Coordinated Service Discovery
>
> As an application developer, I want Services to be automatically created for my prefill and decode endpoints, but only when both sides are ready to serve traffic, so that my application doesn't route requests to partially-ready infrastructure.
I do not think relying on the existence of the Service resource is the proper approach to address this problem. The P/D coordinator should handle this case, not the workload api.
The service is optional, and I thought it was okay to be able to create one since LWS already does this.
But anyway, we can discuss this. It was easier for handling our scheduler on our side, but I understand your concerns.
> `old + new <= target + maxSurge`
>
> 5. **Scale-Up Priority**: New replicas are scaled up before old replicas are scaled down, ensuring capacity is never below the minimum.
Are we not concerned about limited hardware capacity? Does this mean the cluster must have capacity for at least one extra replica to conduct an update (vs. doing it in place)? Can we set maxSurge=0 and maxUnavailable=1 to trigger an in-place rollout?
Yes, this works, and it's part of the test suite. We simply favor maxSurge over maxUnavailable. We could change this behavior if you want.
No, I think preferring maxSurge seems like the better semantics.
> | 5 | 1 | 1 | 3 | 3 |
> | 6 | 0 | 0 | 3 | 3 |
>
> **Example 2: Asymmetric Replicas** (9 prefill, 3 decode → 9 prefill, 3 decode, maxSurge=1):
Can we spell out the maxSurge semantics? Is it shared across prefill and decode, or per side?
The implementation has been pushed here 🙂: I wanted this PR to be on top of that branch, but since my branch is on my fork, I couldn't do it 😬 (it created a PR on my forked repo instead 🫤)
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it
This KEP proposes adding DisaggDeployment as a higher-level API for managing disaggregated inference workloads. DisaggDeployment orchestrates multiple LeaderWorkerSets (prefill and decode) with coordinated lifecycle management, including two-dimensional rolling updates and service orchestration.
Fixes #766
Special notes for your reviewer
As discussed offline, the implementation of this KEP already exists, and will be pushed tomorrow. I will change my implementation based on the review of this KEP and the review of the coming implementation.
Does this PR introduce a user-facing change?
Yes and no. It's a new API.