[RFC]: Support Exclusive Topology in StormService for Pod Colocation #1842

@googs1025

Description

Problem Statement

In distributed inference and high-performance computing scenarios (e.g., disaggregated LLM serving with Prefill/Decode roles), it is critical to ensure that all Pods belonging to the same logical unit (i.e., a single RoleSet) are scheduled on the same topology domain—such as the same node (kubernetes.io/hostname) or availability zone (topology.kubernetes.io/zone). This minimizes network latency and improves data locality.

Currently, StormService lacks native support for enforcing such co-location constraints across multiple roles within a RoleSet.

Proposed Solution

Introduce an optional field exclusiveTopology in RoleSetSpec:

// ExclusiveTopology specifies a Kubernetes topology key (e.g., "kubernetes.io/hostname")
// that all Pods in this RoleSet must share. When set, the StormService controller
// will automatically inject required pod affinity rules to ensure co-location.
// +optional
ExclusiveTopology string `json:"exclusiveTopology,omitempty"`

When exclusiveTopology is specified:

  • The controller adds a requiredDuringSchedulingIgnoredDuringExecution pod affinity rule to every role’s PodTemplate.
  • The label selector targets all Pods in the same RoleSet using stable labels like:
    • storm-service-name
    • roleset-name or a unique RoleSet identifier
  • All roles within the RoleSet are scheduled onto nodes sharing the same value for the given topology key. Because the rule is required, a Pod that cannot satisfy it stays Pending rather than landing elsewhere.
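
To make the injected rule concrete, here is a minimal Go sketch of how the affinity term might be built. It is not the proposed controller code. The local types are simplified stand-ins for the Kubernetes core/v1 affinity types (same JSON field names) so the snippet compiles without k8s.io/api. The helper name buildExclusiveAffinity, the RoleSet name pd-inference-0, and the exact label keys are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins for the Kubernetes core/v1 affinity types, kept local so
// this sketch builds without k8s.io/api. JSON tags mirror the upstream API.
type LabelSelector struct {
	MatchLabels map[string]string `json:"matchLabels,omitempty"`
}

type PodAffinityTerm struct {
	LabelSelector *LabelSelector `json:"labelSelector,omitempty"`
	TopologyKey   string         `json:"topologyKey"`
}

type PodAffinity struct {
	Required []PodAffinityTerm `json:"requiredDuringSchedulingIgnoredDuringExecution,omitempty"`
}

type Affinity struct {
	PodAffinity *PodAffinity `json:"podAffinity,omitempty"`
}

// Label keys follow the examples in this proposal; the keys the controller
// actually sets on Pods are an assumption here.
const (
	labelStormService = "storm-service-name"
	labelRoleSet      = "roleset-name"
)

// buildExclusiveAffinity is a hypothetical helper. It returns a required pod
// affinity that selects every Pod of one RoleSet, or nil when no topology key
// is configured (the field is optional).
func buildExclusiveAffinity(topologyKey, stormService, roleSet string) *Affinity {
	if topologyKey == "" {
		return nil
	}
	return &Affinity{
		PodAffinity: &PodAffinity{
			Required: []PodAffinityTerm{{
				LabelSelector: &LabelSelector{MatchLabels: map[string]string{
					labelStormService: stormService,
					labelRoleSet:      roleSet,
				}},
				TopologyKey: topologyKey,
			}},
		},
	}
}

func main() {
	// "pd-inference-0" is an illustrative RoleSet name, not a documented naming scheme.
	aff := buildExclusiveAffinity("kubernetes.io/hostname", "pd-inference", "pd-inference-0")
	out, err := json.MarshalIndent(aff, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

The controller would attach the same term to every role's PodTemplate, so prefill and decode Pods select each other through the shared RoleSet labels.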

Example Usage

apiVersion: orchestration.aibrix.ai/v1alpha1
kind: StormService
metadata:
  name: pd-inference
spec:
  replicas: 2
  template:
    spec:
      exclusiveTopology: "kubernetes.io/hostname"  # ← enforce per-RoleSet node co-location
      roles:
        - name: prefill
          replicas: 1
          template: { ... }
        - name: decode
          replicas: 2
          template: { ... }

Result:

  • 2 RoleSets created (due to replicas: 2)
  • Each RoleSet’s 3 Pods (1 prefill + 2 decode) are scheduled on one node
  • The two RoleSets are expected to land on different nodes through the scheduler's default spreading; the injected affinity rule by itself does not enforce that separation
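
One way to check the expected result is to group scheduled Pods by RoleSet and confirm that each group maps to a single topology value. The Go sketch below does this on in-memory data. The Pod struct, the coLocationViolations function, and the Pod, RoleSet, and node names are hypothetical. A real check would read Pods and node labels from the Kubernetes API.

```go
package main

import "fmt"

// Pod is a hypothetical, minimal view of a scheduled Pod.
type Pod struct {
	Name    string
	RoleSet string
	Node    string
}

// coLocationViolations returns the RoleSets whose Pods span more than one
// value of topologyKey. nodeLabels maps node name to that node's labels.
// A node without the label is treated as having the empty-string domain.
func coLocationViolations(pods []Pod, nodeLabels map[string]map[string]string, topologyKey string) []string {
	firstDomain := map[string]string{} // RoleSet -> first domain seen
	reported := map[string]bool{}      // RoleSets already listed as violations
	var violations []string
	for _, p := range pods {
		domain := nodeLabels[p.Node][topologyKey]
		first, seen := firstDomain[p.RoleSet]
		if !seen {
			firstDomain[p.RoleSet] = domain
			continue
		}
		if domain != first && !reported[p.RoleSet] {
			reported[p.RoleSet] = true
			violations = append(violations, p.RoleSet)
		}
	}
	return violations
}

func main() {
	const key = "kubernetes.io/hostname"
	nodeLabels := map[string]map[string]string{
		"node-a": {key: "node-a"},
		"node-b": {key: "node-b"},
	}
	// Mirrors the example: 2 RoleSets, each with 1 prefill and 2 decode Pods.
	pods := []Pod{
		{"rs0-prefill-0", "rs0", "node-a"},
		{"rs0-decode-0", "rs0", "node-a"},
		{"rs0-decode-1", "rs0", "node-a"},
		{"rs1-prefill-0", "rs1", "node-b"},
		{"rs1-decode-0", "rs1", "node-b"},
		{"rs1-decode-1", "rs1", "node-b"},
	}
	fmt.Println("violations:", len(coLocationViolations(pods, nodeLabels, key)))
}
```

The same function works for topology.kubernetes.io/zone by passing that key and zone-labelled nodes.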

Benefits

  • Enables low-latency communication between roles (e.g., Prefill ↔ Decode)
  • Improves resource efficiency via data locality
