
Commit 2d41e82 (1 parent: 27c6c38)

Removes Routing Mode Config and Resolved Open Questions

Signed-off-by: Daneyon Hansen <[email protected]>

File tree: 1 file changed, +71 -100 lines

  • docs/proposals/1374-multi-cluster-inference

docs/proposals/1374-multi-cluster-inference/README.md

Lines changed: 71 additions & 100 deletions
````diff
@@ -8,17 +8,19 @@ Author(s): @danehans, @bexxmodd, @robscott
 
 ## Summary
 
-An Inference Gateway (IG) provides efficient routing to LLM workloads in Kubernetes by sending requests to an Endpoint Picker (EPP) associated with an
-[InferencePool](https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencepool/) and routing the request to a backend model server based on
-the EPP-provided endpoint. This proposal extends the current model to support multi-cluster routing so capacity in one cluster can serve traffic originating in another.
+An Inference Gateway (IG) provides efficient routing to LLM workloads in Kubernetes by sending requests to an Endpoint Picker (EPP) associated with
+an [InferencePool](https://gateway-api-inference-extension.sigs.k8s.io/api-types/inferencepool/) and routing the request to a backend model server
+based on the EPP-provided endpoint. This proposal extends the current model to support multi-cluster routing so capacity in one cluster can serve
+traffic originating in another.
 
 ### Why Multi-Cluster?
 
 GPU capacity is scarce and fragmented. Many users operate multiple clusters across regions and providers. A single cluster rarely satisfies peak or
 sustained demand, so a prescribed approach is required to share GPU capacity across clusters by:
 
 - Exporting an InferencePool from a source (“exporting”) cluster.
-- Importing the exported InferencePool into one or more destination (“importing”) clusters with enough detail for IGs to route requests to the associated remote model server Pods.
+- Importing the exported InferencePool into one or more destination (“importing”) clusters with enough detail for IGs to route requests to the associated
+  remote model server Pods.
 
 ### Goals
 
````
````diff
@@ -39,37 +41,44 @@ The Multi-Cluster Inference (MCI) model will largely follow the Multi-Cluster Se
 - A separate export resource will be avoided, e.g. ServiceExport, by inlining the concept within InferencePool.
 
 An InferencePoolImport resource is introduced that is meant to be fully managed by an MCI controller. This resource provides the information
-required for IGs to route LLM requests to model server endpoints of an InferencePool in remote clusters. How the IG routes the request to the remote cluster
-is implementation-specific.
+required for IGs to route LLM requests to model server endpoints of an InferencePool in remote clusters. How the IG routes the request to the remote
+cluster is implementation-specific.
+
+### Routing Modes
+
+The proposal supports the following routing modes:
+
+- Endpoint Mode: An IG of an importing cluster routes to endpoints selected by the EPP of the exported InferencePool. Pod and Service network connectivity
+  MUST exist between cluster members.
+- Parent Mode: An IG of an importing cluster routes to parents, e.g. Gateways, of the exported InferencePool. Parent connectivity MUST exist between cluster
+  members.
 
 ### Workflow
 
 1. **Export an InferencePool:** An [Inference Platform Owner](https://gateway-api-inference-extension.sigs.k8s.io/concepts/roles-and-personas/)
    exports an InferencePool by annotating it.
 2. **An MCI Controller (Per [ClusterSet](https://multicluster.sigs.k8s.io/api-types/cluster-set/)):**
-   - Watches all ClusterSet member clusters for exported InferencePool resources.
+   - Watches all ClusterSet member clusters for exported InferencePool resources (must have access to the K8s API server).
    - CRUDs an InferencePoolImport in each member cluster if:
     - The cluster contains the namespace of the exported InferencePool (namespace sameness).
     - When an InferencePoolImport with the same ns/name already exists in the cluster, update the associated `inferencepoolimport.status.clusters[]` entry.
   - Populates InferencePoolImport status with the information required for importing IGs to route requests to exported InferencePool endpoints,
-     e.g. address of the referenced EPP or remote parent
-     so the importing IG can either:
-     - Connect directly to the remote EPP and then route to the selected endpoint (`EndpointMode`).
-     - Route via a remote parent (`ParentMode`).
+     e.g. address of the referenced EPP or remote parent so the importing IG can route to remote endpoints using either routing mode.
 3. **IG Management Plane (Per Cluster):**
    - Watches InferencePoolImport resources.
    - Programs the managed IG data plane to either:
     - Connect to the exported EPP via gRPC ext-proc for endpoint selection and optionally EPP health/metrics endpoints. Connect directly to exported
-      InferencePool endpoints. This mode requires inter-cluster `podCIDR` connectivity (`EndpointMode`).
-     - Connect to remote parents, e.g. IGs, of the exported InferencePool. This mode requires inter-cluster parent connectivity (`ParentMode`).
+      InferencePool endpoints (Endpoint Mode).
+     - Connect to remote parents, e.g. IGs, of the exported InferencePool (Parent Mode).
 4. **Data Path:**
    The data path is dependent on the export mode selected by the user.
-   - `EndpointMode`: Client → local IG → (make scheduling decision) → local/remote EPP → selected model server endpoint → response.
-   - `ParentMode`: Client → local IG → (make scheduling decision) → local EPP/remote parent → remote EPP → selected model server endpoint → response.
+   - Endpoint Mode: Client → local IG → (make scheduling decision) → local/remote EPP → selected model server endpoint → response.
+   - Parent Mode: Client → local IG → (make scheduling decision) → local EPP/remote parent → remote EPP → selected model server endpoint → response.
 
 ### InferencePoolImport Naming
 
-The MCI controller will create an InferencePoolImport resource using the exported InferencePool namespace and name. A cluster name entry in `inferencepoolimport.status.clusters[]` is added for each cluster that exports an InferencePool with the same ns/name.
+The MCI controller will create an InferencePoolImport resource using the exported InferencePool namespace and name. A cluster name entry in
+`inferencepoolimport.status.clusters[]` is added for each cluster that exports an InferencePool with the same ns/name.
 
 ### InferencePool Selection
 
````
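The naming rule above (mirror the exported InferencePool's namespace/name, then add or update one `status.clusters[]` entry per exporting cluster) can be sketched as two pure functions. This is an illustrative sketch, not code from the proposal: `ClusterEntry`, `importKey`, and `upsertCluster` are hypothetical names; only the mirrored ns/name rule and the one-entry-per-exporting-cluster behavior come from the text.

```go
package main

import "fmt"

// ClusterEntry is a hypothetical, trimmed stand-in for the proposal's
// per-cluster status entry.
type ClusterEntry struct {
	Name string
}

// importKey mirrors the exported InferencePool's namespace and name,
// per the "InferencePoolImport Naming" rule.
func importKey(namespace, name string) string {
	return namespace + "/" + name
}

// upsertCluster adds a status.clusters[] entry for an exporting cluster,
// updating the existing entry in place when one with the same cluster
// name is already present.
func upsertCluster(clusters []ClusterEntry, entry ClusterEntry) []ClusterEntry {
	for i, c := range clusters {
		if c.Name == entry.Name {
			clusters[i] = entry
			return clusters
		}
	}
	return append(clusters, entry)
}

func main() {
	clusters := []ClusterEntry{}
	clusters = upsertCluster(clusters, ClusterEntry{Name: "cluster-a"})
	clusters = upsertCluster(clusters, ClusterEntry{Name: "cluster-b"})
	clusters = upsertCluster(clusters, ClusterEntry{Name: "cluster-a"}) // updates, no duplicate
	fmt.Println(importKey("example", "llm-pool"), len(clusters))       // example/llm-pool 2
}
```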
````diff
@@ -81,9 +90,9 @@ InferencePool selection is implementation-specific. The following are examples:
 
 ### API Changes
 
-#### Export Annotations
+#### Export Annotation
 
-The following annotations are being proposed to indicate the desire to export the InferencePool to clusters of a ClusterSet.
+The following annotation is being proposed to indicate the desire to export the InferencePool to member clusters of a ClusterSet.
 
 The `inference.networking.x-k8s.io/export` annotation key indicates a desire to export the InferencePool:
 
````
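As a quick illustration of the contract above: a controller can gate export on this single annotation key. Only the key and the `ClusterSet` value come from the proposal; the `shouldExport` helper is a hypothetical sketch, not the MCI controller's actual code.

```go
package main

import "fmt"

// exportAnnotation is the key proposed by this document.
const exportAnnotation = "inference.networking.x-k8s.io/export"

// shouldExport reports whether an InferencePool's annotations request
// export to the ClusterSet. "ClusterSet" is the only value the proposal
// currently supports; a nil map simply means "not exported".
func shouldExport(annotations map[string]string) bool {
	return annotations[exportAnnotation] == "ClusterSet"
}

func main() {
	fmt.Println(shouldExport(map[string]string{exportAnnotation: "ClusterSet"})) // true
	fmt.Println(shouldExport(nil))                                               // false
}
```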
````diff
@@ -95,25 +104,15 @@ Supported Values:
 
 - `ClusterSet` – export to all members of the current ClusterSet.
 
-The `inference.networking.x-k8s.io/export-mode` annotation key indicates the routing mode that importing IGs should use for the exported InferencePool:
-
-```yaml
-inference.networking.x-k8s.io/export-mode: "<value>"
-```
-
-The `inference.networking.x-k8s.io/export-mode` annotation requires the `inference.networking.x-k8s.io/export` annotation to be set.
-
-Supported Values:
-
-- `EndpointMode` – Export InferencePool information required for importing clusters to connect to EPP endpoints.
-- `ParentMode` – Export InferencePool information required for importing clusters to connect to EPP endpoints through parent(s) of the InferencePool.
-
-**Note:** Additional annotations, e.g. region/domain scoping, filtering clusters in the ClusterSet, etc., and potentially adding an InferencePoolExport resource may be considered in the future.
+**Note:** Additional annotations, e.g. region/domain scoping, filtering clusters in the ClusterSet, routing mode configuration, etc., and
+potentially adding an InferencePoolExport resource may be considered in the future.
 
 #### InferencePoolImport
 
-A cluster-local, controller-managed resource that represents an imported InferencePool. It primarily communicates the EPP or parents of the exported InferencePool(s) to the importing IG controller. It is not user-authored; status carries the effective import.
-Inference Platform Owners can reference the InferencePoolImport, even if the local cluster does not have an InferencePool. In the context of Gateway API, it means that an HTTPRoute can be configured to reference an InferencePoolImport to route matching requests to remote InferencePool endpoints.
+A cluster-local, controller-managed resource that represents an imported InferencePool. It primarily communicates the EPP or parents of the
+exported InferencePool(s) to the importing IG controller. It is not user-authored; status carries the effective import. Inference Platform
+Owners can reference the InferencePoolImport, even if the local cluster does not have an InferencePool. In the context of Gateway API, it
+means that an HTTPRoute can be configured to reference an InferencePoolImport to route matching requests to remote InferencePool endpoints.
 This API will be used almost exclusively for tracking endpoints, but unlike MCS, we actually have two distinct sets of endpoints to track:
 
 1. Endpoint Picker Extensions (EPPs)
````
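For illustration, an HTTPRoute in an importing cluster might reference the import as sketched below. The text shown here does not pin down the exact `backendRefs` syntax for an InferencePoolImport, so the group/kind in this fragment is an assumption, not the proposal's confirmed API.

```yaml
# Hypothetical sketch: route matching requests in an importing cluster to an
# imported pool. The backendRef group/kind is an assumption; the proposal only
# states that an HTTPRoute can reference an InferencePoolImport.
apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: llm-route
  namespace: example
spec:
  parentRefs:
  - name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePoolImport
      name: llm-pool
```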
````diff
@@ -122,7 +121,7 @@ This API will be used almost exclusively for tracking endpoints, but unlike MCS,
 Key ideas:
 
 - Name/namespace sameness with the exported InferencePool (avoids extra indirection).
-- Routing mode: whether the IG should connect to remote Endpoints or Parents.
+- Routing mode: whether the IG should route directly to remote EPP-selected endpoints or through parents of the exported InferencePool.
 - EPP details: network coordinates and optional health/metrics hints.
 - Conditions: Accepted, Ready, etc.
 
````
````diff
@@ -142,18 +141,18 @@ To reduce controller fan-out, InferencePool status should be updated to surface:
 
 **MCI Controller (Per ClusterSet):**
 
-- Discover exported pools.
-- For each target cluster, CRUD InferencePoolImport (mirrored namespace/name).
-- Populate `status.clusters[]` entries with:
-  - EPP service endpoints/ports (and optional health/metrics),
-  - Optional remote parents (Gateway Services) if `RoutingMode=ParentMode`.
+- Discover exported InferencePools.
+- For each ClusterSet member cluster, CRUD InferencePoolImport (mirrored namespace/name).
+- Populate `inferencepoolimport.status.clusters[]` entries with:
+  - EPP service address, ports, etc. to support Endpoint Mode.
+  - Optional remote parents, e.g. Gateways, to support Parent Mode.
 
 **IG Controller (Per Cluster):**
 
 - Watch InferencePoolImports.
-- Program dataplane to either:
-  - Connect to remote EPPs and exported InferencePool endpoints (`EndpointMode`).
-  - Connect to exported parent(s) of the exported InferencePool (`ParentMode`).
+- Program the IG data plane to either:
+  - Connect to remote EPPs and exported InferencePool endpoints (Endpoint Mode).
+  - Connect to exported parent(s) of the exported InferencePool (Parent Mode).
 - Load-balance matching requests.
 
 ## Examples
````
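How the IG controller picks a connection target is implementation-specific. As one hedged sketch (the `ImportEntry` type, its field names, and the preference order are ours, not the proposal's API), an implementation might prefer the exported EPP when the import publishes EPP coordinates (Endpoint Mode), falling back to a remote parent (Parent Mode) otherwise:

```go
package main

import (
	"errors"
	"fmt"
)

// ImportEntry is a simplified, hypothetical stand-in for one
// status.clusters[] entry of an InferencePoolImport.
type ImportEntry struct {
	EPPAddrs    []string // EPP service addresses (Endpoint Mode)
	EPPPort     int32    // EPP ext-proc port
	ParentAddrs []string // remote parent addresses (Parent Mode)
	ParentPort  int32    // remote parent port
}

// dialTarget picks where the IG data plane should connect for this entry:
// the EPP when published, else a remote parent.
func dialTarget(e ImportEntry) (string, error) {
	if len(e.EPPAddrs) > 0 {
		return fmt.Sprintf("%s:%d", e.EPPAddrs[0], e.EPPPort), nil
	}
	if len(e.ParentAddrs) > 0 {
		return fmt.Sprintf("%s:%d", e.ParentAddrs[0], e.ParentPort), nil
	}
	return "", errors.New("import entry has neither EPP nor parent addresses")
}

func main() {
	t, _ := dialTarget(ImportEntry{EPPAddrs: []string{"1.2.3.4"}, EPPPort: 9002})
	fmt.Println(t) // 1.2.3.4:9002
}
```

A real controller would also honor conditions and load-balance across all published addresses rather than taking the first one.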
````diff
@@ -210,8 +209,8 @@ flowchart LR
 
 ### Exporting Cluster (Cluster A) Manifests
 
-In this example, Cluster A exports the InferencePool to all clusters in the ClusterSet using `EndpointMode`. This will
-cause the MCI controller to create an InferencePoolImport resource in all clusters except the exporting cluster.
+In this example, Cluster A exports the InferencePool to all clusters in the ClusterSet using Endpoint Mode. This will
+cause the MCI controller to create an InferencePoolImport resource in all clusters.
 
 ```yaml
 # Export the pool by annotation
````
````diff
@@ -222,7 +221,6 @@ metadata:
   namespace: example
   annotations:
     inference.networking.x-k8s.io/export: "ClusterSet"
-    inference.networking.x-k8s.io/export-mode: "EndpointMode" # or "ParentMode"
 spec:
   endpointPickerRef:
     name: epp
````
````diff
@@ -233,7 +231,7 @@ spec:
   targetPorts:
   - number: 8080
 ---
-# EPP exposed via LoadBalancer for simplicity; ClusterIP also works with podCIDR reachability or via parents
+# EPP exposed via LoadBalancer
 apiVersion: v1
 kind: Service
 metadata:
````
````diff
@@ -268,29 +266,34 @@ The InferencePoolImport is controller-managed; shown here only to illustrate the
 apiVersion: inference.networking.x-k8s.io/v1alpha1
 kind: InferencePoolImport
 metadata:
-  name: llm-pool      # mirrors exporting pool name
-  namespace: example  # mirrors exporting pool namespace
+  name: llm-pool      # mirrors exporting InferencePool name
+  namespace: example  # mirrors exporting InferencePool namespace
 status:
   clusters:
   - name: cluster-a
-    routingMode: EndpointMode # or ParentMode
     targetPortNumber: 8080
     endpointPicker:
       name: epp
       service:
-        type: LoadBalancer
         addresses:
         - 1.2.3.4   # EPP service address (IP or hostname)
         port: 9002  # EPP ext-proc port
       health:
         port: 9003
       metrics:
         port: 9090
+    parents:
+    - name: parent1
+      namespace: foo
+      addresses:
+      - 5.6.7.8   # Remote parent address (IP or hostname)
+      port: 8080
     conditions:
     - type: Accepted
      status: "True"
     - type: Ready
      status: "True"
+      observedGeneration: 1
 ---
 # Route in the importing cluster that targets the imported pool
 apiVersion: gateway.networking.k8s.io/v1beta1
````
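The example above ends each cluster entry with Accepted/Ready conditions. A minimal sketch, assuming the conditions follow the usual `metav1.Condition` Type/Status convention, of how an importing IG might gate traffic on them; the `Condition` type and `isServable` helper are hypothetical:

```go
package main

import "fmt"

// Condition is a trimmed stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string
}

// isServable reports whether a status.clusters[] entry should receive
// traffic: both Accepted and Ready must be "True".
func isServable(conds []Condition) bool {
	want := map[string]bool{"Accepted": false, "Ready": false}
	for _, c := range conds {
		if _, ok := want[c.Type]; ok && c.Status == "True" {
			want[c.Type] = true
		}
	}
	return want["Accepted"] && want["Ready"]
}

func main() {
	conds := []Condition{
		{Type: "Accepted", Status: "True"},
		{Type: "Ready", Status: "True"},
	}
	fmt.Println(isServable(conds)) // true
}
```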
````diff
@@ -360,16 +363,6 @@ type InferencePoolImport struct {
 
 type InferencePoolImportSpec struct{}
 
-// RoutingMode expresses how the importing IG should route requests to the exported pool.
-type RoutingMode string
-
-const (
-    // EndpointMode means the IG should route to remote model server endpoints directly (podCIDR/routed).
-    RoutingModeEndpoint RoutingMode = "EndpointMode"
-    // ParentMode means the IG should route via a remote "parent" (e.g., Gateway Service) that fronts the pool.
-    RoutingModeParent RoutingMode = "ParentMode"
-)
-
 type InferencePoolImportStatus struct {
     // Clusters is the set of exporting clusters that currently back this import.
     //
````
````diff
@@ -395,13 +388,6 @@ type ImportedCluster struct {
     // +kubebuilder:validation:Required
     Name string `json:"name"`
 
-    // RoutingMode provides a hint to InferenceGateways how to program data plane
-    // for this import.
-    //
-    // +kubebuilder:validation:Enum=EndpointMode;ParentMode
-    // +kubebuilder:validation:Required
-    RoutingMode RoutingMode `json:"routingMode"`
-
     // TargetPortNumber is the port the model servers listen on in the exported pool.
     // Used when RoutingMode=EndpointMode.
     //
````
````diff
@@ -449,19 +435,26 @@ type EndpointPickerImport struct {
 // ParentImport models a remote "parent" (typically a Gateway Service) that fronts
 // the imported InferencePool.
 type ParentImport struct {
-    Name string `json:"name"`
+    // Name of the remote parent (informational).
+    //
+    // +kubebuilder:validation:Required
+    Name string `json:"name"`
+    // Namespace of the remote parent (informational).
+    //
+    // +kubebuilder:validation:Required
     Namespace string `json:"namespace"`
-    Service []ServiceImport `json:"service"`
-    Conditions []metav1.Condition `json:"conditions,omitempty"`
-}
+    // Addresses supports IPs and/or hostnames of the remote parent.
+    //
+    // +kubebuilder:validation:Required
+    Addresses []string `json:"addresses"`
 
-type ServiceImport struct {
-    // Type mirrors core Service types for clarity.
-    // +kubebuilder:validation:Enum=ClusterIP;NodePort;LoadBalancer;ExternalName
+    // Port is the network port exposed by the remote parent.
     //
     // +kubebuilder:validation:Required
-    Type corev1.ServiceType `json:"type"`
+    Port Port `json:"port"`
+}
 
+type ServiceImport struct {
     // Addresses supports IPs and/or hostnames of the remote service.
     //
     // +kubebuilder:validation:Required
````
````diff
@@ -475,14 +468,10 @@ type ServiceImport struct {
 
 type HealthEndpoint struct {
     Port int32 `json:"port"`
-    // Optional: reference to a secret containing credentials for secure health checks.
-    SecretRef *corev1.SecretReference `json:"secretRef,omitempty"`
 }
 
 type MetricsEndpoint struct {
     Port int32 `json:"port"`
-    // Optional: reference to a secret containing scrape credentials.
-    SecretRef *corev1.SecretReference `json:"secretRef,omitempty"`
 }
 
 // +kubebuilder:object:root=true
````
````diff
@@ -555,41 +544,23 @@ status:
 
 ### Open Questions
 
-#### Export Semantics
-
-- Finalize annotation key/values and ClusterSet discovery. Should we introduce an annotation to filter exporting to specific clusters in the ClusterSet?
-- Should we allow per-cluster export configuration (weights, region, SLO tags) at export time?
-
-#### InferencePool status surface
+#### InferencePool Status
 
-- Do we extend InferencePool status to publish EPP Service type/addresses/ports, health, metrics, and optional secret refs to simplify imports?
 - Should EPP Deployment/Pod discovery be standardized (labels/port names) for health/metrics auto-discovery?
 
 #### Security
 
 - Provide a standard way to bootstrap mTLS between importing IG and exported EPP/parents, e.g. use BackendTLSPolicy?
-- Should the MCI controller mirror secrets into the importing cluster (and how to scope/rotate them)?
-- Do we need a ReferenceGrant-like mechanism for cross-namespace secrets referenced by InferencePoolImport?
+- Should the MCI controller mirror secrets into the importing cluster, e.g. secure metric scraping (Endpoint Mode)?
 
 #### Scheduling and Policy
 
-- Do we define a common `RoutingMode` enum (`EndpointMode` vs `ParentMode`) as above? Is a mixed strategy allowed per InferencePoolImport?
 - Should we define a standard cluster preference knob (e.g., PreferLocal, Any, region-affinity, weights) on InferencePoolImport status or IG-local policy CRD?
 
-#### Topology and Reachability
-
-- For `ParentMode`, do we require “HTTPRoute sameness” or any guarantees between exporting/importing clusters?
-
 #### EPP Scale
 
 - If the EPP has multiple replicas, should the MCI controller publish per-replica addresses, e.g. service subsetting, for health/metrics scraping?
 
-#### Observability
-
-- Refine InferencePool status conditions (e.g., EPPReady, ParentsReady, ResolvedRefs)?
-- Should we reconsider using an export resource instead of an InferencePool annotation for UX purposes, specifically surfacing status conditions such as
-  not being able to export an InferencePool because the namespace of the exported InferencePool does not exist in importing clusters.
-
 #### Ownership and Lifecycle
 
 - Should the MCI controller be owned by the Gateway API Inference Extension project?
````
