
Commit 78f480c

Updating proposal statuses (kubernetes-sigs#1472)
1 parent 53866e4 commit 78f480c

9 files changed: +30 -6 lines changed

docs/proposals/002-api-proposal/README.md

Lines changed: 5 additions & 1 deletion

@@ -2,7 +2,11 @@
 # Gateway API Inference Extension
 
 ## Proposal Status
-***Draft***
+***Implemented/Obsolete***
+- Refer to [the InferencePool v1 API review](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1173) for the InferencePool modifications
+- Refer to [the InferenceModel evolution proposal](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/1199-inferencemodel-api-evolution) for the InferenceModel modifications
+- Refer to the `/api/` & `/apix/` directories for the current status
+
 
 ## Table of Contents
docs/proposals/003-model-server-protocol/README.md

Lines changed: 6 additions & 0 deletions

@@ -2,6 +2,12 @@
 
 This is the protocol between the EPP and the model servers.
 
+## Proposal Status
+***Partially implemented***
+
+Note
+- With the creation of the [pluggable architecture](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal), this protocol cannot, by definition, be as strict
+
 ### Inference API Protocol
 
 The model server MUST implement OpenAI’s [Completions](https://platform.openai.com/docs/api-reference/completions)
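For illustration only, here is a minimal Go sketch of a request against the OpenAI Completions endpoint that the protocol expects model servers to implement. The server address and model name are placeholders, not values taken from the repository.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Request body following the OpenAI Completions schema.
	body, err := json.Marshal(map[string]any{
		"model":      "my-model", // placeholder model name
		"prompt":     "Write a one-line haiku about inference.",
		"max_tokens": 32,
	})
	if err != nil {
		panic(err)
	}

	// Placeholder address; in practice the EPP picks a model server
	// endpoint from the InferencePool and the gateway forwards to it.
	resp, err := http.Post("http://localhost:8000/v1/completions",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status)
}
```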

docs/proposals/004-endpoint-picker-protocol/README.md

Lines changed: 5 additions & 0 deletions

@@ -1,5 +1,10 @@
 # Endpoint Picker Protocol
 
+## Proposal Status
+***Implemented***
+
+# Proposal
+
 The Endpoint Picker, or EPP, is a core component of the inference extension. Ultimately it's
 responsible for picking an endpoint from the `InferencePool`. A reference implementation can be
 found [here](../../../pkg/epp/).
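As a rough illustration of what "picking an endpoint" means, a toy Go sketch that selects the least-loaded endpoint from a pool. The types and policy here are hypothetical and do not mirror the reference implementation under `pkg/epp/`.

```go
package main

import "fmt"

// Endpoint is a simplified view of one model-server pod in an InferencePool.
type Endpoint struct {
	Address    string
	QueueDepth int // requests currently queued on that server
}

// pickEndpoint stands in for the EPP's scheduling decision: here it simply
// prefers the endpoint with the shortest queue.
func pickEndpoint(candidates []Endpoint) (Endpoint, bool) {
	if len(candidates) == 0 {
		return Endpoint{}, false
	}
	best := candidates[0]
	for _, e := range candidates[1:] {
		if e.QueueDepth < best.QueueDepth {
			best = e
		}
	}
	return best, true
}

func main() {
	pool := []Endpoint{
		{Address: "10.0.0.1:8000", QueueDepth: 4},
		{Address: "10.0.0.2:8000", QueueDepth: 1},
	}
	if ep, ok := pickEndpoint(pool); ok {
		fmt.Println("routing request to", ep.Address)
	}
}
```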

docs/proposals/006-scheduler/README.md

Lines changed: 1 addition & 1 deletion

@@ -3,7 +3,7 @@
 Authors: @kfswain, @smarterclayton
 
 ## Proposal Status
-***Draft***
+***Implemented***
 
 ## Table of Contents
docs/proposals/0602-prefix-cache-aware-routing-proposal/README.md

Lines changed: 5 additions & 0 deletions

@@ -1,5 +1,10 @@
 # Prefix Cache Aware Request Scheduling
 
+## Proposal Status
+***Implemented***
+
+# Proposal
+
 ## Overview
 
 Prefix caching is a well-known technique in LLM inference to save duplicate tensor computation for prompts with the same prefix tokens, and is available in many model servers or model as a service providers. Leveraging prefix caching can significantly boost system performance, especially the time to first token (TTFT). Given that EPP has a global view of requests and model servers in the `InferencePool`, it can schedule requests intelligently to maximize the global prefix cache hit rate.
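To make the scheduling idea concrete, a hedged Go sketch of prefix-affinity scoring: hash the prompt's cumulative prefix blocks and count how many leading blocks an endpoint is believed to hold. The block size, hash choice, and cache bookkeeping below are assumptions for illustration, not the proposal's exact design.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// blockSize is an assumed prefix-block length in characters.
const blockSize = 64

// prefixBlockHashes hashes successive cumulative prefixes of the prompt, so
// two prompts that share a prefix share their leading hashes.
func prefixBlockHashes(prompt string) []uint64 {
	h := fnv.New64a()
	var hashes []uint64
	for i := 0; i < len(prompt); i += blockSize {
		end := i + blockSize
		if end > len(prompt) {
			end = len(prompt)
		}
		h.Write([]byte(prompt[i:end]))
		hashes = append(hashes, h.Sum64())
	}
	return hashes
}

// score counts how many leading blocks an endpoint is believed to have
// cached; the endpoint with the highest score is the likeliest cache hit.
func score(cached map[uint64]bool, blocks []uint64) int {
	n := 0
	for _, b := range blocks {
		if !cached[b] {
			break
		}
		n++
	}
	return n
}

func main() {
	blocks := prefixBlockHashes("You are a helpful assistant. Summarize the following document: ...")
	// cachedOnEndpoint would be populated from previously routed requests.
	cachedOnEndpoint := map[uint64]bool{blocks[0]: true}
	fmt.Println("prefix blocks cached on endpoint:", score(cachedOnEndpoint, blocks))
}
```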

docs/proposals/0683-epp-architecture-proposal/README.md

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 
 Author(s): @kfswain
 ## Proposal Status
-***Draft***
+***Implemented***
 
 ## Summary
docs/proposals/0845-scheduler-architecture-proposal/README.md

Lines changed: 1 addition & 1 deletion

@@ -2,7 +2,7 @@
 
 Author(s): @kfswain, @ahg-g, @nirrozenbaum
 ## Proposal Status
-***Draft***
+***Implemented***
 
 ## Summary
 The Scheduling Subsystem is a framework used to implement scheduling algorithms. High level definition [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/006-scheduler) & EPP Architecture [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal).
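A sketch of what a "framework used to implement scheduling algorithms" can look like: pluggable filter and score stages composed into one decision. The interfaces below are hypothetical stand-ins, not the subsystem's actual plugin API.

```go
package scheduling

// Pod is a candidate model-server endpoint as the scheduler sees it.
type Pod struct {
	Name string
}

// Filter drops candidates that cannot serve the request.
type Filter interface {
	Filter(pods []Pod) []Pod
}

// Scorer ranks the remaining candidates; higher is better.
type Scorer interface {
	Score(pod Pod) float64
}

// Schedule composes the plugins: run every filter, then pick the pod with
// the highest combined score.
func Schedule(pods []Pod, filters []Filter, scorers []Scorer) (Pod, bool) {
	for _, f := range filters {
		pods = f.Filter(pods)
	}
	if len(pods) == 0 {
		return Pod{}, false
	}
	best, bestScore := pods[0], combinedScore(pods[0], scorers)
	for _, p := range pods[1:] {
		if s := combinedScore(p, scorers); s > bestScore {
			best, bestScore = p, s
		}
	}
	return best, true
}

func combinedScore(p Pod, scorers []Scorer) float64 {
	total := 0.0
	for _, s := range scorers {
		total += s.Score(p)
	}
	return total
}
```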

docs/proposals/1023-data-layer-architecture/README.md

Lines changed: 1 addition & 1 deletion

@@ -4,7 +4,7 @@ Author(s): @elevran @nirrozenbaum
 
 ## Proposal Status
 
-***Draft***
+***Accepted***
 
 ## Summary
docs/proposals/1199-inferencemodel-api-evolution/README.md

Lines changed: 5 additions & 1 deletion

@@ -2,7 +2,11 @@
 
 Author(s): @kfswain, @ahg-g, @lukeavandrie
 ## Proposal Status
-***Draft***
+***Implemented***
+
+Notes
+- Phase 1 is complete
+- Phase 2 is still WIP
 
 ## Summary
 Multiple docs have discussed the restructuring of the InferenceModel API. This [doc](https://docs.google.com/document/d/1x6aI9pbTF5oOsaEQYc9n4pBBY3_AuEY2X51VKxmBSnU/edit?tab=t.0#heading=h.towq7jyczzgo) proposes an InferenceSchedulingObjective CRD, and this [doc](https://docs.google.com/document/d/1G-CQ17CM4j1vNE3T6u9uP2q-m6jK14ANPCwTfJ2qLS4/edit?tab=t.0) builds upon the previous document to solidify the requirement for the new iteration of the InferenceModel API to continue to solve the identity problem. Both these documents were useful in continuing to gather feedback & iterate on a proper solution.
