Merged
6 changes: 5 additions & 1 deletion docs/proposals/002-api-proposal/README.md
Original file line number Diff line number Diff line change
@@ -2,7 +2,11 @@
# Gateway API Inference Extension

## Proposal Status
-***Draft***
+***Implemented/Obsolete***
+- Refer to [the InferencePool v1 API review](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/1173) for the InferencePool modifications
+- Refer to [the InferenceModel evolution proposal](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/1199-inferencemodel-api-evolution) for the InferenceModel modifications
+- Refer to the `/api/` & `/apix/` directories for the current status


## Table of Contents

6 changes: 6 additions & 0 deletions docs/proposals/003-model-server-protocol/README.md
@@ -2,6 +2,12 @@

This is the protocol between the EPP and the model servers.

+## Proposal Status
+***Partially implemented***
+
+Note:
+- With the introduction of the [pluggable architecture](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal), this protocol is, by design, less strict.

### Inference API Protocol

The model server MUST implement OpenAI’s [Completions](https://platform.openai.com/docs/api-reference/completions)
5 changes: 5 additions & 0 deletions docs/proposals/004-endpoint-picker-protocol/README.md
@@ -1,5 +1,10 @@
# Endpoint Picker Protocol

+## Proposal Status
+***Implemented***
+
+# Proposal

The Endpoint Picker, or EPP, is a core component of the inference extension. Ultimately it's
responsible for picking an endpoint from the `InferencePool`. A reference implementation can be
found [here](../../../pkg/epp/).
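As a rough illustration of the EPP's core job, the sketch below picks one endpoint from a pool. Everything here is hypothetical (the `Pod` type, the least-queue-depth heuristic, and the function names are invented for this example); the actual reference implementation linked above is more elaborate.

```go
package main

import "fmt"

// Pod is a simplified stand-in for an endpoint in an InferencePool
// (illustrative only; not the real data model).
type Pod struct {
	Name     string
	QueueLen int // number of requests currently queued on this endpoint
}

// pickEndpoint returns the pod with the shortest request queue.
// This least-queue heuristic is an assumption for the sketch, not
// the algorithm the EPP reference implementation uses.
func pickEndpoint(pods []Pod) (Pod, error) {
	if len(pods) == 0 {
		return Pod{}, fmt.Errorf("no endpoints in pool")
	}
	best := pods[0]
	for _, p := range pods[1:] {
		if p.QueueLen < best.QueueLen {
			best = p
		}
	}
	return best, nil
}

func main() {
	pods := []Pod{{"pod-a", 5}, {"pod-b", 2}, {"pod-c", 7}}
	best, _ := pickEndpoint(pods)
	fmt.Println(best.Name) // pod-b
}
```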
2 changes: 1 addition & 1 deletion docs/proposals/006-scheduler/README.md
@@ -3,7 +3,7 @@
Authors: @kfswain, @smarterclayton

## Proposal Status
-***Draft***
+***Implemented***

## Table of Contents

@@ -1,5 +1,10 @@
# Prefix Cache Aware Request Scheduling

+## Proposal Status
+***Implemented***
+
+# Proposal

## Overview

Prefix caching is a well-known technique in LLM inference to save duplicate tensor computation for prompts with the same prefix tokens, and is available in many model servers or model as a service providers. Leveraging prefix caching can significantly boost system performance, especially the time to first token (TTFT). Given that EPP has a global view of requests and model servers in the `InferencePool`, it can schedule requests intelligently to maximize the global prefix cache hit rate.
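One common way to make a scheduler prefix-cache aware is to hash the prompt in fixed-size chunks with a rolling hash, remember which chunk hashes each server has served, and score servers by how long a cached prefix they already hold. The sketch below is a minimal, hypothetical version of that idea (chunk size, names, and data structures are all assumptions for illustration, not this proposal's actual design):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// chunkSize bytes per prefix block; a real system would chunk by tokens
// (this value is an assumption for the sketch).
const chunkSize = 8

// hashPrefixChunks splits a prompt into fixed-size chunks and returns a
// cumulative (rolling) hash per chunk, so two prompts sharing a prefix
// share their leading hashes.
func hashPrefixChunks(prompt string) []uint64 {
	var hashes []uint64
	h := fnv.New64a()
	for i := 0; i < len(prompt); i += chunkSize {
		end := i + chunkSize
		if end > len(prompt) {
			end = len(prompt)
		}
		h.Write([]byte(prompt[i:end]))
		hashes = append(hashes, h.Sum64())
	}
	return hashes
}

// scoreServers counts, per server, how many leading chunk hashes of the
// prompt that server has already served, approximating its prefix cache
// hit length for this request.
func scoreServers(prompt string, cache map[string]map[uint64]bool) map[string]int {
	scores := map[string]int{}
	for server, seen := range cache {
		for _, ch := range hashPrefixChunks(prompt) {
			if !seen[ch] {
				break
			}
			scores[server]++
		}
	}
	return scores
}

func main() {
	// server-a previously served a prompt with the same system-prompt prefix.
	cache := map[string]map[uint64]bool{"server-a": {}, "server-b": {}}
	for _, ch := range hashPrefixChunks("You are a helpful assistant. Hi") {
		cache["server-a"][ch] = true
	}
	scores := scoreServers("You are a helpful assistant. What is 2+2?", cache)
	fmt.Println(scores["server-a"] > scores["server-b"]) // true
}
```

Routing the request to the highest-scoring server maximizes reuse of already-computed KV-cache entries, which is what improves TTFT.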
2 changes: 1 addition & 1 deletion docs/proposals/0683-epp-architecture-proposal/README.md
@@ -2,7 +2,7 @@

Author(s): @kfswain
## Proposal Status
-***Draft***
+***Implemented***

## Summary

@@ -2,7 +2,7 @@

Author(s): @kfswain, @ahg-g, @nirrozenbaum
## Proposal Status
-***Draft***
+***Implemented***

## Summary
The Scheduling Subsystem is a framework used to implement scheduling algorithms. High level definition [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/006-scheduler) & EPP Architecture [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal).
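A scheduling framework of this kind is commonly structured as a pipeline of pluggable filters (which drop ineligible endpoints) followed by pluggable scorers (which rank the survivors). The sketch below shows that shape in miniature; the type names, the health filter, and the KV-utilization scorer are all hypothetical stand-ins, not the subsystem's real interfaces:

```go
package main

import "fmt"

// Endpoint is a simplified model-server endpoint (illustrative only).
type Endpoint struct {
	Name    string
	Healthy bool
	KVUtil  float64 // KV-cache utilization in [0, 1]
}

// Filter removes endpoints that cannot serve the request.
type Filter func([]Endpoint) []Endpoint

// Scorer assigns a preference score to a surviving endpoint.
type Scorer func(Endpoint) float64

// schedule runs every filter in order, then picks the highest-scoring
// survivor; ok is false when the filters eliminate all endpoints.
func schedule(eps []Endpoint, filters []Filter, score Scorer) (best Endpoint, ok bool) {
	for _, f := range filters {
		eps = f(eps)
	}
	if len(eps) == 0 {
		return Endpoint{}, false
	}
	best = eps[0]
	for _, e := range eps[1:] {
		if score(e) > score(best) {
			best = e
		}
	}
	return best, true
}

func main() {
	eps := []Endpoint{
		{"pod-a", true, 0.9},
		{"pod-b", true, 0.3},
		{"pod-c", false, 0.1},
	}
	healthy := func(in []Endpoint) []Endpoint {
		var out []Endpoint
		for _, e := range in {
			if e.Healthy {
				out = append(out, e)
			}
		}
		return out
	}
	lowKV := func(e Endpoint) float64 { return 1 - e.KVUtil }
	best, ok := schedule(eps, []Filter{healthy}, lowKV)
	fmt.Println(ok, best.Name) // true pod-b
}
```

Separating filtering from scoring is what lets individual scheduling algorithms be swapped in as plugins without changing the surrounding pipeline.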
2 changes: 1 addition & 1 deletion docs/proposals/1023-data-layer-architecture/README.md
@@ -4,7 +4,7 @@ Author(s): @elevran @nirrozenbaum

## Proposal Status

-***Draft***
+***Accepted***

## Summary

6 changes: 5 additions & 1 deletion docs/proposals/1199-inferencemodel-api-evolution/README.md
@@ -2,7 +2,11 @@

Author(s): @kfswain, @ahg-g, @lukeavandrie
## Proposal Status
-***Draft***
+***Implemented***
+
+Notes:
+- Phase 1 is complete.
+- Phase 2 is still in progress.

## Summary
Multiple docs have discussed the restructuring of the InferenceModel API. This [doc](https://docs.google.com/document/d/1x6aI9pbTF5oOsaEQYc9n4pBBY3_AuEY2X51VKxmBSnU/edit?tab=t.0#heading=h.towq7jyczzgo) proposes an InferenceSchedulingObjective CRD, and this [doc](https://docs.google.com/document/d/1G-CQ17CM4j1vNE3T6u9uP2q-m6jK14ANPCwTfJ2qLS4/edit?tab=t.0) builds upon the previous document to solidify the requirement for the new iteration of the InferenceModel API to continue to solve the identity problem. Both these documents were useful in continuing to gather feedback & iterate on a proper solution.