`site-src/index.md` (3 additions & 3 deletions)
```diff
@@ -20,7 +20,7 @@ The following terms are specific to this project:
   inference workloads.
 - **Inference Scheduler**: An extendable component that makes decisions about which endpoint is optimal (best cost /
   best performance) for an inference request based on `Metrics and Capabilities`
-  from [Model Serving](/docs/proposals/003-model-server-protocol/README.md).
+  from [Model Serving](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/003-model-server-protocol/README.md).
 - **Metrics and Capabilities**: Data provided by model serving platforms about
   performance, availability and capabilities to optimize routing. Includes
   things like [Prefix Cache] status or [LoRA Adapters] availability.
```
```diff
@@ -33,8 +33,8 @@ Gateway API Inference Extension optimizes self-hosting Generative AI Models on K
 It provides optimized load-balancing for self-hosted Generative AI Models on Kubernetes.
 The project’s goal is to improve and standardize routing to inference workloads across the ecosystem.
 
-This is achieved by leveraging Envoy's [External Processing](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) to extend any gateway that supports both ext-proc and [Gateway API](https://github.com/kubernetes-sigs/gateway-api) into an [inference gateway](../index.md#concepts-and-definitions).
-This extension extends popular gateways like Envoy Gateway, kgateway, and GKE Gateway - to become [Inference Gateway](../index.md#concepts-and-definitions) -
+This is achieved by leveraging Envoy's [External Processing](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) to extend any gateway that supports both ext-proc and [Gateway API](https://github.com/kubernetes-sigs/gateway-api) into an [inference gateway](#concepts-and-definitions).
+This extension extends popular gateways like Envoy Gateway, kgateway, and GKE Gateway - to become [Inference Gateway](#concepts-and-definitions) -
 supporting inference platform teams self-hosting Generative Models (with a current focus on large language models) on Kubernetes.
 This integration makes it easy to expose and control access to your local [OpenAI-compatible chat completion endpoints](https://platform.openai.com/docs/api-reference/chat)
 to other workloads on or off cluster, or to integrate your self-hosted models alongside model-as-a-service providers
```
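The docs above point readers at OpenAI-compatible chat completion endpoints exposed through the gateway. As a minimal sketch of what a client sends to such an endpoint (the base URL and model name below are hypothetical placeholders, not values defined by this project), the request body follows the OpenAI chat completions shape:

```python
import json

# Hypothetical gateway address; substitute the route your inference
# gateway actually exposes.
BASE_URL = "http://inference-gateway.example/v1"

def build_chat_request(model: str, user_message: str) -> dict:
    """Assemble the JSON body for a POST to {BASE_URL}/chat/completions,
    following the OpenAI chat completions request shape."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }

# This dict would be serialized and sent as the HTTP request payload.
body = build_chat_request("my-llm", "Hello!")
print(json.dumps(body))
```

Any HTTP client can then POST this body to the gateway route; the gateway's inference scheduler picks the serving endpoint behind it.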