
Commit 54822dd

version in README (#1072)
Signed-off-by: Nir Rozenbaum <[email protected]>
1 parent 28aa195 commit 54822dd



README.md

Lines changed: 8 additions & 6 deletions
@@ -78,30 +78,32 @@ Llm-d customizes vLLM & IGW to create a disaggregated serving solution. We've wo
 
 IGW has enhanced support for vLLM via llm-d, and broad support for any model servers implementing the protocol. More details can be found in [model server integration](https://gateway-api-inference-extension.sigs.k8s.io/implementations/model-servers/).
 
-
 ## Status
 
-This project is [alpha (0.3 release)](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/tag/v0.3.0). It should not be used in production yet.
+![Latest Release](https://img.shields.io/github/v/release/kubernetes-sigs/gateway-api-inference-extension?)
+
+This project is in alpha. latest release can be found [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest).
+It should not be used in production yet.
 
 ## Getting Started
 
 Follow our [Getting Started Guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/) to get the inference-extension up and running on your cluster!
 
-See our website at https://gateway-api-inference-extension.sigs.k8s.io/ for detailed API documentation on leveraging our Kubernetes-native declarative APIs
+See [our website](https://gateway-api-inference-extension.sigs.k8s.io/) for detailed API documentation on leveraging our Kubernetes-native declarative APIs
 
 ## Roadmap
 
 As Inference Gateway builds towards a GA release. We will continue to expand our capabilities, namely:
-1. Prefix-cache aware load balancing with interfaces for remote caches
-1. Recommended LoRA adapter pipeline for automated rollout
+
+1. Prefix-cache aware load balancing with interfaces for remote caches
+1. Recommended LoRA adapter pipeline for automated rollout
 1. Fairness and priority between workloads within the same criticality band
 1. HPA support for autoscaling on aggregate metrics derived from the load balancer
 1. Support for large multi-modal inputs and outputs
 1. Support for other GenAI model types (diffusion and other non-completion protocols)
 1. Heterogeneous accelerators - serve workloads on multiple types of accelerator using latency and request cost-aware load balancing
 1. Disaggregated serving support with independently scaling pools
 
-
 ## End-to-End Tests
 
 Follow this [README](./test/e2e/epp/README.md) to learn more about running the inference-extension end-to-end test suite on your cluster.
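
The key change above swaps a hard-pinned release link for a shields.io badge, so the README no longer needs a manual edit on each release: the `github/v/release` endpoint renders the repository's latest release tag dynamically, and the trailing `?` in the added line is just an empty query string. A minimal sketch of the pattern; the `label` query parameter is a standard shields.io styling option shown only as an illustration, not part of this commit:

```markdown
<!-- Renders the latest GitHub release tag as an auto-updating badge -->
![Latest Release](https://img.shields.io/github/v/release/kubernetes-sigs/gateway-api-inference-extension)

<!-- Same badge with an illustrative styling parameter in the query string -->
![Latest Release](https://img.shields.io/github/v/release/kubernetes-sigs/gateway-api-inference-extension?label=release)
```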
