Commit 8e37ee0
fix(docs): correct minor typos and formatting in documentation files (#794)
Signed-off-by: Wilson Wu <[email protected]>
1 parent 34bdc84 commit 8e37ee0

File tree: 6 files changed, +64 -62 lines changed

CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion

@@ -43,7 +43,7 @@ Before you begin, ensure you have the following installed:
 
    This downloads the pre-trained classification models from Hugging Face.
 
-3. **Install Python dependencies(Optional):**
+3. **Install Python dependencies (Optional):**
 
   ```bash
   # For training and development

README.md

Lines changed: 1 addition & 1 deletion

@@ -39,7 +39,7 @@
 
 #### Auto-Selection of Models and LoRA Adapters
 
-An **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models or LoRA adapters from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools).
+A **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models or LoRA adapters from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools).
 
 ![mom-overview](./website/static/img/mom-overview.png)

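For context, the router consumes the OpenAI chat-completions request shape, with the special model value "auto" triggering auto-selection. A minimal sketch; the gateway address below is a placeholder, not something defined in this commit:

```bash
# Hypothetical request sketch -- with "model": "auto", the router picks the backend.
# Replace <gateway-address> with your deployment's actual endpoint.
curl http://<gateway-address>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "user", "content": "What is 2 + 2?"}]}'
```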
website/docs/installation/docker-compose.md

Lines changed: 2 additions & 2 deletions

@@ -2,15 +2,15 @@
 sidebar_position: 3
 ---
 
-# Install in Docker Compose
+# Install with Docker Compose
 
 This guide provides step-by-step instructions for deploying the vLLM Semantic Router with Envoy AI Gateway on Docker Compose.
 
 ## Common Prerequisites
 
 - **Docker Engine:** see more in [Docker Engine Installation](https://docs.docker.com/engine/install/)
 
-- **Clone repo**
+- **Clone repo:**
 
   ```bash
   git clone https://github.com/vllm-project/semantic-router.git
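For orientation, a minimal sketch of what typically follows the clone step in a Compose-based install; the compose file location and flags here are assumptions rather than part of this commit, so follow the rest of the guide for the actual commands:

```bash
# Hypothetical follow-up to the clone step -- assumes a compose file at the repo root.
cd semantic-router
docker compose up -d   # start the services in the background
docker compose ps      # confirm the containers are running
```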

website/docs/installation/k8s/istio.md

Lines changed: 23 additions & 22 deletions

@@ -1,7 +1,7 @@
-# Install with Istio Gateway
+# Install with Istio Gateway
 
 This guide provides step-by-step instructions for deploying the vLLM Semantic Router (vsr) with Istio Gateway on Kubernetes. Istio Gateway uses Envoy under the covers, so it is possible to use vsr with it. However, Envoy-based gateways differ in how they process the ExtProc protocol, so the deployment described here differs from the deployments of vsr alongside other Envoy-based gateways covered in the other guides in this repo. Multiple architecture options can combine Istio Gateway with vsr; this document describes one of them.
-
+
 ## Architecture Overview
 
 The deployment consists of:

@@ -16,20 +16,20 @@ The deployment consists of:
 Before starting, ensure you have the following tools installed:
 
 - [Docker](https://docs.docker.com/get-docker/) - Container runtime
-- [minikube](https://minikube.sigs.k8s.io/docs/start/) - Local Kubernetes
+- [minikube](https://minikube.sigs.k8s.io/docs/start/) - Local Kubernetes
 - [kind](https://kind.sigs.k8s.io/docs/user/quick-start/#installation) - Kubernetes in Docker
 - [kubectl](https://kubernetes.io/docs/tasks/tools/) - Kubernetes CLI
 
-Either minikube or kind works to deploy the local Kubernetes cluster needed for this exercise, so you only need one of the two. We use minikube in the description below, but the same steps should work with a kind cluster once the cluster is created in Step 1.
+Either minikube or kind works to deploy the local Kubernetes cluster needed for this exercise, so you only need one of the two. We use minikube in the description below, but the same steps should work with a kind cluster once the cluster is created in Step 1.
 
 We will also deploy two different LLMs in this exercise to illustrate the semantic routing and model routing functions more clearly, so ideally you should run this on a machine with GPU support and adequate memory and storage for the two models used here. You can also follow equivalent steps on a smaller, CPU-only server running smaller LLMs.
 
 ## Step 1: Create Minikube Cluster
 
-Create a local Kubernetes cluster via minikube (or equivalently via kind).
+Create a local Kubernetes cluster via minikube (or equivalently via kind).
 
 ```bash
-# Create cluster
+# Create cluster
 $ minikube start \
   --driver docker \
   --container-runtime docker \

@@ -50,29 +50,29 @@ kubectl create secret generic hf-token-secret --from-literal=token=$HF_TOKEN
 ```
 
 ```bash
-# Create vLLM service running llama3-8b
+# Create vLLM service running llama3-8b
 kubectl apply -f deploy/kubernetes/istio/vLlama3.yaml
 ```
 
 The first run may take several (10+) minutes to download the model before the vLLM pod running it reaches the READY state. Similarly, deploy the second LLM (phi4-mini) and wait several minutes until its pod is READY.
 
 ```bash
-# Create vLLM service running phi4-mini
+# Create vLLM service running phi4-mini
 kubectl apply -f deploy/kubernetes/istio/vPhi4.yaml
 ```
 
 At the end of this, the commands below should show both vLLM pods READY and serving these LLMs. You should also see Kubernetes services exposing the IP/port on which each model is served. In the example below, the llama3-8b model is served via a Kubernetes service with service IP 10.108.250.109 and port 80.
 
 ```bash
-# Verify that vLLM pods running the two LLMs are READY and serving
+# Verify that vLLM pods running the two LLMs are READY and serving
 
 kubectl get pods
 NAME                        READY   STATUS    RESTARTS   AGE
 llama-8b-57b95475bd-ph7s4   1/1     Running   0          9d
 phi4-mini-887476b56-74twv   1/1     Running   0          9d
 
 # View the IP/port of the Kubernetes services on which these models are being served
-
+
 kubectl get service
 NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
 kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP   36d
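For context, one way to sanity-check that a model is actually serving on the service IP/port shown above. The IP 10.108.250.109 is the example value from this guide and will differ per cluster; the pod name and image are arbitrary choices for this sketch:

```bash
# Hypothetical sanity check -- query vLLM's OpenAI-compatible /v1/models endpoint
# from a throwaway pod inside the cluster. Substitute your service's CLUSTER-IP.
kubectl run curl-check --rm -it --restart=Never --image=curlimages/curl -- \
  curl -s http://10.108.250.109:80/v1/models
```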
@@ -104,7 +104,7 @@ kubectl get pods -n istio-system
 
 ## Step 4: Update vsr config
 
-The file deploy/kubernetes/istio/config.yaml is used to configure vsr when it is installed in the next step. Ensure that the models in the config file match the models you are using, and that the vllm_endpoints in the file match the IP/port of the LLM Kubernetes services you are running. It is usually good to start with basic vsr features such as prompt classification and model routing before experimenting with features such as PromptGuard or ToolCalling.
+The file deploy/kubernetes/istio/config.yaml is used to configure vsr when it is installed in the next step. Ensure that the models in the config file match the models you are using, and that the vllm_endpoints in the file match the IP/port of the LLM Kubernetes services you are running. It is usually good to start with basic vsr features such as prompt classification and model routing before experimenting with features such as PromptGuard or ToolCalling.
 
 ## Step 5: Deploy vLLM Semantic Router
 
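For context on Step 4, a quick way to cross-check the endpoints in the config against the live services; the grep pattern assumes the vllm_endpoints key named above, and the surrounding YAML layout may differ:

```bash
# Hypothetical cross-check for Step 4 -- print the endpoint entries in the vsr config
# (assumes the file uses the "vllm_endpoints" key named in this guide)...
grep -n -A 4 "vllm_endpoints" deploy/kubernetes/istio/config.yaml

# ...and compare them against the CLUSTER-IP and PORT(S) of the model services.
kubectl get service
```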
@@ -130,16 +130,17 @@ kubectl apply -f deploy/kubernetes/istio/destinationrule.yaml
 kubectl apply -f deploy/kubernetes/istio/envoyfilter.yaml
 ```
 
-## Step 7: Install gateway routes
+## Step 7: Install gateway routes
 
 Install HTTPRoutes in the Istio gateway.
 
 ```bash
 kubectl apply -f deploy/kubernetes/istio/httproute-llama3-8b.yaml
 kubectl apply -f deploy/kubernetes/istio/httproute-phi4-mini.yaml
 ```
-
+
 ## Step 8: Testing the Deployment
+
 To expose the IP on which the Istio gateway listens for client requests from outside the cluster, you can choose any standard Kubernetes option for external load balancing. We tested this feature by [deploying and configuring metallb](https://metallb.universe.tf/installation/) in the cluster as the LoadBalancer provider; refer to the metallb documentation for installation procedures if needed. Finally, for the minikube case, we get the external URL as shown below.
 
 ```bash

@@ -156,7 +157,7 @@ Try the following cases with and without model "auto" selection to confirm that
 Example queries to try include the following:
 
 ```bash
-# Model name llama3-8b provided explicitly, should route to this backend
+# Model name llama3-8b provided explicitly, should route to this backend
 curl http://192.168.49.2:30913/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "llama3-8b",
   "messages": [

@@ -168,7 +169,7 @@ curl http://192.168.49.2:30913/v1/chat/completions -H "Content-Type: application/json" -d '{
 ```
 
 ```bash
-# Model name set to "auto", should categorize to "computer science" & route to llama3-8b
+# Model name set to "auto", should categorize to "computer science" & route to llama3-8b
 curl http://192.168.49.2:30913/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "auto",
   "messages": [

@@ -180,7 +181,7 @@ curl http://192.168.49.2:30913/v1/chat/completions -H "Content-Type: application/json" -d '{
 ```
 
 ```bash
-# Model name phi4-mini provided explicitly, should route to this backend
+# Model name phi4-mini provided explicitly, should route to this backend
 curl http://192.168.49.2:30913/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "phi4-mini",
   "messages": [

@@ -192,7 +193,7 @@ curl http://192.168.49.2:30913/v1/chat/completions -H "Content-Type: application/json" -d '{
 ```
 
 ```bash
-# Model name set to "auto", should categorize to "math" & route to phi4-mini
+# Model name set to "auto", should categorize to "math" & route to phi4-mini
 curl http://192.168.49.2:30913/v1/chat/completions -H "Content-Type: application/json" -d '{
   "model": "auto",
   "messages": [

@@ -211,7 +212,7 @@ curl http://192.168.49.2:30913/v1/chat/completions -H "Content-Type: application/json" -d '{
 
 ```bash
 # Check istio gateway status
-kubectl get gateway
+kubectl get gateway
 
 # Check istio gw service status
 kubectl get svc inference-gateway-istio

@@ -226,7 +227,7 @@ kubectl logs deploy/inference-gateway-istio -c istio-proxy
 # Check semantic router pod
 kubectl get pods -n vllm-semantic-router-system
 
-# Check semantic router service
+# Check semantic router service
 kubectl get svc -n vllm-semantic-router-system
 
 # Check semantic router logs

@@ -240,17 +241,17 @@ kubectl logs -n vllm-semantic-router-system deployment/semantic-router
 # Remove semantic router
 kubectl delete -k deploy/kubernetes/istio/
 
-# Remove Istio
+# Remove Istio
 istioctl uninstall --purge
 
 # Remove LLMs
 kubectl delete -f deploy/kubernetes/istio/vLlama3.yaml
 kubectl delete -f deploy/kubernetes/istio/vPhi4.yaml
 
-# Stop minikube cluster
+# Stop minikube cluster
 minikube stop
 
-# Delete minikube cluster
+# Delete minikube cluster
 minikube delete
 ```
