
Commit 4407fd8 (parent: 47b83d2)

project: add docs and clean-up

Signed-off-by: bitliu <[email protected]>

58 files changed: +758 −5397 lines

.gitignore

Lines changed: 4 additions & 1 deletion

@@ -98,4 +98,7 @@ scripts/prd.txt
 .env.taskmaster
 package-lock.json

-website/build
+website/build
+.docusaurus
+spec/
+results/

CONTRIBUTING.md

Lines changed: 3 additions & 4 deletions

@@ -21,10 +21,9 @@ Before you begin, ensure you have the following installed:

 - **Rust** (latest stable version)
 - **Go** 1.24.1 or later
-- **Python** 3.8+ (for training and testing)
-- **Envoy Proxy**
 - **Hugging Face CLI** (`pip install huggingface_hub`)
 - **Make** (for build automation)
+- **Python** 3.8+ (Optional: for training and testing)

 ### Initial Setup

@@ -40,7 +39,7 @@ Before you begin, ensure you have the following installed:
 ```
 This downloads the pre-trained classification models from Hugging Face.

-3. **Install Python dependencies:**
+3. **Install Python dependencies (Optional):**
 ```bash
 # For training and development
 pip install -r requirements.txt

@@ -245,7 +244,7 @@ The test suite includes:

 ## Getting Help

-- Check the [documentation](https://llm-semantic-router.readthedocs.io/en/latest/)
+- Check the [documentation](https://vllm-semantic-router.com/)
 - Review existing issues and pull requests
 - Ask questions in discussions or create a new issue
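A quick way to confirm the remaining required tools from the updated prerequisite list (a sketch; the version-check commands are illustrative, not part of the commit):

```bash
# Required toolchain per CONTRIBUTING.md
rustc --version              # Rust (latest stable)
go version                   # Go 1.24.1 or later
make --version               # Make (build automation)
pip show huggingface_hub     # Hugging Face CLI

# Optional, only for training and testing
python3 --version            # Python 3.8+
```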

Makefile

Lines changed: 9 additions & 9 deletions
@@ -4,7 +4,7 @@
 all: build

 # vLLM env var
-VLLM_ENDPOINT ?= http://192.168.12.90:11434
+VLLM_ENDPOINT ?=

 # Build the Rust library and Golang binding
 build: rust build-router

@@ -80,19 +80,19 @@ clean:
 	rm -f bin/router

 # Test the Envoy extproc
-test-prompt:
+test-auto-prompt-reasoning:
 	@echo "Testing Envoy extproc with curl (Math)..."
 	curl -X POST http://localhost:8801/v1/chat/completions \
 		-H "Content-Type: application/json" \
-		-d '{"model": "auto", "messages": [{"role": "assistant", "content": "You are a professional math teacher. Explain math concepts clearly and show step-by-step solutions to problems."}, {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}], "temperature": 0.7}'
-	@echo "Testing Envoy extproc with curl (Creative Writing)..."
-	curl -X POST http://localhost:8801/v1/chat/completions \
-		-H "Content-Type: application/json" \
-		-d '{"model": "auto", "messages": [{"role": "assistant", "content": "You are a story writer. Create interesting stories with good characters and settings."}, {"role": "user", "content": "Write a short story about a space cat."}], "temperature": 0.7}'
-	@echo "Testing Envoy extproc with curl (Default/General)..."
+		-d '{"model": "auto", "messages": [{"role": "system", "content": "You are a professional math teacher. Explain math concepts clearly and show step-by-step solutions to problems."}, {"role": "user", "content": "What is the derivative of f(x) = x^3 + 2x^2 - 5x + 7?"}]}'
+
+# Test the Envoy extproc
+test-auto-prompt-no-reasoning:
+	@echo "Testing Envoy extproc with curl (Math)..."
 	curl -X POST http://localhost:8801/v1/chat/completions \
 		-H "Content-Type: application/json" \
-		-d '{"model": "auto", "messages": [{"role": "assistant", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}], "temperature": 0.7}'
+		-d '{"model": "auto", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Who are you?"}]}'
+
 # Test prompts that contain PII
 test-pii:
 	@echo "Testing Envoy extproc with curl (Credit card number)..."

config/config.yaml

Lines changed: 3 additions & 3 deletions
@@ -28,22 +28,22 @@ gpu_config:
 # vLLM Endpoints Configuration - supports multiple endpoints, each can serve multiple models
 vllm_endpoints:
   - name: "endpoint1"
-    address: "192.168.12.90"
+    address: "127.0.0.1"
     port: 11434
     models:
       - "phi4"
       - "gemma3:27b"
     weight: 1  # Load balancing weight
     health_check_path: "/health"  # Optional health check endpoint
   - name: "endpoint2"
-    address: "192.168.12.91"
+    address: "127.0.0.1"
     port: 11434
     models:
       - "mistral-small3.1"
     weight: 1
     health_check_path: "/health"
   - name: "endpoint3"
-    address: "192.168.12.92"
+    address: "127.0.0.1"
     port: 11434
     models:
       - "phi4"  # Same model can be served by multiple endpoints for redundancy

config/envoy-docker.yaml

Lines changed: 120 additions & 0 deletions
New file:

static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          address: 0.0.0.0
          port_value: 8801
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                stat_prefix: ingress_http
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                      log_format:
                        json_format:
                          time: "%START_TIME%"
                          protocol: "%PROTOCOL%"
                          request_method: "%REQ(:METHOD)%"
                          request_path: "%REQ(X-ENVOY-ORIGINAL-PATH?:PATH)%"
                          response_code: "%RESPONSE_CODE%"
                          response_flags: "%RESPONSE_FLAGS%"
                          bytes_received: "%BYTES_RECEIVED%"
                          bytes_sent: "%BYTES_SENT%"
                          duration: "%DURATION%"
                          upstream_host: "%UPSTREAM_HOST%"
                          upstream_cluster: "%UPSTREAM_CLUSTER%"
                          upstream_local_address: "%UPSTREAM_LOCAL_ADDRESS%"
                          request_id: "%REQ(X-REQUEST-ID)%"
                          selected_model: "%REQ(X-SELECTED-MODEL)%"
                          selected_endpoint: "%REQ(X-SEMANTIC-DESTINATION-ENDPOINT)%"
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: ["*"]
                      routes:
                        # Single route using original destination cluster
                        - match:
                            prefix: "/"
                          route:
                            cluster: vllm_dynamic_cluster
                            timeout: 300s
                http_filters:
                  - name: envoy.filters.http.ext_proc
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
                      grpc_service:
                        envoy_grpc:
                          cluster_name: extproc_service
                      allow_mode_override: true
                      processing_mode:
                        request_header_mode: "SEND"
                        response_header_mode: "SEND"
                        request_body_mode: "BUFFERED"
                        response_body_mode: "BUFFERED"
                        request_trailer_mode: "SKIP"
                        response_trailer_mode: "SKIP"
                      failure_mode_allow: true
                      message_timeout: 300s
                  - name: envoy.filters.http.router
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                      suppress_envoy_headers: true
                http2_protocol_options:
                  max_concurrent_streams: 100
                  initial_stream_window_size: 65536
                  initial_connection_window_size: 1048576
                stream_idle_timeout: "300s"
                request_timeout: "300s"
                common_http_protocol_options:
                  idle_timeout: "300s"

  clusters:
    - name: extproc_service
      connect_timeout: 300s
      per_connection_buffer_limit_bytes: 52428800
      type: STATIC
      lb_policy: ROUND_ROBIN
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options:
              connection_keepalive:
                interval: 300s
                timeout: 300s
      load_assignment:
        cluster_name: extproc_service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: semantic-router  # Use Docker service name
                      port_value: 50051

    # Dynamic vLLM cluster using original destination
    - name: vllm_dynamic_cluster
      connect_timeout: 300s
      per_connection_buffer_limit_bytes: 52428800
      type: ORIGINAL_DST
      lb_policy: CLUSTER_PROVIDED
      original_dst_lb_config:
        use_http_header: true
        http_header_name: "x-semantic-destination-endpoint"
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http_protocol_options: {}

admin:
  address:
    socket_address:
      address: "0.0.0.0"
      port_value: 19000
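The ORIGINAL_DST cluster forwards each request to whatever address the external processor writes into the x-semantic-destination-endpoint header. A minimal sketch for observing that wiring through the admin interface, assuming the stack is running locally:

```bash
# Confirm the dynamic cluster is registered with Envoy
curl -s http://localhost:19000/clusters | grep vllm_dynamic_cluster

# Verify the routing header is present in the active configuration
curl -s http://localhost:19000/config_dump | grep x-semantic-destination-endpoint
```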

deploy/kubernetes/config.yaml

Lines changed: 3 additions & 3 deletions
@@ -28,22 +28,22 @@ gpu_config:
 # vLLM Endpoints Configuration - supports multiple endpoints, each can serve multiple models
 vllm_endpoints:
   - name: "endpoint1"
-    address: "192.168.12.90"
+    address: "127.0.0.1"
     port: 11434
     models:
       - "phi4"
       - "gemma3:27b"
     weight: 1  # Load balancing weight
     health_check_path: "/health"  # Optional health check endpoint
   - name: "endpoint2"
-    address: "192.168.12.91"
+    address: "127.0.0.1"
     port: 11434
     models:
       - "mistral-small3.1"
     weight: 1
     health_check_path: "/health"
   - name: "endpoint3"
-    address: "192.168.12.92"
+    address: "127.0.0.1"
     port: 11434
     models:
       - "phi4"  # Same model can be served by multiple endpoints for redundancy

docker-compose.yml

Lines changed: 54 additions & 0 deletions
New file:

version: '3.8'

services:
  # Semantic Router External Processor Service
  semantic-router:
    build:
      context: .
      dockerfile: Dockerfile.extproc
    container_name: semantic-router
    ports:
      - "50051:50051"
    volumes:
      - ./config:/app/config:ro
      - ./models:/app/models:ro
    environment:
      - LD_LIBRARY_PATH=/app/lib
    networks:
      - semantic-network
    healthcheck:
      test: ["CMD", "nc", "-z", "localhost", "50051"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s

  # Envoy Proxy Service
  envoy:
    image: envoyproxy/envoy:v1.31.7
    container_name: envoy-proxy
    ports:
      - "8801:8801"    # Main proxy port
      - "19000:19000"  # Admin interface
    volumes:
      - ./config/envoy-docker.yaml:/etc/envoy/envoy.yaml:ro
    command: ["/usr/local/bin/envoy", "-c", "/etc/envoy/envoy.yaml", "--component-log-level", "ext_proc:trace,router:trace,http:trace"]
    depends_on:
      semantic-router:
        condition: service_healthy
    networks:
      - semantic-network
    healthcheck:
      test: ["CMD", "wget", "--no-verbose", "--tries=1", "--spider", "http://localhost:19000/ready"]
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 10s

networks:
  semantic-network:
    driver: bridge

volumes:
  models-cache:
    driver: local
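Both services define healthchecks, so readiness can be confirmed before sending traffic. A short sketch using standard Compose commands:

```bash
# Both containers should report (healthy)
docker-compose ps

# Envoy readiness, the same probe the compose healthcheck hits
curl -s http://localhost:19000/ready

# Tail the external processor logs while testing
docker-compose logs -f semantic-router
```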

docker/README.md

Lines changed: 54 additions & 0 deletions
New file:

# Docker Compose Quick Start Guide

This Docker Compose configuration allows you to quickly run Semantic Router + Envoy proxy locally.

## Prerequisites

- Docker and Docker Compose
- Ensure ports 8801, 50051, and 19000 are not in use

## Install with Docker Compose

1. **Clone the repository and navigate to the project directory**

   ```bash
   git clone <repository-url>
   cd semantic_router
   ```

2. **Download required models** (if not already present):

   ```bash
   make download-models
   ```

   This will download the necessary ML models for classification:
   - Category classifier (ModernBERT-base)
   - PII classifier (ModernBERT-base)
   - Jailbreak classifier (ModernBERT-base)

3. **Start the services using Docker Compose**

   ```bash
   # Start core services (semantic-router + envoy)
   docker-compose up --build

   # Or run in background
   docker-compose up --build -d

   # Start with testing services (includes mock vLLM)
   docker-compose --profile testing up --build
   ```

4. **Verify the installation**
   - Semantic Router: http://localhost:50051 (gRPC service)
   - Envoy Proxy: http://localhost:8801 (main endpoint)
   - Envoy Admin: http://localhost:19000 (admin interface)

## Quick Start

### 1. Build and Start Services

```bash
# Start core services (semantic-router + envoy)
docker-compose up --build

# Or run in background
docker-compose up --build -d
```
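Once the services are up, a smoke test through the proxy confirms end-to-end routing. A minimal sketch, reusing the request shape from the repository's Makefile test targets:

```bash
# Route a chat completion through Envoy; the router selects the backing model
curl -X POST http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "auto", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Who are you?"}]}'
```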
