Commit 15a7437

Merge branch 'main' into feature/classifier-test-framework

2 parents: 39911f4 + 80f062b

File tree

29 files changed: +2466 additions, −289 deletions


CONTRIBUTING.md

Lines changed: 27 additions & 1 deletion

````diff
@@ -10,6 +10,7 @@ Thank you for your interest in contributing to the vLLM Semantic Router project!
 - [Running Tests](#running-tests)
 - [Development Workflow](#development-workflow)
 - [Code Style and Standards](#code-style-and-standards)
+- [Code Quality Checks](#code-quality-checks)
 - [Submitting Changes](#submitting-changes)
 - [Project Structure](#project-structure)

@@ -29,7 +30,7 @@ Before you begin, ensure you have the following installed:

 1. **Clone the repository:**
    ```bash
-   git clone <repository-url>
+   git clone https://github.com/vllm-project/semantic-router.git
    cd semantic-router
    ```

@@ -191,6 +192,31 @@ The test suite includes:

 ## Code Style and Standards

+### Code Quality Checks
+
+Before submitting a PR, please run the pre-commit hooks to ensure code quality and consistency. **These checks are mandatory** and will be automatically run on every commit once installed.
+
+**Step 1: Install pre-commit tool**
+```bash
+# Using pip (recommended)
+pip install pre-commit
+
+# Or using conda
+conda install -c conda-forge pre-commit
+
+# Or using homebrew (macOS)
+brew install pre-commit
+```
+
+**Step 2: Install pre-commit hooks for this repository**
+```bash
+# Install pre-commit hooks
+pre-commit install
+
+# Run all checks
+pre-commit run --all-files
+```
+
 ### Go Code
 - Follow standard Go formatting (`gofmt`)
 - Use meaningful variable and function names
````

Makefile

Lines changed: 20 additions & 5 deletions

````diff
@@ -11,20 +11,35 @@ build: rust build-router

 # Build the Rust library
 rust:
-	@echo "Building Rust library..."
-	cd candle-binding && cargo build --release
+	@echo "Ensuring rust is installed..."
+	@bash -c 'if ! command -v rustc >/dev/null 2>&1; then \
+		echo "rustc not found, installing..."; \
+		curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y; \
+	fi && \
+	if [ -f "$$HOME/.cargo/env" ]; then \
+		echo "Loading Rust environment from $$HOME/.cargo/env..." && \
+		. $$HOME/.cargo/env; \
+	fi && \
+	if ! command -v cargo >/dev/null 2>&1; then \
+		echo "Error: cargo not found in PATH" && exit 1; \
+	fi && \
+	echo "Building Rust library..." && \
+	cd candle-binding && cargo build --release'

 # Build router
 build-router: rust
 	@echo "Building router..."
 	@mkdir -p bin
 	@cd src/semantic-router && go build -o ../../bin/router cmd/main.go

+# Config file path with default
+CONFIG_FILE ?= config/config.yaml
+
 # Run the router
-run-router: build-router
-	@echo "Running router..."
+run-router: build-router download-models
+	@echo "Running router with config: ${CONFIG_FILE}"
 	@export LD_LIBRARY_PATH=${PWD}/candle-binding/target/release && \
-	./bin/router -config=config/config.yaml
+	./bin/router -config=${CONFIG_FILE}

 # Prepare Envoy
 prepare-envoy:
````
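The new `CONFIG_FILE ?= config/config.yaml` line uses Make's conditional assignment, so callers can override the config path from the environment or the command line (e.g. `make run-router CONFIG_FILE=config/other.yaml`; that file name is just illustrative). A minimal sketch of the same default-with-override behavior in POSIX shell:

```shell
#!/bin/sh
# Use the caller-supplied CONFIG_FILE if set and non-empty, otherwise
# fall back to the repo default -- the shell analogue of Make's
# `CONFIG_FILE ?= config/config.yaml`.
CONFIG_FILE="${CONFIG_FILE:-config/config.yaml}"
echo "config: $CONFIG_FILE"
```

Running it with `CONFIG_FILE` unset prints the default path; exporting `CONFIG_FILE` first makes the override win, just as with `make`.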

README.md

Lines changed: 7 additions & 1 deletion

````diff
@@ -1,6 +1,6 @@
 <div align="center">

-<img src="website/static/img/repo.png" alt="vLLM Semantic Router"/>
+<img src="website/static/img/repo.png" alt="vLLM Semantic Router" width="80%"/>

 [![Documentation](https://img.shields.io/badge/docs-read%20the%20docs-blue)](https://vllm-semantic-router.com)
 [![Hugging Face](https://img.shields.io/badge/🤗%20Hugging%20Face-Community-yellow)](https://huggingface.co/LLM-Semantic-Router)

@@ -81,3 +81,9 @@ If you find Semantic Router helpful in your research or projects, please conside
 howpublished={\url{https://github.com/vllm-project/semantic-router}},
 }
 ```
+
+## Star History 🔥
+
+We opened the project at Aug 31, 2025. We love open source and collaboration ❤️
+
+[![Star History Chart](https://api.star-history.com/svg?repos=vllm-project/semantic-router&type=Date)](https://www.star-history.com/#vllm-project/semantic-router&Date)
````

bench/run_bench.sh

mode changed: 100644 → 100755
Lines changed: 83 additions & 29 deletions

````diff
@@ -1,34 +1,88 @@
 #!/bin/bash

-set -x
+# Example usage:
+# Quick run:
+# SAMPLES_PER_CATEGORY=5 CONCURRENT_REQUESTS=4 VLLM_MODELS="openai/gpt-oss-20b" ROUTER_MODELS="auto" ./run_bench.sh
+# Long run:
+# SAMPLES_PER_CATEGORY=100 CONCURRENT_REQUESTS=4 VLLM_MODELS="openai/gpt-oss-20b" ROUTER_MODELS="auto" ./run_bench.sh
+# To test only router:
+# BENCHMARK_ROUTER_ONLY=true ./run_bench.sh

-export ROUTER_API_KEY="1234567890"
-export VLLM_API_KEY="1234567890"
-export ROUTER_ENDPOINT="http://localhost:8801/v1"
-export VLLM_ENDPOINT="http://localhost:8000/v1"
-export ROUTER_MODELS="auto"
-export VLLM_MODELS="openai/gpt-oss-20b"
+set -x -e
+
+export ROUTER_API_KEY="${ROUTER_API_KEY:-1234567890}"
+export VLLM_API_KEY="${VLLM_API_KEY:-1234567890}"
+export ROUTER_ENDPOINT="${ROUTER_ENDPOINT:-http://localhost:8801/v1}"
+export VLLM_ENDPOINT="${VLLM_ENDPOINT:-http://localhost:8000/v1}"
+export ROUTER_MODELS="${ROUTER_MODELS:-auto}"
+export VLLM_MODELS="${VLLM_MODELS:-openai/gpt-oss-20b}"
+export SAMPLES_PER_CATEGORY="${SAMPLES_PER_CATEGORY:-5}"
+export CONCURRENT_REQUESTS="${CONCURRENT_REQUESTS:-4}"
+export BENCHMARK_ROUTER_ONLY="${BENCHMARK_ROUTER_ONLY:-false}"

 # Run the benchmark
-python router_reason_bench.py \
-    --run-router \
-    --router-endpoint "$ROUTER_ENDPOINT" \
-    --router-api-key "$ROUTER_API_KEY" \
-    --router-models "$ROUTER_MODELS" \
-    --run-vllm \
-    --vllm-endpoint "$VLLM_ENDPOINT" \
-    --vllm-api-key "$VLLM_API_KEY" \
-    --vllm-models "$VLLM_MODELS" \
-    --samples-per-category 5 \
-    --vllm-exec-modes NR XC \
-    --concurrent-requests 4 \
-    --output-dir results/reasonbench
-
-# Generate plots
-VLLM_MODEL_FIRST="${VLLM_MODELS%% *}"
-ROUTER_MODEL_FIRST="${ROUTER_MODELS%% *}"
-VLLM_MODELS_SAFE="${VLLM_MODEL_FIRST//\//_}"
-ROUTER_MODELS_SAFE="${ROUTER_MODEL_FIRST//\//_}"
-python bench_plot.py \
-    --summary "results/reasonbench/vllm::${VLLM_MODELS_SAFE}/summary.json" \
-    --router-summary "results/reasonbench/router::${ROUTER_MODELS_SAFE}/summary.json"
+if [ "${BENCHMARK_ROUTER_ONLY}" = "true" ]; then
+    echo "Running router-only benchmark"
+    python bench/router_reason_bench.py \
+        --run-router \
+        --router-endpoint "$ROUTER_ENDPOINT" \
+        --router-api-key "$ROUTER_API_KEY" \
+        --router-models "$ROUTER_MODELS" \
+        --samples-per-category "$SAMPLES_PER_CATEGORY" \
+        --concurrent-requests "$CONCURRENT_REQUESTS" \
+        --output-dir results/reasonbench
+else
+    echo "Running full benchmark (router + vLLM)..."
+    python bench/router_reason_bench.py \
+        --run-router \
+        --router-endpoint "$ROUTER_ENDPOINT" \
+        --router-api-key "$ROUTER_API_KEY" \
+        --router-models "$ROUTER_MODELS" \
+        --run-vllm \
+        --vllm-endpoint "$VLLM_ENDPOINT" \
+        --vllm-api-key "$VLLM_API_KEY" \
+        --vllm-models "$VLLM_MODELS" \
+        --samples-per-category "$SAMPLES_PER_CATEGORY" \
+        --vllm-exec-modes NR XC \
+        --concurrent-requests "$CONCURRENT_REQUESTS" \
+        --output-dir results/reasonbench
+fi
+
+# Generate plots if summary files exist
+echo "Checking for plot generation..."
+echo "VLLM_MODELS: $VLLM_MODELS"
+echo "ROUTER_MODELS: $ROUTER_MODELS"
+
+# Get first model name and make it path-safe
+VLLM_MODEL_FIRST=$(echo "$VLLM_MODELS" | cut -d' ' -f1)
+ROUTER_MODEL_FIRST=$(echo "$ROUTER_MODELS" | cut -d' ' -f1)
+echo "First models: VLLM=$VLLM_MODEL_FIRST, Router=$ROUTER_MODEL_FIRST"
+
+# Replace / with _ for path safety
+VLLM_MODELS_SAFE=$(echo "$VLLM_MODEL_FIRST" | tr '/' '_')
+ROUTER_MODELS_SAFE=$(echo "$ROUTER_MODEL_FIRST" | tr '/' '_')
+echo "Safe paths: VLLM=$VLLM_MODELS_SAFE, Router=$ROUTER_MODELS_SAFE"
+
+# Construct the full paths
+VLLM_SUMMARY="results/reasonbench/vllm::${VLLM_MODELS_SAFE}/summary.json"
+ROUTER_SUMMARY="results/reasonbench/router::${ROUTER_MODELS_SAFE}/summary.json"
+echo "Looking for summaries at:"
+echo "VLLM: $VLLM_SUMMARY"
+echo "Router: $ROUTER_SUMMARY"
+
+# Check if at least one summary file exists and generate plots
+if [ -f "$ROUTER_SUMMARY" ]; then
+    echo "Found router summary, generating plots..."
+    if [ -f "$VLLM_SUMMARY" ]; then
+        echo "Found both summaries, generating comparison plots..."
+        python bench/bench_plot.py \
+            --summary "$VLLM_SUMMARY" \
+            --router-summary "$ROUTER_SUMMARY"
+    else
+        echo "vLLM summary not found, generating router-only plots..."
+        python bench/bench_plot.py \
+            --router-summary "$ROUTER_SUMMARY"
+    fi
+else
+    echo "No router summary found, skipping plot generation"
+fi
````
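The summary paths above depend on turning the first model name into a filesystem-safe string; the old script did this with bash parameter expansion (`%% *`, `//\//_`), the new one with `cut` and `tr` so it also works under plain `sh`. A small sketch of that transformation:

```shell
#!/bin/sh
# Take the first (space-separated) model name and replace '/' with '_',
# mirroring how run_bench.sh derives its results directory names.
VLLM_MODELS="openai/gpt-oss-20b other/model"
first=$(echo "$VLLM_MODELS" | cut -d' ' -f1)
safe=$(echo "$first" | tr '/' '_')
echo "results/reasonbench/vllm::${safe}/summary.json"
# -> results/reasonbench/vllm::openai_gpt-oss-20b/summary.json
```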

config/config.yaml

Lines changed: 19 additions & 1 deletion

````diff
@@ -237,4 +237,22 @@ categories:
     - model: phi4
       score: 0.2
 default_model: mistral-small3.1
-default_reasoning_effort: medium  # Default reasoning effort level (low, medium, high)
+default_reasoning_effort: medium  # Default reasoning effort level (low, medium, high)
+
+# API Configuration
+api:
+  batch_classification:
+    max_batch_size: 100  # Maximum number of texts in a single batch
+    concurrency_threshold: 5  # Switch to concurrent processing when batch size > this value
+    max_concurrency: 8  # Maximum number of concurrent goroutines
+
+    # Metrics configuration for monitoring batch classification performance
+    metrics:
+      enabled: true  # Enable comprehensive metrics collection
+      detailed_goroutine_tracking: true  # Track individual goroutine lifecycle
+      high_resolution_timing: false  # Use nanosecond precision timing
+      sample_rate: 1.0  # Collect metrics for all requests (1.0 = 100%, 0.5 = 50%)
+
+      # Histogram buckets for metrics (directly configure what you need)
+      duration_buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30]
+      size_buckets: [1, 2, 5, 10, 20, 50, 100, 200]
````
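The `sample_rate` field trades metric completeness for overhead: 1.0 records every request, 0.5 roughly half. A hypothetical sketch of how such a gate could be evaluated (the router's actual implementation is in Go and is not part of this diff; `awk` is used here only for the floating-point comparison):

```shell
#!/bin/sh
# Hypothetical sampling gate: record metrics when a uniform draw r in
# [0,1) falls below sample_rate. With sample_rate=1.0 every request is
# recorded; with 0.5 roughly half are.
sample_rate=1.0
r=0.42  # stand-in for a fresh per-request random draw
if awk -v r="$r" -v s="$sample_rate" 'BEGIN { exit !(r < s) }'; then
    echo "record metrics"
else
    echo "skip metrics"
fi
```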
