| 1 | +# CLAUDE.md |
| 2 | + |
| 3 | +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. |
| 4 | + |
| 5 | +## Project Overview |
| 6 | + |
| 7 | +vLLM Semantic Router is an intelligent routing system that uses BERT-based semantic classification to select the optimal model for each LLM request. The system consists of a Rust library for ML inference and a Go service that implements the Envoy ExtProc interface for request routing. |
| 8 | + |
| 9 | +## Build Commands |
| 10 | + |
| 11 | +### Essential Build Commands |
| 12 | +```bash |
| 13 | +# Build everything (Rust library + Go router) |
| 14 | +make build |
| 15 | + |
| 16 | +# Build only Rust library (candle-binding) |
| 17 | +make rust |
| 18 | + |
| 19 | +# Build only Go router |
| 20 | +make build-router |
| 21 | + |
| 22 | +# Download required models from Hugging Face |
| 23 | +make download-models |
| 24 | + |
| 25 | +# Clean all build artifacts |
| 26 | +make clean |
| 27 | +``` |
| 28 | + |
| 29 | +### Running the System |
| 30 | +```bash |
| 31 | +# Run the semantic router (requires models downloaded) |
| 32 | +make run-router |
| 33 | + |
| 34 | +# Run Envoy proxy (separate terminal) |
| 35 | +make run-envoy |
| 36 | + |
| 37 | +# Use custom config file |
| 38 | +CONFIG_FILE=custom.yaml make run-router |
| 39 | +``` |
| 40 | + |
| 41 | +## Testing Commands |
| 42 | + |
| 43 | +### Core Testing |
| 44 | +```bash |
| 45 | +# Run all tests (includes go vet, go mod tidy checks, and unit tests) |
| 46 | +make test |
| 47 | + |
| 48 | +# Test individual components |
| 49 | +make test-binding # Test Rust bindings |
| 50 | +make test-semantic-router # Test Go router |
| 51 | +make test-category-classifier # Test category classification |
| 52 | +make test-pii-classifier # Test PII detection |
| 53 | +make test-jailbreak-classifier # Test jailbreak detection |
| 54 | +``` |
| 55 | + |
| 56 | +### Manual Testing (requires services running) |
| 57 | +```bash |
| 58 | +# Test different routing scenarios |
| 59 | +make test-auto-prompt-reasoning # Test reasoning mode |
| 60 | +make test-auto-prompt-no-reasoning # Test normal mode |
| 61 | +make test-pii # Test PII detection |
| 62 | +make test-prompt-guard # Test jailbreak detection |
| 63 | +make test-tools # Test tool auto-selection |
| 64 | +``` |
| 65 | + |
| 66 | +### Milvus Cache Testing |
| 67 | +```bash |
| 68 | +# Start Milvus container |
| 69 | +make start-milvus |
| 70 | + |
| 71 | +# Test with Milvus backend |
| 72 | +make test-milvus-cache |
| 73 | +make test-semantic-router-milvus |
| 74 | + |
| 75 | +# Stop Milvus when done |
| 76 | +make stop-milvus |
| 77 | +``` |
| 78 | + |
| 79 | +### End-to-End Testing |
| 80 | +```bash |
| 81 | +# Start services first |
| 82 | +make run-envoy & |
| 83 | +make run-router & |
| 84 | + |
| 85 | +# Run comprehensive e2e tests |
| 86 | +python e2e-tests/run_all_tests.py |
| 87 | + |
| 88 | +# Run specific tests |
| 89 | +python e2e-tests/00-client-request-test.py |
| 90 | +``` |
| 91 | + |
| 92 | +## Code Quality |
| 93 | + |
| 94 | +### Pre-commit Hooks |
| 95 | +```bash |
| 96 | +# Install pre-commit hooks (mandatory for contributions) |
| 97 | +pip install pre-commit |
| 98 | +pre-commit install |
| 99 | + |
| 100 | +# Run all pre-commit checks |
| 101 | +pre-commit run --all-files |
| 102 | +``` |
| 103 | + |
| 104 | +### Go Module Management |
| 105 | +```bash |
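| 106 | +# Keep Go modules tidy (checked by CI); run from the repository root. |
| 107 | +# Subshells keep the working directory unchanged between commands. |
| 108 | +(cd candle-binding && go mod tidy) |
| 109 | +(cd src/semantic-router && go mod tidy) |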
| 109 | +``` |
| 110 | + |
| 111 | +## Architecture |
| 112 | + |
| 113 | +### High-Level Components |
| 114 | +- **Candle Binding**: Rust library using the [candle](https://github.com/huggingface/candle) ML framework for BERT-based classification |
| 115 | +- **Semantic Router**: Go service implementing the Envoy ExtProc interface for intelligent request routing |
| 116 | +- **Configuration**: YAML-based configuration for models, endpoints, and routing rules |
| 117 | + |
| 118 | +### Core Classification Models |
| 119 | +- **Category Classifier**: Routes requests to appropriate models based on content domain (math, science, law, etc.) |
| 120 | +- **PII Classifier**: Detects and blocks personally identifiable information |
| 121 | +- **Jailbreak Classifier**: Identifies and blocks prompt injection attempts |
| 122 | + |
| 123 | +### Semantic Caching |
| 124 | +- **Memory Backend**: Fast in-memory cache for development |
| 125 | +- **Milvus Backend**: Scalable vector database for production deployments |
| 126 | + |
| 127 | +### Directory Structure |
| 128 | +``` |
| 129 | +├── candle-binding/ # Rust ML library with BERT classification |
| 130 | +├── src/semantic-router/ # Go router service (Envoy ExtProc) |
| 131 | +├── src/training/ # Model training and fine-tuning scripts |
| 132 | +├── config/ # Configuration files (config.yaml, etc.) |
| 133 | +├── e2e-tests/ # End-to-end test suite |
| 134 | +├── models/ # Downloaded classification models |
| 135 | +└── website/ # Documentation website |
| 136 | +``` |
| 137 | + |
| 138 | +### Key Configuration Files |
| 139 | +- `config/config.yaml`: Main configuration for models, endpoints, and routing rules |
| 140 | +- `config/tools_db.json`: Tool selection database |
| 141 | +- `config/cache/milvus.yaml`: Milvus vector database configuration |
| 142 | + |
| 143 | +## Development Environment Setup |
| 144 | + |
| 145 | +### Prerequisites |
| 146 | +- Rust (latest stable) |
| 147 | +- Go 1.24.1+ |
| 148 | +- Hugging Face CLI (`pip install huggingface_hub`) |
| 149 | +- Make |
| 150 | +- Python 3.8+ (for training and e2e tests) |
| 151 | + |
| 152 | +### Initial Setup |
| 153 | +```bash |
| 154 | +# Clone and download models |
| 155 | +git clone https://github.com/vllm-project/semantic-router.git |
| 156 | +cd semantic-router |
| 157 | +make download-models |
| 158 | + |
| 159 | +# Install Python dependencies (optional) |
| 160 | +pip install -r requirements.txt |
| 161 | +pip install -r e2e-tests/requirements.txt |
| 162 | +``` |
| 163 | + |
| 164 | +## Documentation |
| 165 | + |
| 166 | +### Documentation Development |
| 167 | +```bash |
| 168 | +# Start documentation dev server |
| 169 | +make docs-dev |
| 170 | + |
| 171 | +# Build documentation for production |
| 172 | +make docs-build |
| 173 | + |
| 174 | +# Lint documentation |
| 175 | +make docs-lint |
| 176 | +``` |
| 177 | + |
| 178 | +## Environment Variables |
| 179 | + |
| 180 | +- `LD_LIBRARY_PATH`: Must include `${PWD}/candle-binding/target/release` for Rust library loading |
| 181 | +- `CONFIG_FILE`: Path to configuration file (default: `config/config.yaml`) |
| 182 | +- `CONTAINER_RUNTIME`: Container runtime for Milvus (`docker` or `podman`) |
| 183 | +- `VLLM_ENDPOINT`: vLLM endpoint URL for testing |
| 184 | +- `SKIP_MILVUS_TESTS`: Skip Milvus-dependent tests (default: `true`) |
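The first two variables above can be set by hand when running the router binary outside of `make`. The following is a minimal sketch, assuming the commands are run from the repository root and that the Rust library has already been built into `candle-binding/target/release`; the `make` targets may already set these values, in which case none of this is needed.

```bash
# Sketch only: run from the repository root.
# Prepend the Rust library directory, keeping any existing search path.
export LD_LIBRARY_PATH="${PWD}/candle-binding/target/release${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

# Fall back to the documented default when CONFIG_FILE is not already set.
export CONFIG_FILE="${CONFIG_FILE:-config/config.yaml}"

echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"
echo "CONFIG_FILE=${CONFIG_FILE}"
```

The `${VAR:+...}` expansion avoids leaving a trailing colon in `LD_LIBRARY_PATH` when the variable was previously empty.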
| 185 | + |
| 186 | +## Important Notes |
| 187 | + |
| 188 | +- Always run `make download-models` before the first build |
| 189 | +- The system requires both Envoy and the router to be running for end-to-end functionality |
| 190 | +- Use `make test` before submitting changes to ensure all quality checks pass |
| 191 | +- For production deployments, consider using the Milvus backend for semantic caching |
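The first note can be turned into a quick pre-build check. The sketch below is illustrative only: `models_present` is a hypothetical helper, not part of the repository, and it assumes that a non-empty `models/` directory means `make download-models` has already been run. It does not verify that the downloaded files are complete.

```bash
# Hypothetical helper: succeeds when the given directory exists and is non-empty.
models_present() {
  dir="${1:-models}"
  [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

# Report whether the download step still needs to be run.
if models_present models; then
  echo "models/ is populated; skipping download"
else
  echo "models/ is missing or empty; run: make download-models"
fi
```

The check only prints a hint and leaves the download itself to the existing `make` target.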