Merged
4 changes: 4 additions & 0 deletions Dockerfile.extproc
@@ -62,6 +62,10 @@ RUN mkdir -p src/semantic-router
COPY src/semantic-router/go.mod src/semantic-router/go.sum src/semantic-router/
COPY candle-binding/go.mod candle-binding/semantic-router.go candle-binding/

# Pre-download modules to fail fast if mirrors are unreachable
RUN cd src/semantic-router && go mod download && \
cd /app/candle-binding && go mod download

# Copy semantic-router source code
COPY src/semantic-router/ src/semantic-router/

30 changes: 30 additions & 0 deletions README.md
@@ -91,6 +91,36 @@ This command will:

For detailed installation and configuration instructions, see the [Complete Documentation](https://vllm-semantic-router.com/docs/installation/).

### What This Starts By Default

`make docker-compose-up` now launches the full stack, including a lightweight local OpenAI-compatible model server powered by **llm-katan** (serving the small model `Qwen/Qwen3-0.6B` under the alias `qwen3`). Out of the box, the semantic router routes classification and default generations to this local endpoint, so the setup is entirely self-contained (no external API keys required) while still letting you add remote or larger models later.
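With the stack running, you can sanity-check the local model server directly over its OpenAI-compatible API. A minimal sketch, assuming the `8002:8002` host port mapping and the served alias `qwen3` from `deploy/docker-compose/docker-compose.yml` (this only works while the compose services are up):

```shell
# Hypothetical smoke test against the local llm-katan endpoint;
# requires the compose stack to be running on this host.
curl -s http://localhost:8002/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "qwen3",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

The `"qwen3"` model name here is the same served alias that `config/config.yaml` references.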

### Core Mode (Without Local Model)

If you only want the core semantic-router + Envoy + observability stack (and will point to external OpenAI-compatible endpoints yourself):

```bash
make docker-compose-up-core
```
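In core mode you point `vllm_endpoints` in `config/config.yaml` at your own OpenAI-compatible server. A minimal sketch with placeholder values (`10.0.0.5` and `my-model` are hypothetical; per the comment in the shipped config, `address` must be a plain IPv4 address, with the port in the separate `port` field):

```yaml
vllm_endpoints:
  - name: "endpoint1"
    address: "10.0.0.5"   # hypothetical IPv4 of your own endpoint
    port: 8000
    weight: 1

model_config:
  "my-model":             # hypothetical served model name
    preferred_endpoints: ["endpoint1"]
```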

### Prerequisite Model Download (Speeds Up First Run)

The existing model bootstrap targets now also pre-download the small llm-katan model so the first `docker-compose-up` avoids an on-demand Hugging Face fetch.

Minimal set (fast):

```bash
make models-download-minimal
```

Full set:

```bash
make models-download
```

Both targets write a stamp file once `Qwen/Qwen3-0.6B` is present, so subsequent runs skip the download and stay idempotent.
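The stamp-file pattern the download targets use can be sketched as follows, here run against a scratch directory so no real download happens (`/tmp/models-demo` and the echoed messages are illustrative only):

```shell
# Mirror of the Makefile's guard: re-download only when the stamp
# file or the model directory is missing.
model_dir="/tmp/models-demo/Qwen/Qwen3-0.6B"
stamp="$model_dir/.downloaded"
mkdir -p "$model_dir"
if [ ! -f "$stamp" ] || [ ! -d "$model_dir" ]; then
  echo "downloading Qwen/Qwen3-0.6B (placeholder for 'hf download')"
  # Record a UTC timestamp as the stamp, as the real targets do.
  date -u +%Y-%m-%dT%H:%M:%SZ > "$stamp"
else
  echo "stamp present; skipping download"
fi
```

Running it a second time takes the `else` branch, which is exactly why repeated `make models-download*` invocations are cheap.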

## Documentation 📖

For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:
38 changes: 19 additions & 19 deletions config/config.yaml
@@ -32,13 +32,13 @@ prompt_guard:
# NOT supported: domain names (example.com), protocol prefixes (http://), paths (/api), ports in address (use 'port' field)
vllm_endpoints:
- name: "endpoint1"
address: "127.0.0.1" # IPv4 address - REQUIRED format
port: 8000
address: "172.28.0.20" # Static IPv4 of llm-katan within docker compose network
port: 8002
weight: 1

model_config:
"openai/gpt-oss-20b":
reasoning_family: "gpt-oss" # This model uses GPT-OSS reasoning syntax
"qwen3":
reasoning_family: "qwen3" # This model uses Qwen-3 reasoning syntax
preferred_endpoints: ["endpoint1"]
pii_policy:
allow_by_default: true
@@ -63,89 +63,89 @@ categories:
- name: business
system_prompt: "You are a senior business consultant and strategic advisor with expertise in corporate strategy, operations management, financial analysis, marketing, and organizational development. Provide practical, actionable business advice backed by proven methodologies and industry best practices. Consider market dynamics, competitive landscape, and stakeholder interests in your recommendations."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.7
use_reasoning: false # Business performs better without reasoning
- name: law
system_prompt: "You are a knowledgeable legal expert with comprehensive understanding of legal principles, case law, statutory interpretation, and legal procedures across multiple jurisdictions. Provide accurate legal information and analysis while clearly stating that your responses are for informational purposes only and do not constitute legal advice. Always recommend consulting with qualified legal professionals for specific legal matters."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.4
use_reasoning: false
- name: psychology
system_prompt: "You are a psychology expert with deep knowledge of cognitive processes, behavioral patterns, mental health, developmental psychology, social psychology, and therapeutic approaches. Provide evidence-based insights grounded in psychological research and theory. When discussing mental health topics, emphasize the importance of professional consultation and avoid providing diagnostic or therapeutic advice."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.6
use_reasoning: false
- name: biology
system_prompt: "You are a biology expert with comprehensive knowledge spanning molecular biology, genetics, cell biology, ecology, evolution, anatomy, physiology, and biotechnology. Explain biological concepts with scientific accuracy, use appropriate terminology, and provide examples from current research. Connect biological principles to real-world applications and emphasize the interconnectedness of biological systems."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.9
use_reasoning: false
- name: chemistry
system_prompt: "You are a chemistry expert specializing in chemical reactions, molecular structures, and laboratory techniques. Provide detailed, step-by-step explanations."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.6
use_reasoning: true # Enable reasoning for complex chemistry
- name: history
system_prompt: "You are a historian with expertise across different time periods and cultures. Provide accurate historical context and analysis."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.7
use_reasoning: false
- name: other
system_prompt: "You are a helpful and knowledgeable assistant. Provide accurate, helpful responses across a wide range of topics."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.7
use_reasoning: false
- name: health
system_prompt: "You are a health and medical information expert with knowledge of anatomy, physiology, diseases, treatments, preventive care, nutrition, and wellness. Provide accurate, evidence-based health information while emphasizing that your responses are for educational purposes only and should never replace professional medical advice, diagnosis, or treatment. Always encourage users to consult healthcare professionals for medical concerns and emergencies."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.5
use_reasoning: false
- name: economics
system_prompt: "You are an economics expert with deep understanding of microeconomics, macroeconomics, econometrics, financial markets, monetary policy, fiscal policy, international trade, and economic theory. Analyze economic phenomena using established economic principles, provide data-driven insights, and explain complex economic concepts in accessible terms. Consider both theoretical frameworks and real-world applications in your responses."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 1.0
use_reasoning: false
- name: math
system_prompt: "You are a mathematics expert. Provide step-by-step solutions, show your work clearly, and explain mathematical concepts in an understandable way."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 1.0
use_reasoning: true # Enable reasoning for complex math
- name: physics
system_prompt: "You are a physics expert with deep understanding of physical laws and phenomena. Provide clear explanations with mathematical derivations when appropriate."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.7
use_reasoning: true # Enable reasoning for physics
- name: computer science
system_prompt: "You are a computer science expert with knowledge of algorithms, data structures, programming languages, and software engineering. Provide clear, practical solutions with code examples when helpful."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.6
use_reasoning: false
- name: philosophy
system_prompt: "You are a philosophy expert with comprehensive knowledge of philosophical traditions, ethical theories, logic, metaphysics, epistemology, political philosophy, and the history of philosophical thought. Engage with complex philosophical questions by presenting multiple perspectives, analyzing arguments rigorously, and encouraging critical thinking. Draw connections between philosophical concepts and contemporary issues while maintaining intellectual honesty about the complexity and ongoing nature of philosophical debates."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.5
use_reasoning: false
- name: engineering
system_prompt: "You are an engineering expert with knowledge across multiple engineering disciplines including mechanical, electrical, civil, chemical, software, and systems engineering. Apply engineering principles, design methodologies, and problem-solving approaches to provide practical solutions. Consider safety, efficiency, sustainability, and cost-effectiveness in your recommendations. Use technical precision while explaining concepts clearly, and emphasize the importance of proper engineering practices and standards."
model_scores:
- model: openai/gpt-oss-20b
- model: qwen3
score: 0.7
use_reasoning: false

default_model: openai/gpt-oss-20b
default_model: "qwen3"

# Reasoning family configurations
reasoning_families:
25 changes: 19 additions & 6 deletions deploy/docker-compose/docker-compose.yml
@@ -9,9 +9,12 @@ services:
volumes:
- ../../config:/app/config:ro
- ../../models:/app/models:ro
- ~/.cache/huggingface:/root/.cache/huggingface
environment:
- LD_LIBRARY_PATH=/app/lib
- CONFIG_FILE=${CONFIG_FILE:-/app/config/config.yaml}
- HUGGINGFACE_HUB_CACHE=/root/.cache/huggingface
- HF_HUB_ENABLE_HF_TRANSFER=1
networks:
- semantic-network
healthcheck:
@@ -134,18 +137,27 @@ services:

# LLM Katan service for testing
llm-katan:
build:
context: ../../e2e-tests/llm-katan
dockerfile: Dockerfile
image: ${LLM_KATAN_IMAGE:-ghcr.io/vllm-project/semantic-router/llm-katan:latest}
container_name: llm-katan
profiles: ["testing", "llm-katan"]
ports:
- "8002:8000"
- "8002:8002"
environment:
- HUGGINGFACE_HUB_TOKEN=${HUGGINGFACE_HUB_TOKEN:-}
- HF_HUB_ENABLE_HF_TRANSFER=1
volumes:
- ../../models:/app/models:ro
- hf-cache:/home/llmkatan/.cache/huggingface
networks:
- semantic-network
command: ["llm-katan", "--model", "Qwen/Qwen3-0.6B", "--host", "0.0.0.0", "--port", "8000"]
semantic-network:
ipv4_address: 172.28.0.20
command: ["llm-katan", "--model", "/app/models/Qwen/Qwen3-0.6B", "--served-model-name", "qwen3", "--host", "0.0.0.0", "--port", "8002"]
healthcheck:
test: ["CMD", "curl", "-fsS", "http://localhost:8002/health"]
interval: 10s
timeout: 5s
retries: 5
start_period: 10s

# Semantic Router Dashboard
dashboard:
@@ -202,3 +214,4 @@ volumes:
grafana-data:
openwebui-data:
openwebui-pipelines:
hf-cache:
13 changes: 10 additions & 3 deletions tools/make/docker.mk
@@ -99,8 +99,8 @@ BUILD_FLAG=$(if $(REBUILD),--build,)
# Docker compose shortcuts (no rebuild by default)
docker-compose-up:
@$(LOG_TARGET)
@echo "Starting services with docker-compose (REBUILD=$(REBUILD))..."
@docker compose up -d $(BUILD_FLAG)
@echo "Starting services with docker-compose (default includes llm-katan) (REBUILD=$(REBUILD))..."
@docker compose --profile llm-katan up -d $(BUILD_FLAG)

docker-compose-up-testing:
@$(LOG_TARGET)
@@ -112,6 +112,12 @@ docker-compose-up-llm-katan:
@echo "Starting services with llm-katan profile (REBUILD=$(REBUILD))..."
@docker compose --profile llm-katan up -d $(BUILD_FLAG)

# Start core services only (closer to production; excludes llm-katan)
docker-compose-up-core:
@$(LOG_TARGET)
@echo "Starting core services (no llm-katan) (REBUILD=$(REBUILD))..."
@docker compose up -d $(BUILD_FLAG)

# Explicit rebuild targets for convenience
docker-compose-rebuild: REBUILD=1
docker-compose-rebuild: docker-compose-up
@@ -139,7 +145,8 @@ docker-help:
@echo " docker-run-llm-katan - Run llm-katan Docker image locally"
@echo " docker-run-llm-katan-custom SERVED_NAME=name - Run with custom served model name"
@echo " docker-clean - Clean up Docker images"
@echo " docker-compose-up - Start services (add REBUILD=1 to rebuild)"
@echo " docker-compose-up - Start services (default includes llm-katan; REBUILD=1 to rebuild)"
@echo " docker-compose-up-core - Start core services only (no llm-katan)"
@echo " docker-compose-up-testing - Start with testing profile (REBUILD=1 optional)"
@echo " docker-compose-up-llm-katan - Start with llm-katan profile (REBUILD=1 optional)"
@echo " docker-compose-rebuild - Force rebuild then start"
13 changes: 11 additions & 2 deletions tools/make/linter.mk
@@ -12,11 +12,20 @@ docs-lint-fix: docs-install

markdown-lint:
@$(LOG_TARGET)
markdownlint -c tools/linter/markdown/markdownlint.yaml "**/*.md" --ignore node_modules --ignore website/node_modules --ignore dashboard/frontend/node_modules
markdownlint -c tools/linter/markdown/markdownlint.yaml "**/*.md" \
--ignore node_modules \
--ignore website/node_modules \
--ignore dashboard/frontend/node_modules \
--ignore models

markdown-lint-fix:
@$(LOG_TARGET)
markdownlint -c tools/linter/markdown/markdownlint.yaml "**/*.md" --ignore node_modules --ignore website/node_modules --ignore dashboard/frontend/node_modules --fix
markdownlint -c tools/linter/markdown/markdownlint.yaml "**/*.md" \
--ignore node_modules \
--ignore website/node_modules \
--ignore dashboard/frontend/node_modules \
--ignore models \
--fix

yaml-lint:
@$(LOG_TARGET)
8 changes: 8 additions & 0 deletions tools/make/models.mk
@@ -24,6 +24,10 @@ download-models:

download-models-minimal:
@mkdir -p models
# Pre-download tiny LLM for llm-katan (optional but speeds up first start)
@if [ ! -f "models/Qwen/Qwen3-0.6B/.downloaded" ] || [ ! -d "models/Qwen/Qwen3-0.6B" ]; then \
hf download Qwen/Qwen3-0.6B --local-dir models/Qwen/Qwen3-0.6B && printf '%s\n' "$$(date -u +%Y-%m-%dT%H:%M:%SZ)" > models/Qwen/Qwen3-0.6B/.downloaded; \
fi
@if [ ! -f "models/category_classifier_modernbert-base_model/.downloaded" ] || [ ! -d "models/category_classifier_modernbert-base_model" ]; then \
hf download LLM-Semantic-Router/category_classifier_modernbert-base_model --local-dir models/category_classifier_modernbert-base_model && printf '%s\n' "$$(date -u +%Y-%m-%dT%H:%M:%SZ)" > models/category_classifier_modernbert-base_model/.downloaded; \
fi
@@ -41,6 +45,10 @@

download-models-full:
@mkdir -p models
# Pre-download tiny LLM for llm-katan (optional but speeds up first start)
@if [ ! -f "models/Qwen/Qwen3-0.6B/.downloaded" ] || [ ! -d "models/Qwen/Qwen3-0.6B" ]; then \
hf download Qwen/Qwen3-0.6B --local-dir models/Qwen/Qwen3-0.6B && printf '%s\n' "$$(date -u +%Y-%m-%dT%H:%M:%SZ)" > models/Qwen/Qwen3-0.6B/.downloaded; \
fi
@if [ ! -f "models/category_classifier_modernbert-base_model/.downloaded" ] || [ ! -d "models/category_classifier_modernbert-base_model" ]; then \
hf download LLM-Semantic-Router/category_classifier_modernbert-base_model --local-dir models/category_classifier_modernbert-base_model && printf '%s\n' "$$(date -u +%Y-%m-%dT%H:%M:%SZ)" > models/category_classifier_modernbert-base_model/.downloaded; \
fi