feat: add mock vLLM infrastructure for lightweight e2e testing #228
Conversation
✅ Deploy Preview for vllm-semantic-router ready!
Great! Can you make it run in CI?
Making some changes so it uses a real (tiny) model instead of a mock: https://pypi.org/project/llm-katan/. Will probably submit tomorrow after some more testing.
That's so cool!

@yossiovadia can you remove the
Force-pushed from e2bb7d3 to 52a2b54.
config/config.yaml (Outdated)

      - "phi4"              # Same model can be served by multiple endpoints for redundancy
      - "mistral-small3.1"
      weight: 2             # Higher weight for more powerful endpoint
    - name: "qwen-endpoint"
Can you create an e2e config.yaml (just like the mock vLLM config.testing.yaml)?
config/config.yaml (Outdated)

      models:
        - "Qwen/Qwen2-0.5B-Instruct"
      weight: 1
      health_check_path: "/health"
It was removed in #201.
e2e-tests/llm-katan/pyproject.toml (Outdated)

    maintainers = [
        {name = "Yossi Ovadia", email = "[email protected]"}
    ]
    license = {text = "MIT"}
Can you use the Apache license?
Force-pushed from 184ed74 to 7683d8c.
This commit introduces a mock vLLM server infrastructure to enable e2e testing without requiring GPU resources. The mock infrastructure simulates intelligent routing behavior while maintaining compatibility with the existing semantic router.

Key changes:

- Add mock-vllm-server.py: Simulates vLLM OpenAI-compatible API with intelligent content-based routing (math queries → TinyLlama, general → Qwen)
- Add start-mock-servers.sh: Launch mock servers in foreground mode
- Update config.yaml: Add minimal vLLM endpoint configuration for Qwen (port 8000) and TinyLlama (port 8001) with smart routing preference
- Update 00-client-request-test.py: Fix import path and use configured model
- Update e2e-tests/README.md: Document mock infrastructure usage
- Update build-run-test.mk: Add mock server management targets

The mock infrastructure enables:

- Fast e2e testing without GPU dependencies
- Content-aware model selection simulation
- vLLM API compatibility testing
- Smart routing behavior validation

Signed-off-by: Yossi Ovadia <[email protected]>
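For illustration, a minimal sketch of a mock OpenAI-compatible /v1/chat/completions endpoint with content-based routing. This is not the actual mock-vllm-server.py from this PR; the framework choice (FastAPI), keyword list, and response shape are assumptions.

```python
# Hypothetical sketch of a mock OpenAI-compatible endpoint with content-based
# routing (math -> TinyLlama, general -> Qwen). Not the PR's mock-vllm-server.py.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    model: str
    messages: list[dict]

MATH_HINTS = ("calculate", "solve", "equation", "integral", "derivative")

@app.post("/v1/chat/completions")
def chat_completions(req: ChatRequest):
    # Route based on the content of the last user message.
    last = req.messages[-1].get("content", "") if req.messages else ""
    routed_model = (
        "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
        if any(hint in last.lower() for hint in MATH_HINTS)
        else "Qwen/Qwen2-0.5B-Instruct"
    )
    return {
        "id": "chatcmpl-mock",
        "object": "chat.completion",
        "model": routed_model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": f"[mock reply from {routed_model}]"},
                "finish_reason": "stop",
            }
        ],
        "usage": {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0},
    }
```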
Replace the mock vLLM server with a real FastAPI-based implementation using HuggingFace transformers and tiny models. The new LLM Katan package provides actual inference while maintaining lightweight testing benefits.

Key changes:

- Add complete LLM Katan PyPI package (v0.1.4) under e2e-tests/
- FastAPI server with OpenAI-compatible endpoints (/v1/chat/completions, /v1/models, /health, /metrics)
- Real Qwen/Qwen3-0.6B model with name aliasing for multi-model testing
- Enhanced logging and Prometheus metrics endpoint
- CLI tool with comprehensive configuration options
- Replace start-mock-servers.sh with start-llm-katan.sh
- Update e2e-tests README with new LLM Katan usage instructions
- Remove obsolete mock-vllm-server.py and start-mock-servers.sh

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
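Once an LLM Katan server is listening, e2e code can talk to it like any OpenAI-compatible backend. A hedged example follows; the port and served model name are taken from the e2e setup described in this PR and are assumptions, not package defaults.

```python
import requests

BASE = "http://localhost:8000"  # assumed port from the e2e config in this PR

# Chat completion against the OpenAI-compatible endpoint.
resp = requests.post(
    f"{BASE}/v1/chat/completions",
    json={
        "model": "Qwen/Qwen3-0.6B",  # assumed served model name
        "messages": [{"role": "user", "content": "Say hello in one short sentence."}],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])

# The commit message also lists /v1/models, /health, and /metrics endpoints.
print(requests.get(f"{BASE}/health", timeout=10).status_code)
```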
Add comprehensive setup section covering HuggingFace token requirements with three authentication methods:

- Environment variable (HUGGINGFACE_HUB_TOKEN)
- CLI login (huggingface-cli login)
- Token file in home directory

Explains why token is needed (private models, rate limits, reliable downloads) and provides direct link to HuggingFace token settings.

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
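A small sketch of the first method (environment variable), assuming the standard huggingface_hub API; the exact wiring in the README may differ.

```python
import os

from huggingface_hub import login

# Read the token from the environment and log in programmatically so that
# subsequent transformers/hub downloads are authenticated.
token = os.environ.get("HUGGINGFACE_HUB_TOKEN")
if token:
    login(token=token)
else:
    print("HUGGINGFACE_HUB_TOKEN not set; falling back to anonymous downloads")
```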
- Add dist/, build/, *.egg-info/, *.whl to ignore Python build outputs
- Prevents accidentally committing generated files

Signed-off-by: Yossi Ovadia <[email protected]>
- Create config.e2e.yaml with LLM Katan endpoints for e2e tests
- Restore config.yaml to original production endpoints (matches origin/main)
- Add run-router-e2e target to use e2e config (config/config.e2e.yaml)
- Add start-llm-katan and test-e2e-vllm targets for LLM Katan testing
- Update Makefile help with new e2e test targets
- Remove egg-info directory from git tracking (now in .gitignore)
- Keep pyproject.toml at stable version 0.1.4, always install latest via pip

This separation allows:

- Production config stays clean with real vLLM endpoints
- E2E tests use lightweight LLM Katan servers
- Clear distinction between test and production environments
- Always use latest LLM Katan features via unpinned pip installation

Signed-off-by: Yossi Ovadia <[email protected]>
- Change test model from 'gemma3:27b' to 'Qwen/Qwen2-0.5B-Instruct'
- Ensures Envoy health check uses model available in e2e config
- Fixes 503 errors when checking if Envoy proxy is running

Signed-off-by: Yossi Ovadia <[email protected]>
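For context, the kind of readiness probe the e2e harness performs against the Envoy proxy might look like the sketch below. The Envoy listener port is an assumption for illustration; only the model name comes from this commit.

```python
import requests

ENVOY_URL = "http://localhost:8801/v1/chat/completions"  # assumed listener port

# Use a model that actually exists in config.e2e.yaml so the router can resolve
# an endpoint; an unknown model is what produced the 503s mentioned above.
resp = requests.post(
    ENVOY_URL,
    json={
        "model": "Qwen/Qwen2-0.5B-Instruct",
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=30,
)
print(resp.status_code)  # expect 200 once Envoy, the router, and LLM Katan are up
```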
- Bump version to 0.1.6 for PyPI publishing
- Change license from MIT to Apache-2.0

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Update license classifier from MIT to Apache Software License
- Bump version to 0.1.7 for corrected license display on PyPI

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Force-pushed from 7683d8c to 45bae5f.
- Fix markdown linting issues (MD032, MD031, MD047) in README files
- Remove binary distribution files from git tracking
- Add Python build artifacts to .gitignore
- Auto-format Python files with black and isort
- Add CLAUDE.md exclusion to prevent future commits

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Force-pushed from bd65f99 to 745d6d5.
Update repository URLs in pyproject.toml to point to the correct vllm-project organization instead of personal fork.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Revert production config.yaml to original state from main branch. The config modifications were not intended for this PR and should remain unchanged to preserve production configuration.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Copy config.yaml from upstream main to ensure it matches exactly and includes the health_check_path and other missing fields.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
@yossiovadia this is great! Can you follow up with a PR to add the instructions to the docs too (alongside https://github.com/vllm-project/semantic-router/blob/main/website/docs/installation/deploy-quickstart.md#choosing-a-path)?
Release Notes: No