feat: enable E2E testing with LLM Katan - 00-client-request-test #290
Conversation
feat: enable E2E testing with LLM Katan and fix configuration

- Remove Ollama dependencies from E2E config as requested
- Update config.e2e.yaml to use only LLM Katan models (Qwen/Qwen2-0.5B-Instruct, TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- Fix bash 3.2 compatibility in start-llm-katan.sh (replace associative arrays)
- Add required use_reasoning fields to all model entries for validation
- Fix zero scores in model configurations (0.0 → 0.1)

Testing Status:

- ✅ Router: Successfully starts with E2E config (ExtProc on :50051, API on :8080)
- ✅ LLM Katan: Running on ports 8000/8001 with correct model mapping
- ✅ Envoy: Running on port 8801
- ✅ Test: 00-client-request-test.py passes with 200 OK responses
- ✅ Pipeline: Full end-to-end flow working (Client → Envoy → ExtProc → LLM Katan)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
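As a quick smoke test of that pipeline, a request through Envoy on port 8801 should come back 200 OK. A minimal sketch, assuming LLM Katan exposes the standard OpenAI-compatible /v1/chat/completions endpoint (the prompt is illustrative):

```bash
# End-to-end smoke test: Client -> Envoy (:8801) -> ExtProc -> LLM Katan.
# Prints only the HTTP status code; expect 200.
curl -s -o /dev/null -w "%{http_code}\n" \
  http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2-0.5B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```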
✅ Deploy Preview for vllm-semantic-router ready!
👥 vLLM Semantic Team Notification: the following members have been identified for the changed files in this PR and have been automatically assigned.
@yossiovadia can you run pre-commit to fix the lint error?
e2e-tests/start-llm-katan.sh (Outdated)

```bash
# Format: "port:real_model::served_model_name"
LLM_KATAN_MODELS=(
  "8000:Qwen/Qwen3-0.6B::Qwen/Qwen2-0.5B-Instruct"
  "8001:Qwen/Qwen3-0.6B::TinyLlama/TinyLlama-1.1B-Chat-v1.0"
)
```
Qwen2-0.5B is an odd name :D
Maybe we can just use Model-A, Model-B?
agree.. fixin`
fix: apply pre-commit formatting fixes

Apply black and isort formatting to LLM Katan Python files as required by pre-commit hooks.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
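For reference, this fix is the standard pre-commit invocation; the exact hook set (black, isort, and the markdown lint applied in the next commit) comes from the repo's .pre-commit-config.yaml:

```bash
# Run every configured pre-commit hook against the whole repository;
# formatting hooks such as black and isort rewrite files in place.
pre-commit run --all-files
```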
refactor: simplify model names to Model-A and Model-B for E2E testing

- Update LLM Katan configuration to use simplified model names
- Simplify 00-client-request-test.py to use Model-A as default
- Update documentation to reflect math → Model-B, creative → Model-A routing
- Improve test readability and maintainability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
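The new routing split (math → Model-B, creative → Model-A) can be spot-checked by sending two contrasting prompts and reading back the `model` field of each response. A hedged sketch, assuming the router echoes the routed model in the OpenAI-style response; the prompts and grep extraction are illustrative:

```bash
# Send a math prompt and a creative prompt; print which model served each.
for prompt in "What is 7 * 8?" "Write a short poem about the sea."; do
  body=$(curl -s http://localhost:8801/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"Model-A\", \"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}]}")
  # Pull the "model" field out of the JSON response (assumes no embedded quotes).
  model=$(printf '%s' "$body" | grep -o '"model"[^,}]*' | head -n 1)
  echo "$prompt -> $model"
done
```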
fix: apply pre-commit formatting fixes

- Fix markdown linting issues in CLAUDE.md files
- Apply black formatting to Python files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
This is the first basic validation test built on the new llm-katan infrastructure.

Testing Status: covered by the ✅ checklist in the PR description above.
Release Notes: No