
Conversation

@yossiovadia (Collaborator)

This is the first basic validation test based on the new llm-katan infrastructure.

  • Remove Ollama dependencies from E2E config as requested
  • Update config.e2e.yaml to use only LLM Katan models (Qwen/Qwen2-0.5B-Instruct, TinyLlama/TinyLlama-1.1B-Chat-v1.0)
  • Fix bash 3.2 compatibility in start-llm-katan.sh (replace associative arrays; a sketch of the pattern follows this list), so the script now runs on a MacBook
  • Add required use_reasoning fields to all model entries for validation
  • Fix zero scores in model configurations (0.0 → 0.1)
  • Remove the hardcoded llm-katan version
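
For context on the bash 3.2 item above: the usual workaround is to replace a bash 4 associative array with a plain indexed array of delimited strings that the script parses itself. A minimal sketch of that pattern, built around the entry format quoted later in the review thread (the parsing loop and its variable names are illustrative, not necessarily what start-llm-katan.sh actually does):

```bash
#!/usr/bin/env bash
# bash 3.2 (the macOS default) has no associative arrays, so encode
# key/value data as delimited strings in a plain indexed array.
# Format: "port:real_model::served_model_name"
LLM_KATAN_MODELS=(
  "8000:Qwen/Qwen3-0.6B::Qwen/Qwen2-0.5B-Instruct"
  "8001:Qwen/Qwen3-0.6B::TinyLlama/TinyLlama-1.1B-Chat-v1.0"
)

for entry in "${LLM_KATAN_MODELS[@]}"; do
  port="${entry%%:*}"        # everything before the first ':'
  rest="${entry#*:}"         # everything after the first ':'
  real_model="${rest%%::*}"  # everything before the '::'
  served_name="${rest##*::}" # everything after the '::'
  echo "port=${port} real=${real_model} served=${served_name}"
done
```

This works because neither the port nor the model names contain the chosen delimiters, which is presumably why a double colon separates the two model fields.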

Testing Status:

  • ✅ Router: Successfully starts with E2E config (ExtProc on :50051, API on :8080)
  • ✅ LLM Katan: Running on ports 8000/8001 with correct model mapping
  • ✅ Envoy: Running on port 8801
  • ✅ Test: 00-client-request-test.py passes with 200 OK responses
  • ✅ Malformed request test: malformed requests are properly rejected with appropriate error codes
  • ✅ Pipeline: Full end-to-end flow working (Client → Envoy → ExtProc → LLM Katan); see the curl sketch after this list
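
For reference, the request that 00-client-request-test.py sends can be reproduced by hand. A minimal sketch, assuming Envoy listens on localhost:8801 and fronts an OpenAI-compatible chat completions endpoint; the model name matches the E2E config at this point in the PR (it is renamed to Model-A/Model-B later in the thread):

```bash
# One chat completion through the full E2E pipeline:
# client -> Envoy (:8801) -> ExtProc router (:50051) -> LLM Katan (:8000/:8001)
curl -s http://localhost:8801/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2-0.5B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

A 200 response with a non-empty choices array is what the passing test above corresponds to.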

Release Notes: No

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

netlify bot commented Sep 29, 2025

Deploy Preview for vllm-semantic-router ready!

  • 🔨 Latest commit: 4c9a66a
  • 🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/68dac3bf1a98e300085d9b84
  • 😎 Deploy Preview: https://deploy-preview-290--vllm-semantic-router.netlify.app

github-actions bot commented Sep 29, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 config

Owners: @rootfs
Files changed:

  • config/config.e2e.yaml

📁 e2e-tests

Owners: @yossiovadia
Files changed:

  • e2e-tests/00-client-request-test.py
  • e2e-tests/README.md
  • e2e-tests/llm-katan/llm_katan/__init__.py
  • e2e-tests/llm-katan/llm_katan/cli.py
  • e2e-tests/llm-katan/llm_katan/server.py
  • e2e-tests/start-llm-katan.sh

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs commented Sep 29, 2025

@yossiovadia can you run pre-commit to fix the lint error?

["8001"]="Qwen/Qwen3-0.6B::TinyLlama/TinyLlama-1.1B-Chat-v1.0"
# Format: "port:real_model::served_model_name"
LLM_KATAN_MODELS=(
"8000:Qwen/Qwen3-0.6B::Qwen/Qwen2-0.5B-Instruct"
@rootfs (Collaborator), Sep 29, 2025:
Qwen2-0.5B is an odd name :D

Maybe we can just use Model-A, Model-B?

@yossiovadia (Collaborator, Author) replied:

agree.. fixin`

yossiovadia and others added 3 commits September 29, 2025 10:20

fix: apply pre-commit formatting fixes

Apply black and isort formatting to LLM Katan Python files
as required by pre-commit hooks.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

refactor: simplify model names to Model-A and Model-B for E2E testing

- Update LLM Katan configuration to use simplified model names
- Simplify 00-client-request-test.py to use Model-A as default
- Update documentation to reflect math → Model-B, creative → Model-A routing
- Improve test readability and maintainability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

fix: apply pre-commit formatting fixes

- Fix markdown linting issues in CLAUDE.md files
- Apply black formatting to Python files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
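
The math → Model-B, creative → Model-A routing described in the refactor commit above can be spot-checked by hand. A hedged sketch, reusing the Envoy listener on :8801 and assuming the response's "model" field reflects the backend the router actually selected (the prompts and the requested model are illustrative):

```bash
# Send a math prompt and a creative prompt through the router and
# print which model each response claims to come from.
for prompt in "What is 7 * 8?" "Write a haiku about the sea."; do
  curl -s http://localhost:8801/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"Model-A\", \"messages\": [{\"role\": \"user\", \"content\": \"${prompt}\"}]}" |
    python3 -c 'import json, sys; print(json.load(sys.stdin).get("model"))'
done
```

If routing works as documented, the math prompt should come back attributed to Model-B and the creative prompt to Model-A.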
@rootfs merged commit bbc88bb into vllm-project:main Sep 29, 2025 (9 checks passed).
@yossiovadia deleted the e2e-00-client-request-test branch September 29, 2025 18:05.
Aias00 pushed a commit to Aias00/semantic-router that referenced this pull request Oct 4, 2025
…m-project#290)

* feat: enable E2E testing with LLM Katan and fix configuration

- Remove Ollama dependencies from E2E config as requested
- Update config.e2e.yaml to use only LLM Katan models (Qwen/Qwen2-0.5B-Instruct, TinyLlama/TinyLlama-1.1B-Chat-v1.0)
- Fix bash 3.2 compatibility in start-llm-katan.sh (replace associative arrays)
- Add required use_reasoning fields to all model entries for validation
- Fix zero scores in model configurations (0.0 → 0.1)

Testing Status:
- ✅ Router: Successfully starts with E2E config (ExtProc on :50051, API on :8080)
- ✅ LLM Katan: Running on ports 8000/8001 with correct model mapping
- ✅ Envoy: Running on port 8801
- ✅ Test: 00-client-request-test.py passes with 200 OK responses
- ✅ Pipeline: Full end-to-end flow working (Client → Envoy → ExtProc → LLM Katan)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply pre-commit formatting fixes

Apply black and isort formatting to LLM Katan Python files
as required by pre-commit hooks.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* refactor: simplify model names to Model-A and Model-B for E2E testing

- Update LLM Katan configuration to use simplified model names
- Simplify 00-client-request-test.py to use Model-A as default
- Update documentation to reflect math → Model-B, creative → Model-A routing
- Improve test readability and maintainability

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply pre-commit formatting fixes

- Fix markdown linting issues in CLAUDE.md files
- Apply black formatting to Python files

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: liuhy <[email protected]>
Aias00 pushed a commit to Aias00/semantic-router that referenced this pull request Oct 4, 2025, carrying the same commit message as above.
yossiovadia added a commit to yossiovadia/semantic-router that referenced this pull request Oct 8, 2025, carrying the same commit message as above.