Skip to content

Conversation

@yossiovadia
Copy link
Collaborator

@yossiovadia yossiovadia commented Sep 18, 2025

What type of PR is this?
test: improve e2e test suite reliability and system validation

What this PR does / why we need it:
Improves e2e test suite to properly validate core system components including semantic routing, jailbreak detection, PII policy enforcement, tool selection, and model selection. Tests now expose real
system issues instead of providing false positives.

Which issue(s) this PR fixes:
Fixes #

Release Notes:
No

Note that as backend, i was using ollama ( localhost:11434 ) and i was using OLLAMA_KEEP_ALIVE=0 to avoid models stay in memory.

@netlify
Copy link

netlify bot commented Sep 18, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit 2ab2523
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/68d18431e616a90008a30e83
😎 Deploy Preview https://deploy-preview-173--vllm-semantic-router.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@yossiovadia
Copy link
Collaborator Author

Current status :

✅ PASSED - 00-client-request-test.py
✅ PASSED - 01-envoy-extproc-test.py
✅ PASSED - 02-router-classification-test.py
❌ FAILED - 03-jailbreak-test.py
✅ PASSED - 04-cache-test.py
✅ PASSED - 05-pii-policy-test.py
✅ PASSED - 06-tools-test.py
✅ PASSED - 07-model-selection-test.py
❌ FAILED - 08-metrics-test.py
❌ FAILED - 09-error-handling-test.py
✅ PASSED - test_base.py

❌ Some tests failed

I'll create issue per failure shortly.

@github-actions
Copy link

github-actions bot commented Sep 18, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 e2e-tests

Owners: @yossiovadia
Files changed:

  • e2e-tests/05-pii-policy-test.py
  • e2e-tests/06-tools-test.py
  • e2e-tests/07-model-selection-test.py
  • e2e-tests/TEST_STATUS_REPORT.md
  • e2e-tests/00-client-request-test.py
  • e2e-tests/01-envoy-extproc-test.py
  • e2e-tests/02-router-classification-test.py
  • e2e-tests/04-cache-test.py

📁 config

Owners: @rootfs
Files changed:

  • config/config.yaml

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

ENVOY_URL = "http://localhost:8801"
OPENAI_ENDPOINT = "/v1/chat/completions"
DEFAULT_MODEL = "qwen2.5:32b" # Changed to match other tests
DEFAULT_MODEL = "gemma3:27b" # Use configured model
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yossiovadia can you follow up with a PR to retrieve the models from v1/models endpoint?

rootfs
rootfs previously approved these changes Sep 18, 2025
Copy link
Collaborator

@rootfs rootfs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's merge it for now and use this to test against the integration setup.

@rootfs
Copy link
Collaborator

rootfs commented Sep 18, 2025

@yossiovadia can you sign the DCO?

@rootfs
Copy link
Collaborator

rootfs commented Sep 19, 2025

@yossiovadia can you sign DCO?

In your local branch, run: git rebase HEAD~13 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin revive-e2e-tests

yossiovadia and others added 16 commits September 22, 2025 10:15
- Add new e2e test files: jailbreak, pii-policy, tools, model-selection, metrics, error-handling tests
- Update existing e2e tests: client-request, envoy-extproc, router-classification, cache tests
- Add CLAUDE.md with project documentation and instructions

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Increase timeouts from 10s to 30s in failing test files
- Update config health check from /health to /api/version for Ollama compatibility
- Fix metrics naming expectations in jailbreak, PII, and general metrics tests

Co-Authored-By: Claude <[email protected]>
Updated timeout values in 5 test files to prevent timeout failures:
- 00-client-request-test.py
- 04-cache-test.py
- 06-tools-test.py
- 07-model-selection-test.py
- 09-error-handling-test.py

This should resolve the remaining timeout issues seen in local testing.
…ailbreak blocking tests

- Remove permissive 503 acceptance from benign request tests
- Add new test_jailbreak_attempts_blocked() to test actual security
- Require 200 status for benign requests (proper service validation)
- Require 4xx status for jailbreak attempts (proper security blocking)
- This will expose real security vulnerabilities instead of hiding them

These changes make tests fail when they should, revealing actual system issues.

Signed-off-by: Yossi Ovadia <[email protected]>
…ection

- Add test_auto_routing_intelligence() to verify semantic routing works
- Use model='auto' to trigger intelligent routing (not fixed model)
- Test that math problems route to phi4 (highest score: 1.0)
- Test that creative writing routes to different model than phi4
- Validate that different query types get different models
- This will expose if routing intelligence is actually working

This test will fail if the router just returns the same model for all queries.

Signed-off-by: Yossi Ovadia <[email protected]>
…comprehensive status report

- Remove HTTP 503 acceptance from PII, tools, model selection, and error handling tests
- Tests now require 200 status codes for successful operations
- Service failures now properly fail tests instead of false positives
- Add comprehensive TEST_STATUS_REPORT.md documenting all test improvements
- Expose real system bugs: input validation gaps, jailbreak blocking issues
- 6/11 test files now hardened and provide reliable system health assessment

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Simplified TEST_STATUS_REPORT.md to focus only on current system issues
- Remove resolved/fixed sections that are no longer relevant
- Replace (200, 200) ranges with simple 200 expected status in error handling
- Remove redundant 'no 503 accepted' comments from all test files
- Clean up unnecessary verbosity while maintaining test functionality

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
- Replace 'DAN role-play jailbreak' with clear 'Role-play jailbreak attempt'
- Improve readability by removing technical jargon from failure descriptions

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Add DCO signoffs to 00, 01, 04, and 08 test files to complete
DCO compliance for all files in the PR.

Signed-off-by: Yossi Ovadia <[email protected]>
* metrics: Add request-level token histograms

Signed-off-by: Jintao Zhang <[email protected]>

* add unknown const

Signed-off-by: Jintao Zhang <[email protected]>

---------

Signed-off-by: Jintao Zhang <[email protected]>
Fix the copy command for tools directory in Dockerfile.

Signed-off-by: yuluo-yx <[email protected]>
* feat: add basic cache eviction policy: LRU/LFU/FIFO

Signed-off-by: Alex Wang <[email protected]>

* use EvictionPolicyType

Signed-off-by: Alex Wang <[email protected]>

* update doc

Signed-off-by: Alex Wang <[email protected]>

---------

Signed-off-by: Alex Wang <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
rootfs and others added 7 commits September 22, 2025 10:15
* infra: update Dockerfile.extproc

Signed-off-by: yuluo-yx <[email protected]>

* feat: add precommit container, make it easier to run precommit

Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Remove 03-jailbreak-test.py, 08-metrics-test.py, and 09-error-handling-test.py
to be implemented in separate PRs with full backend functionality.

This keeps the current PR focused on passing tests for clean merge.

Signed-off-by: Yossi Ovadia <[email protected]>
@yossiovadia
Copy link
Collaborator Author

Closing this PR due to DCO complications from mixed commits. Creating fresh PR with clean e2e tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants