Revive e2e tests #173

yossiovadia · 2025-09-18T18:58:13Z

What type of PR is this?
test: improve e2e test suite reliability and system validation

What this PR does / why we need it:
Improves e2e test suite to properly validate core system components including semantic routing, jailbreak detection, PII policy enforcement, tool selection, and model selection. Tests now expose real
system issues instead of providing false positives.

Which issue(s) this PR fixes:
Fixes #

Release Notes:
No

Note that as backend, i was using ollama ( localhost:11434 ) and i was using OLLAMA_KEEP_ALIVE=0 to avoid models stay in memory.

netlify · 2025-09-18T18:58:18Z

✅ Deploy Preview for vllm-semantic-router ready!

Name	Link
🔨 Latest commit	`2ab2523`
🔍 Latest deploy log	https://app.netlify.com/projects/vllm-semantic-router/deploys/68d18431e616a90008a30e83
😎 Deploy Preview	https://deploy-preview-173--vllm-semantic-router.netlify.app
📱 Preview on mobile	Toggle QR Code... Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

yossiovadia · 2025-09-18T19:00:33Z

Current status :

✅ PASSED - 00-client-request-test.py
✅ PASSED - 01-envoy-extproc-test.py
✅ PASSED - 02-router-classification-test.py
❌ FAILED - 03-jailbreak-test.py
✅ PASSED - 04-cache-test.py
✅ PASSED - 05-pii-policy-test.py
✅ PASSED - 06-tools-test.py
✅ PASSED - 07-model-selection-test.py
❌ FAILED - 08-metrics-test.py
❌ FAILED - 09-error-handling-test.py
✅ PASSED - test_base.py

❌ Some tests failed

I'll create issue per failure shortly.

github-actions · 2025-09-18T21:02:14Z

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 `e2e-tests`

Owners: @yossiovadia
Files changed:

e2e-tests/05-pii-policy-test.py
e2e-tests/06-tools-test.py
e2e-tests/07-model-selection-test.py
e2e-tests/TEST_STATUS_REPORT.md
e2e-tests/00-client-request-test.py
e2e-tests/01-envoy-extproc-test.py
e2e-tests/02-router-classification-test.py
e2e-tests/04-cache-test.py

📁 `config`

Owners: @rootfs
Files changed:

config/config.yaml

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

rootfs · 2025-09-18T23:44:19Z

e2e-tests/00-client-request-test.py

 ENVOY_URL = "http://localhost:8801"
 OPENAI_ENDPOINT = "/v1/chat/completions"
-DEFAULT_MODEL = "qwen2.5:32b"  # Changed to match other tests
+DEFAULT_MODEL = "gemma3:27b"  # Use configured model


@yossiovadia can you follow up with a PR to retrieve the models from v1/models endpoint?

rootfs

let's merge it for now and use this to test against the integration setup.

rootfs · 2025-09-18T23:46:58Z

@yossiovadia can you sign the DCO?

rootfs · 2025-09-19T19:07:53Z

@yossiovadia can you sign DCO?

In your local branch, run: git rebase HEAD~13 --signoff
Force push your changes to overwrite the branch: git push --force-with-lease origin revive-e2e-tests

- Add new e2e test files: jailbreak, pii-policy, tools, model-selection, metrics, error-handling tests - Update existing e2e tests: client-request, envoy-extproc, router-classification, cache tests - Add CLAUDE.md with project documentation and instructions 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]>

- Increase timeouts from 10s to 30s in failing test files - Update config health check from /health to /api/version for Ollama compatibility - Fix metrics naming expectations in jailbreak, PII, and general metrics tests Co-Authored-By: Claude <[email protected]>

Updated timeout values in 5 test files to prevent timeout failures: - 00-client-request-test.py - 04-cache-test.py - 06-tools-test.py - 07-model-selection-test.py - 09-error-handling-test.py This should resolve the remaining timeout issues seen in local testing.

…ailbreak blocking tests - Remove permissive 503 acceptance from benign request tests - Add new test_jailbreak_attempts_blocked() to test actual security - Require 200 status for benign requests (proper service validation) - Require 4xx status for jailbreak attempts (proper security blocking) - This will expose real security vulnerabilities instead of hiding them These changes make tests fail when they should, revealing actual system issues. Signed-off-by: Yossi Ovadia <[email protected]>

…ection - Add test_auto_routing_intelligence() to verify semantic routing works - Use model='auto' to trigger intelligent routing (not fixed model) - Test that math problems route to phi4 (highest score: 1.0) - Test that creative writing routes to different model than phi4 - Validate that different query types get different models - This will expose if routing intelligence is actually working This test will fail if the router just returns the same model for all queries. Signed-off-by: Yossi Ovadia <[email protected]>

…comprehensive status report - Remove HTTP 503 acceptance from PII, tools, model selection, and error handling tests - Tests now require 200 status codes for successful operations - Service failures now properly fail tests instead of false positives - Add comprehensive TEST_STATUS_REPORT.md documenting all test improvements - Expose real system bugs: input validation gaps, jailbreak blocking issues - 6/11 test files now hardened and provide reliable system health assessment Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

- Simplified TEST_STATUS_REPORT.md to focus only on current system issues - Remove resolved/fixed sections that are no longer relevant - Replace (200, 200) ranges with simple 200 expected status in error handling - Remove redundant 'no 503 accepted' comments from all test files - Clean up unnecessary verbosity while maintaining test functionality Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

- Replace 'DAN role-play jailbreak' with clear 'Role-play jailbreak attempt' - Improve readability by removing technical jargon from failure descriptions Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

Signed-off-by: Yossi Ovadia <[email protected]>

Add DCO signoffs to 00, 01, 04, and 08 test files to complete DCO compliance for all files in the PR. Signed-off-by: Yossi Ovadia <[email protected]>

* metrics: Add request-level token histograms Signed-off-by: Jintao Zhang <[email protected]> * add unknown const Signed-off-by: Jintao Zhang <[email protected]> --------- Signed-off-by: Jintao Zhang <[email protected]>

Signed-off-by: cryo <[email protected]>

Signed-off-by: rongfu.leng <[email protected]> Co-authored-by: Huamin Chen <[email protected]>

Fix the copy command for tools directory in Dockerfile. Signed-off-by: yuluo-yx <[email protected]>

* feat: add basic cache eviction policy: LRU/LFU/FIFO Signed-off-by: Alex Wang <[email protected]> * use EvictionPolicyType Signed-off-by: Alex Wang <[email protected]> * update doc Signed-off-by: Alex Wang <[email protected]> --------- Signed-off-by: Alex Wang <[email protected]> Co-authored-by: Huamin Chen <[email protected]>

Signed-off-by: Huamin Chen <[email protected]>

Signed-off-by: bitliu <[email protected]>

* infra: update Dockerfile.extproc Signed-off-by: yuluo-yx <[email protected]> * feat: add precommit container, make it easier to run precommit Signed-off-by: yuluo-yx <[email protected]> --------- Signed-off-by: yuluo-yx <[email protected]> Co-authored-by: Huamin Chen <[email protected]>

Signed-off-by: yuluo-yx <[email protected]>

Signed-off-by: cryo <[email protected]>

Remove 03-jailbreak-test.py, 08-metrics-test.py, and 09-error-handling-test.py to be implemented in separate PRs with full backend functionality. This keeps the current PR focused on passing tests for clean merge. Signed-off-by: Yossi Ovadia <[email protected]>

yossiovadia · 2025-09-22T17:19:46Z

Closing this PR due to DCO complications from mixed commits. Creating fresh PR with clean e2e tests.

yossiovadia requested review from Xunzhuo and rootfs as code owners September 18, 2025 18:58

github-actions bot assigned rootfs and yossiovadia Sep 18, 2025

rootfs reviewed Sep 18, 2025

View reviewed changes

rootfs previously approved these changes Sep 18, 2025

View reviewed changes

yossiovadia dismissed rootfs’s stale review via 4c6715e September 18, 2025 23:54

yossiovadia force-pushed the revive-e2e-tests branch from b1395a8 to 4c6715e Compare September 18, 2025 23:54

yossiovadia requested a review from rootfs September 19, 2025 17:43

yossiovadia and others added 16 commits September 22, 2025 10:15

fix: remove DAN jargon from status report

ae18b6a

- Replace 'DAN role-play jailbreak' with clear 'Role-play jailbreak attempt' - Improve readability by removing technical jargon from failure descriptions Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]>

Delete CLAUDE.md

f406747

feat: add DCO signoffs to all test files and status report

d36d323

Signed-off-by: Yossi Ovadia <[email protected]>

feat: add DCO signoffs to remaining test files

08a8041

Add DCO signoffs to 00, 01, 04, and 08 test files to complete DCO compliance for all files in the PR. Signed-off-by: Yossi Ovadia <[email protected]>

metrics: Add request-level token histograms (vllm-project#157)

e3ac473

* metrics: Add request-level token histograms Signed-off-by: Jintao Zhang <[email protected]> * add unknown const Signed-off-by: Jintao Zhang <[email protected]> --------- Signed-off-by: Jintao Zhang <[email protected]>

docs: add repo URL in docker/README.md (vllm-project#163)

ea14306

Signed-off-by: cryo <[email protected]>

remove discarded fields from documents (vllm-project#165)

9a43f53

Signed-off-by: rongfu.leng <[email protected]> Co-authored-by: Huamin Chen <[email protected]>

Correct tools directory copy command in Dockerfile (vllm-project#171)

a329bb5

Fix the copy command for tools directory in Dockerfile. Signed-off-by: yuluo-yx <[email protected]>

rootfs and others added 7 commits September 22, 2025 10:15

chore: add just max token for different models in router bench

5a0a957

Signed-off-by: Huamin Chen <[email protected]>

docs: Model Performance Evaluation Guide (vllm-project#136)

1d37497

api: add semantic route support (vllm-project#147)

9815862

Signed-off-by: bitliu <[email protected]>

feat: add more content for contribution docs (vllm-project#175)

6569c5a

Signed-off-by: yuluo-yx <[email protected]>

fix: avoid double counting cache hits (vllm-project#177)

afd9cb8

Signed-off-by: cryo <[email protected]>

yossiovadia force-pushed the revive-e2e-tests branch from 5ff3e56 to 2ab2523 Compare September 22, 2025 17:15

yossiovadia requested review from wangchen615 and yuezhu1 as code owners September 22, 2025 17:15

yossiovadia closed this Sep 22, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Revive e2e tests #173

Revive e2e tests #173

Uh oh!

yossiovadia commented Sep 18, 2025 •

edited

Loading

Uh oh!

netlify bot commented Sep 18, 2025 •

edited

Loading

Uh oh!

yossiovadia commented Sep 18, 2025

Uh oh!

github-actions bot commented Sep 18, 2025 •

edited

Loading

Uh oh!

rootfs Sep 18, 2025

Uh oh!

rootfs left a comment

Uh oh!

rootfs commented Sep 18, 2025

Uh oh!

rootfs commented Sep 19, 2025

Uh oh!

yossiovadia commented Sep 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

Revive e2e tests #173

Revive e2e tests #173

Uh oh!

Conversation

yossiovadia commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

netlify bot commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Deploy Preview for vllm-semantic-router ready!

Uh oh!

yossiovadia commented Sep 18, 2025

Uh oh!

github-actions bot commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

👥 vLLM Semantic Team Notification

📁 e2e-tests

📁 config

🎉 Thanks for your contributions!

Uh oh!

rootfs Sep 18, 2025

Choose a reason for hiding this comment

Uh oh!

rootfs left a comment

Choose a reason for hiding this comment

Uh oh!

rootfs commented Sep 18, 2025

Uh oh!

rootfs commented Sep 19, 2025

Uh oh!

yossiovadia commented Sep 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

9 participants

yossiovadia commented Sep 18, 2025 •

edited

Loading

netlify bot commented Sep 18, 2025 •

edited

Loading

github-actions bot commented Sep 18, 2025 •

edited

Loading

📁 `e2e-tests`

📁 `config`