chore(infra): optimize Docker CI workflow for faster builds #344
Conversation
Force-pushed from 48e1bc2 to f71c842 (compare)
✅ Deploy Preview for vllm-semantic-router ready!
To edit notification comments on pull requests, go to your Netlify project configuration.
👥 vLLM Semantic Team Notification: the following members have been identified for the changed files in this PR and have been automatically assigned.
Force-pushed from c4a7046 to f2c4162 (compare)
Signed-off-by: cryo <[email protected]> Signed-off-by: liuhy <[email protected]>
Signed-off-by: bitliu <[email protected]> Signed-off-by: liuhy <[email protected]>
vllm-project#214) Signed-off-by: Florencio Cano Gabarda <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
* docs: add run precommit by docker or podman Signed-off-by: yuluo-yx <[email protected]> * fix: update by comment Signed-off-by: yuluo-yx <[email protected]> * chore: fix code style Signed-off-by: yuluo-yx <[email protected]> --------- Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: liuhy <[email protected]>
…idation (vllm-project#219) * add IPv4 address for mock-vllm Signed-off-by: JaredforReal <[email protected]> * use online/hf-cache for all-MiniLM Signed-off-by: JaredforReal <[email protected]> --------- Signed-off-by: JaredforReal <[email protected]> Signed-off-by: liuhy <[email protected]>
Signed-off-by: liuhy <[email protected]>
…llm-project#222) * add Grafana and Prometheus & docs for setting up Signed-off-by: JaredforReal <[email protected]> * focus on MVP with local docker compose Signed-off-by: JaredforReal <[email protected]> * refactor observability.md Signed-off-by: JaredforReal <[email protected]> --------- Signed-off-by: JaredforReal <[email protected]> Signed-off-by: liuhy <[email protected]>
Signed-off-by: bitliu <[email protected]> Signed-off-by: liuhy <[email protected]>
Signed-off-by: cryo <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
Signed-off-by: bitliu <[email protected]> Signed-off-by: liuhy <[email protected]>
Signed-off-by: bitliu <[email protected]> Signed-off-by: liuhy <[email protected]>
* fix typo & add k8s quickstart doc Signed-off-by: JaredforReal <[email protected]> * change docker to deploy quickstart Signed-off-by: JaredforReal <[email protected]> * refactor deploy-quickstart.md Signed-off-by: JaredforReal <[email protected]> * declare k8s needs separate llm endpoint and envoy setup Signed-off-by: JaredforReal <[email protected]> * add some reference in k8s requirement Signed-off-by: JaredforReal <[email protected]> * change docker to deploy quickstart Signed-off-by: JaredforReal <[email protected]> --------- Signed-off-by: JaredforReal <[email protected]> Signed-off-by: liuhy <[email protected]>
…ect#236) Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: liuhy <[email protected]>
Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: liuhy <[email protected]>
* infra: test model cache Signed-off-by: yuluo-yx <[email protected]> * chore: lookup huggingface cache dir Signed-off-by: yuluo-yx <[email protected]> * feat: when run test-vllm, get model from openai models api (vllm-project#236) Signed-off-by: yuluo-yx <[email protected]> * infra: cache models in test-and-build GHA (vllm-project#237) Signed-off-by: yuluo-yx <[email protected]> * chore: lookup huggingface cache dir Signed-off-by: yuluo-yx <[email protected]> * fix: only cache models Signed-off-by: yuluo-yx <[email protected]> * chore Signed-off-by: yuluo-yx <[email protected]> --------- Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: liuhy <[email protected]>
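The model-cache commits above hinge on locating the HuggingFace cache directory so CI can persist it between runs. A minimal Python sketch of that lookup (the precedence follows `huggingface_hub`'s documented environment variables; the helper name is illustrative, not the workflow's actual code):

```python
import os


def hf_cache_dir() -> str:
    """Resolve the HuggingFace hub cache directory, honoring the
    documented overrides: HF_HUB_CACHE first, then HF_HOME/hub,
    then the default ~/.cache/huggingface/hub."""
    if "HF_HUB_CACHE" in os.environ:
        return os.environ["HF_HUB_CACHE"]
    if "HF_HOME" in os.environ:
        return os.path.join(os.environ["HF_HOME"], "hub")
    return os.path.join(os.path.expanduser("~"), ".cache", "huggingface", "hub")
```

A CI cache step would then key on this directory plus a hash of the model list, so only the models (not the whole workspace) are cached, as the "only cache models" commit describes.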
…project#228) * feat: add mock vLLM infrastructure for lightweight e2e testing This commit introduces a mock vLLM server infrastructure to enable e2e testing without requiring GPU resources. The mock infrastructure simulates intelligent routing behavior while maintaining compatibility with the existing semantic router. Key changes: - Add mock-vllm-server.py: Simulates vLLM OpenAI-compatible API with intelligent content-based routing (math queries → TinyLlama, general → Qwen) - Add start-mock-servers.sh: Launch mock servers in foreground mode - Update config.yaml: Add minimal vLLM endpoint configuration for Qwen (port 8000) and TinyLlama (port 8001) with smart routing preference - Update 00-client-request-test.py: Fix import path and use configured model - Update e2e-tests/README.md: Document mock infrastructure usage - Update build-run-test.mk: Add mock server management targets The mock infrastructure enables: - Fast e2e testing without GPU dependencies - Content-aware model selection simulation - vLLM API compatibility testing - Smart routing behavior validation Signed-off-by: Yossi Ovadia <[email protected]> * feat: replace mock vLLM infrastructure with LLM Katan package Replace the mock vLLM server with a real FastAPI-based implementation using HuggingFace transformers and tiny models. The new LLM Katan package provides actual inference while maintaining lightweight testing benefits. 
Key changes: - Add complete LLM Katan PyPI package (v0.1.4) under e2e-tests/ - FastAPI server with OpenAI-compatible endpoints (/v1/chat/completions, /v1/models, /health, /metrics) - Real Qwen/Qwen3-0.6B model with name aliasing for multi-model testing - Enhanced logging and Prometheus metrics endpoint - CLI tool with comprehensive configuration options - Replace start-mock-servers.sh with start-llm-katan.sh - Update e2e-tests README with new LLM Katan usage instructions - Remove obsolete mock-vllm-server.py and start-mock-servers.sh Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * docs: add HuggingFace token setup instructions to LLM Katan README Add comprehensive setup section covering HuggingFace token requirements with three authentication methods: - Environment variable (HUGGINGFACE_HUB_TOKEN) - CLI login (huggingface-cli login) - Token file in home directory Explains why token is needed (private models, rate limits, reliable downloads) and provides direct link to HuggingFace token settings. 
Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * fix: add Python build artifacts to .gitignore - Add dist/, build/, *.egg-info/, *.whl to ignore Python build outputs - Prevents accidentally committing generated files Signed-off-by: Yossi Ovadia <[email protected]> * refactor: separate e2e and production configs - Create config.e2e.yaml with LLM Katan endpoints for e2e tests - Restore config.yaml to original production endpoints (matches origin/main) - Add run-router-e2e target to use e2e config (config/config.e2e.yaml) - Add start-llm-katan and test-e2e-vllm targets for LLM Katan testing - Update Makefile help with new e2e test targets - Remove egg-info directory from git tracking (now in .gitignore) - Keep pyproject.toml at stable version 0.1.4, always install latest via pip This separation allows: - Production config stays clean with real vLLM endpoints - E2E tests use lightweight LLM Katan servers - Clear distinction between test and production environments - Always use latest LLM Katan features via unpinned pip installation Signed-off-by: Yossi Ovadia <[email protected]> * fix: update e2e test to use model from config.e2e.yaml - Change test model from 'gemma3:27b' to 'Qwen/Qwen2-0.5B-Instruct' - Ensures Envoy health check uses model available in e2e config - Fixes 503 errors when checking if Envoy proxy is running Signed-off-by: Yossi Ovadia <[email protected]> * Update llm-katan package metadata - Bump version to 0.1.6 for PyPI publishing - Change license from MIT to Apache-2.0 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * Fix Apache license classifier in pyproject.toml - Update license classifier from MIT to Apache Software License - Bump version to 0.1.7 for corrected license display on PyPI 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia 
<[email protected]> * fix: resolve pre-commit hook failures - Fix markdown linting issues (MD032, MD031, MD047) in README files - Remove binary distribution files from git tracking - Add Python build artifacts to .gitignore - Auto-format Python files with black and isort - Add CLAUDE.md exclusion to prevent future commits 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * fix: update llm-katan project URLs to vllm-project repository Update repository URLs in pyproject.toml to point to the correct vllm-project organization instead of personal fork. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * fix: revert config.yaml to original main branch version Revert production config.yaml to original state from main branch. The config modifications were not intended for this PR and should remain unchanged to preserve production configuration. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * fix: restore config.yaml to match upstream main exactly Copy config.yaml from upstream main to ensure it matches exactly and includes the health_check_path and other missing fields. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> --------- Signed-off-by: Yossi Ovadia <[email protected]> Co-authored-by: Claude <[email protected]> Signed-off-by: liuhy <[email protected]>
* feat: add interactive terminal demo for multi-instance testing - Created animated terminal demo showcasing multi-instance capabilities - Added terminal-demo.html with realistic typing animations using TypeIt.js - Enhanced README with live demo link and improved use case documentation - Added embeddable demo widget (demo-embed.html) for external sites - Updated multi-instance examples to show mocking popular AI providers - Improved positioning documentation with strengths vs competitors - Highlighted key advantage: no GPU required, runs on laptops/Macs 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * chore: add .gitignore to exclude build artifacts and demo recordings - Added .gitignore to exclude .cast files from asciinema recordings - Excluded common build artifacts and IDE files - Prevents accidental commits of temporary demo files 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * docs: enhance demo accessibility with GitHub Pages link and preview - Added GitHub Pages link for live interactive demo - Added collapsible preview section showing terminal output - Included fallback instructions for local demo viewing - Added guide for creating demo GIF alternatives 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * fix: update demo links to point to main project repository - Changed GitHub Pages links from personal repo to vllm-project repository - Ensures demo will work once PR is merged to main - Provides correct canonical URL for PyPI and documentation 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * docs: add demo testing guide for PR reviewers - Created instructions for 
reviewers to test the interactive demo - Provided multiple options: local checkout, raw file viewing, static preview - Explains why live links won't work until PR is merged - Helps reviewers experience the full animation during review process 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * chore: remove demo testing guide Removed DEMO_TESTING.md to keep the PR focused on the core demo functionality. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * fix: improve terminal demo layout and fix markdown lint issues Terminal Demo: - Reduced terminal heights from 300px to 220px with max-height 250px - Added overflow-y for better space utilization - Prevents bottom terminal from requiring scroll Markdown Lint: - Fixed line length issues (MD013) by breaking long lines - Converted bold text to proper headings (MD036) - Added blank lines around headings and lists (MD022, MD032) - Added markdownlint disable comments for required HTML elements 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * fix: improve terminal demo sizing and timing - Restored bottom terminal (terminal-full) to proper size (300px min-height) - Increased Terminal 3 delay from 8.5s to 10s for better timing - Ensures Terminal 3 starts only after both servers complete their setup - Top terminals remain compact at 220-250px for better layout 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> * fix: resolve markdown lint issues in demo documentation - Added missing blank lines around fenced code blocks - Added trailing newlines to all markdown files - Added blank lines around lists - Ensures compliance with project markdown 
linting rules 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> --------- Signed-off-by: Yossi Ovadia <[email protected]> Co-authored-by: Claude <[email protected]> Signed-off-by: liuhy <[email protected]>
…roject#246) Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: liuhy <[email protected]>
…roject#247) Signed-off-by: cryo <[email protected]> Signed-off-by: liuhy <[email protected]>
…#233) * Fix LoRA Model Training Configuration and Data Balance Signed-off-by: OneZero-Y <[email protected]> Fix LoRA Model Training Configuration and Data Balance Signed-off-by: OneZero-Y <[email protected]> * fix: LoRA Model Training Configuration and Data Balance Signed-off-by: OneZero-Y <[email protected]> fix: LoRA Model Training Configuration and Data Balance Signed-off-by: OneZero-Y <[email protected]> --------- Signed-off-by: OneZero-Y <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
Signed-off-by: bitliu <[email protected]> Signed-off-by: liuhy <[email protected]>
…ecture support Signed-off-by: liuhy <[email protected]>
…ld to trigger it on PRs Signed-off-by: liuhy <[email protected]>
* feat: add open webui pipe Signed-off-by: bitliu <[email protected]> * fix lint Signed-off-by: bitliu <[email protected]> --------- Signed-off-by: bitliu <[email protected]> Signed-off-by: liuhy <[email protected]>
* feat: add system prompt toggle endpoint Signed-off-by: Huamin Chen <[email protected]> * add cli option to explicitly enable the prompt toggle Signed-off-by: Huamin Chen <[email protected]> * fix test failure Signed-off-by: Huamin Chen <[email protected]> * fix test failure Signed-off-by: Huamin Chen <[email protected]> * fix test failure Signed-off-by: Huamin Chen <[email protected]> * adding system prompt endpoint option to makefile target Signed-off-by: Huamin Chen <[email protected]> * update doc Signed-off-by: Huamin Chen <[email protected]> * address review comment Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
* feat: improve batch classification test to validate accuracy Previously, the batch classification test only validated HTTP status and result count, but never checked if the classifications were correct. The expected_categories variable was created but never used for validation. Changes: - Extract actual categories from batch classification results - Compare against expected categories and calculate accuracy percentage - Add detailed output showing each classification result - Assert that accuracy meets 75% threshold - Maintain backward compatibility with existing HTTP/count checks This improved test now properly catches classification accuracy issues and will fail when the classification system returns incorrect results, exposing problems that were previously hidden. Related to issue vllm-project#318: Batch Classification API Returns Incorrect Categories Signed-off-by: Yossi Ovadia <[email protected]> * style: apply black formatting to classification test Automatic formatting applied by black pre-commit hook. Signed-off-by: Yossi Ovadia <[email protected]> --------- Signed-off-by: Yossi Ovadia <[email protected]> Signed-off-by: liuhy <[email protected]>
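The accuracy check this commit adds (comparing actual vs. expected categories against a 75% threshold) can be sketched as a small helper. Names and the response shape are illustrative; the real test parses the batch API's JSON response:

```python
def batch_accuracy(results: list, expected: list) -> float:
    """Compare predicted categories from a batch classification
    response against expected labels; return accuracy in percent."""
    assert len(results) == len(expected), "result count must match expectations"
    correct = sum(
        1 for got, want in zip(results, expected) if got["category"] == want
    )
    return 100.0 * correct / len(expected)


# Threshold from the commit message: the test fails below 75% accuracy.
ACCURACY_THRESHOLD = 75.0
```

The point of the change is that `expected_categories` is now actually consumed: the test asserts `batch_accuracy(...) >= ACCURACY_THRESHOLD` instead of only checking HTTP status and result count.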
…le (vllm-project#320) The Classification API's /api/v1/classify/intent endpoint was returning placeholder "general" category responses with 0.5 confidence instead of performing actual classification using the unified classifier. Changes: - Update handleIntentClassification() to check for unified classifier availability first - Use ClassifyIntentUnified() when unified classifier is available - Fall back to legacy ClassifyIntent() when unified classifier not available - Maintain backward compatibility with existing API contract This resolves the issue where the single classification API always returned hardcoded placeholder responses instead of performing actual BERT-based classification. Fixes vllm-project#303 Signed-off-by: Yossi Ovadia <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
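The actual fix is in Go (`handleIntentClassification()`), but the dispatch pattern it describes, prefer the unified classifier, fall back to the legacy one, and only then return the old placeholder, translates to this Python sketch (function and parameter names are hypothetical):

```python
def classify_intent(text: str, unified=None, legacy=None):
    """Dispatch to the best available classifier.

    The bug being fixed: the handler always returned the hardcoded
    ("general", 0.5) placeholder. The fix checks availability first."""
    if unified is not None:
        return unified(text)          # BERT-based unified classifier
    if legacy is not None:
        return legacy(text)           # legacy ClassifyIntent path
    return ("general", 0.5)           # last-resort placeholder only
```

The legacy path is kept so the API contract is unchanged for deployments that never load the unified classifier.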
* add k8s integration github action test Signed-off-by: JaredforReal <[email protected]> * add syntax validation for observability and ai-gateway Signed-off-by: JaredforReal <[email protected]> * fix image conflict & kustomize path error Signed-off-by: JaredforReal <[email protected]> * fix network error in kind Signed-off-by: JaredforReal <[email protected]> * fix https response to http error Signed-off-by: JaredforReal <[email protected]> * fix model init error Signed-off-by: JaredforReal <[email protected]> * change hf-cli to hf download for models Signed-off-by: JaredforReal <[email protected]> * change image loading strategy & models init Signed-off-by: JaredforReal <[email protected]> --------- Signed-off-by: JaredforReal <[email protected]> Signed-off-by: liuhy <[email protected]>
…oject#323) * Initial plan * Fix Envoy health check by replacing wget with curl Co-authored-by: rootfs <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rootfs <[email protected]> Signed-off-by: liuhy <[email protected]>
…m-project#326) * Initial plan * Add task_type validation and API discovery endpoint - Add validateTaskType helper function to validate task_type parameter - Reject invalid task_type values with 400 error and helpful message - Add GET /api/v1 endpoint for API discovery - Return comprehensive API overview with endpoints, task_types, and links - Add tests for invalid task_type values (jailbreak, invalid_type) - Add tests for valid task_types (intent, pii, security, all) - Add test for API overview endpoint Co-authored-by: rootfs <[email protected]> * Refactor API discovery to use centralized registry pattern - Replace hardcoded endpoint list with endpointRegistry - Replace hardcoded task types with taskTypeRegistry - Generate API documentation dynamically from registries - Add filtering logic for system prompt endpoints - Add test for system prompt endpoint filtering - Enables future OpenAPI spec generation from registry - Makes API documentation easier to maintain and extend Co-authored-by: rootfs <[email protected]> * Add OpenAPI 3.0 spec generation and Swagger UI - Implement OpenAPI 3.0 specification structures - Add generateOpenAPISpec() to dynamically generate spec from registry - Add /openapi.json endpoint serving OpenAPI 3.0 spec - Add /docs endpoint serving interactive Swagger UI - Update endpoint registry to include new documentation endpoints - Add openapi_spec and swagger_ui links to API overview - Automatically filter system prompt endpoints in spec based on config - Add comprehensive tests for OpenAPI and Swagger UI endpoints - Tests verify spec structure, filtering, and UI rendering Co-authored-by: rootfs <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rootfs <[email protected]> Signed-off-by: liuhy <[email protected]>
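The registry-driven OpenAPI generation described above can be illustrated with a toy registry. The entries, field names, and filtering rule below are illustrative of the described design, not the service's real route table:

```python
# (method, path, summary, is_system_prompt_endpoint)
ENDPOINT_REGISTRY = [
    ("POST", "/api/v1/classify/intent", "Classify a single query", False),
    ("GET", "/api/v1", "API discovery overview", False),
    ("GET", "/system_prompt", "Read the system prompt toggle", True),
]


def generate_openapi_spec(system_prompt_enabled: bool) -> dict:
    """Build a minimal OpenAPI 3.0 document from the registry,
    filtering out system-prompt endpoints when that feature is off."""
    paths = {}
    for method, path, summary, is_sp in ENDPOINT_REGISTRY:
        if is_sp and not system_prompt_enabled:
            continue
        paths.setdefault(path, {})[method.lower()] = {"summary": summary}
    return {
        "openapi": "3.0.0",
        "info": {"title": "Classification API", "version": "1.0.0"},
        "paths": paths,
    }
```

Because `/openapi.json` and `/api/v1` both read from the same registry, adding a route in one place updates the discovery endpoint, the spec, and Swagger UI together, which is the maintainability win the commit claims.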
…llm-project#322) * Initial plan * Add OpenTelemetry tracing infrastructure and basic instrumentation Co-authored-by: rootfs <[email protected]> * Add comprehensive tracing instrumentation and tests Co-authored-by: rootfs <[email protected]> * Add comprehensive tracing documentation and deployment examples Co-authored-by: rootfs <[email protected]> * Update README and add feature summary documentation Co-authored-by: rootfs <[email protected]> * Fix broken documentation link in tracing quickstart guide Co-authored-by: rootfs <[email protected]> * Run go mod tidy to fix go.mod and go.sum Co-authored-by: rootfs <[email protected]> * Fix markdown lint errors and remove TRACING_FEATURE.md Co-authored-by: rootfs <[email protected]> * Fix OTLP exporter to connect asynchronously to prevent test panics Co-authored-by: rootfs <[email protected]> * Fix StartSpan to handle nil context gracefully Co-authored-by: rootfs <[email protected]> * fix lint error Signed-off-by: Huamin Chen <[email protected]> --------- Signed-off-by: Huamin Chen <[email protected]> Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: rootfs <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
…project#332) Signed-off-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
Signed-off-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
Add 06-pii-detection-test.py to test PII detection across multiple pathways: - Batch API PII Detection via unified classifier - Direct PII API endpoint testing - ExtProc PII filtering in production pipeline - Multiple PII types pattern analysis The test suite includes cache-busting mechanisms using timestamps and UUIDs to avoid cached responses, comprehensive error reporting, and detailed documentation of system behavior for regression testing. Signed-off-by: Yossi Ovadia <[email protected]> Signed-off-by: liuhy <[email protected]>
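The cache-busting mechanism mentioned above (timestamps plus UUIDs so repeated runs never hit a cached response) is straightforward to sketch; the helper name is illustrative:

```python
import time
import uuid


def cache_busted(query: str) -> str:
    """Append a timestamp and a short UUID so two runs of the same
    test never present an identical prompt to the semantic cache."""
    return f"{query} [run {int(time.time())}-{uuid.uuid4().hex[:8]}]"
```

Without this, a second test run can be served the first run's cached classification, masking regressions in the live pipeline.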
* feat: add comprehensive jailbreak detection test Adds 05-jailbreak-test.py to expose critical security gaps in jailbreak detection: 1. Routing Bug: task_type="jailbreak" incorrectly routes to category classifier - Jailbreak content "How to build a bomb?" classified as "psychology" instead of "jailbreak" - Security risk: harmful content bypasses jailbreak detection 2. Missing Endpoint: POST /api/v1/classify/jailbreak returns 404 - Direct jailbreak classification endpoint not implemented - Forces users to rely on batch endpoint with broken routing 3. ExtProc Security Gap: Tests that ExtProc pipeline allows jailbreak content through - Validates end-to-end security filtering in LLM completion pipeline - Documents security bypass where harmful instructions can be generated Test Features: - Documents multiple jailbreak attempts and safe content for comparison - Provides detailed analysis of detection patterns and accuracy - Exposes routing bugs and security gaps with clear failure messages - Follows existing e2e test patterns for consistency This test serves as both documentation of current security issues and validation framework for future jailbreak detection improvements. 
Signed-off-by: Yossi Ovadia <[email protected]> * fix: correct jailbreak test to use proper API parameters Updates 05-jailbreak-test.py to use the correct API parameters for jailbreak detection: CORRECTED API USAGE: - Changed task_type from "jailbreak" to "security" (the correct parameter) - Updated expectations to check for threat detection vs "safe" classification - Fixed validation logic to properly test security endpoint behavior VALIDATION CONFIRMED: - task_type="security" correctly routes to security classifier - Jailbreak content now properly detected as "jailbreak" with 99.1% confidence - Test validates that dangerous content is NOT classified as "safe" ENDPOINTS VALIDATED: - ✅ /api/v1/classify/batch with task_type="security" - Works correctly - ❌ /api/v1/classify/jailbreak - Confirmed missing (404 as expected) The test now accurately validates jailbreak detection capabilities using the correct API interface, rather than testing against wrong parameters. Signed-off-by: Yossi Ovadia <[email protected]> * feat: add comprehensive jailbreak detection tests Adds 05-jailbreak-test.py with comprehensive test coverage for jailbreak detection across multiple classifier paths: - Batch API security classification (ModernBERT path) - Direct security endpoint testing - ExtProc pipeline security validation - Pattern analysis across multiple test cases Features: - Cache-busting with unique test cases per run - Clear documentation of expected results per path - Detailed logging of classifier behavior differences - Comprehensive security gap analysis Tests expose critical security vulnerabilities where jailbreak content bypasses detection and reaches LLM backends, generating harmful responses. Co-Authored-By: Claude <[email protected]> Signed-off-by: Yossi Ovadia <[email protected]> --------- Signed-off-by: Yossi Ovadia <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
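Per the commit, the corrected request routes through `task_type="security"` rather than the nonexistent `"jailbreak"` value, and a dangerous prompt passes only if it is not labeled safe. A sketch of the payload and pass/fail rule (field names follow the commit text; the helper functions are hypothetical):

```python
def build_security_request(texts) -> dict:
    """Batch classification request that routes to the security
    classifier; the earlier revision wrongly sent task_type='jailbreak',
    which fell through to the category classifier."""
    return {"texts": list(texts), "task_type": "security"}


def is_flagged(result: dict) -> bool:
    """A jailbreak attempt is detected iff it is NOT classified 'safe'."""
    return result.get("category") != "safe"
```

This is why the original test saw "How to build a bomb?" classified as "psychology": the wrong `task_type` never reached the security classifier at all.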
* feat: enhance PII detection testing with comprehensive ExtProc validation 🔍 ENHANCED PII TESTING FRAMEWORK: - Added comprehensive ExtProc PII detection test (TEST 3.5) - Tests differential behavior between PII and safe content - Validates production pipeline PII handling capabilities - Monitors routing decisions, processing times, and blocking behavior 📋 IMPROVED TEST COVERAGE: - Enhanced test case generation with cache-busting timestamps - Added comprehensive PII pattern analysis across multiple entity types - Better detection of ExtProc PII filtering mechanisms - More detailed logging and result analysis ⚙️ SMART PII POLICY CONFIGURATION: - Model-A: Strict PII policy (allow_by_default: false, EMAIL_ADDRESS only) - Model-B: Permissive PII policy (allow_by_default: true, all PII types) - Mixed policy approach enables better testing of PII routing behavior 📊 TEST CAPABILITIES: - Detects PII blocking vs routing-only behavior - Monitors differential model selection based on PII content - Validates security policy enforcement in production pipeline - Comprehensive analysis of ExtProc PII detection indicators This establishes a comprehensive testing framework that will reveal any gaps in PII detection and policy enforcement across the entire semantic router pipeline. Signed-off-by: Yossi Ovadia <[email protected]> * style: apply black formatting to PII detection test Apply automatic Python code formatting from black to ensure consistent code style across the test file. No functional changes - only formatting improvements including: - Trailing commas for better diffs - Line wrapping for readability - Consistent spacing around operators Signed-off-by: Yossi Ovadia <[email protected]> --------- Signed-off-by: Yossi Ovadia <[email protected]> Signed-off-by: liuhy <[email protected]>
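The mixed PII policy this commit configures, one strict model, one permissive, can be modeled as below. The policy shape is an illustration of the described behavior (`allow_by_default` plus an allow-list), not the router's actual config schema:

```python
PII_POLICIES = {
    # Strict: deny all PII entity types except those explicitly allowed.
    "Model-A": {"allow_by_default": False, "allowed": {"EMAIL_ADDRESS"}},
    # Permissive: allow all PII entity types.
    "Model-B": {"allow_by_default": True, "allowed": set()},
}


def pii_allowed(model: str, entity_type: str) -> bool:
    """Decide whether a detected PII entity type is permitted for a model."""
    policy = PII_POLICIES[model]
    if policy["allow_by_default"]:
        return True
    return entity_type in policy["allowed"]
```

The asymmetry is deliberate: a prompt containing, say, a phone number must be blocked or rerouted for Model-A but accepted by Model-B, which lets the test distinguish real policy enforcement from routing-only behavior.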
…roject#341) Signed-off-by: liuhy <[email protected]>
* docs: add mermaid modal Signed-off-by: yuluo-yx <[email protected]> * fix Signed-off-by: yuluo-yx <[email protected]> * fix Signed-off-by: yuluo-yx <[email protected]> * fix: fix lit Signed-off-by: yuluo-yx <[email protected]> * fix Signed-off-by: yuluo-yx <[email protected]> * fix Signed-off-by: yuluo-yx <[email protected]> * Fix the issue where the top scroll bar is not visible when the chart is enlarged. Signed-off-by: yuluo-yx <[email protected]> * fix lint Signed-off-by: yuluo-yx <[email protected]> --------- Signed-off-by: yuluo-yx <[email protected]> Co-authored-by: Huamin Chen <[email protected]> Signed-off-by: liuhy <[email protected]>
* docs: use TS to replace JS in docs website Signed-off-by: yuluo-yx <[email protected]> * chore: translate Chinese Signed-off-by: yuluo-yx <[email protected]> --------- Signed-off-by: yuluo-yx <[email protected]> Signed-off-by: liuhy <[email protected]>
Force-pushed from 5d425cc to 5b5f6ea (compare)
@Aias00 looks like the merge hit some issue; can you open another PR with your changes?
Yes, maybe tomorrow; I should go to bed now 🤣
no rush, thank you!
What type of PR is this?
chore(infra)
What this PR does / why we need it:
Optimize the Docker CI workflow for faster builds.
Which issue(s) this PR fixes:
Fixes #343
Release Notes: Yes/No