add ALERT dataset for jailbreak and filtering wiki training #416
Conversation
Signed-off-by: joyful-ii-V-I <[email protected]>
✅ Deploy Preview for vllm-semantic-router ready!
@joyful-ii-V-I thanks for the contribution. can you move these new files to a folder of their own?
* - Add comprehensive_bench.sh, test_all_datasets.py, test_token_comparison.py
  - Add new dataset implementations: AQUA-RAT, DROP, GSM8K, MATH, OpenBookQA, SciQ, StrategyQA
  - Update router_reason_bench_multi_dataset.py with adaptive max token
  - Improved answer extraction and evaluation logic for multiple answer formats

  Signed-off-by: Huamin Chen <[email protected]>

* Apply suggestion from @Copilot

  Co-authored-by: Copilot <[email protected]>
  Signed-off-by: Huamin Chen <[email protected]>

* Apply suggestion from @Copilot

  Co-authored-by: Copilot <[email protected]>
  Signed-off-by: Huamin Chen <[email protected]>

* fix lint

  Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: Copilot <[email protected]>
…t#241 (vllm-project#354)

* fix: enhance llm-katan OpenAI API compatibility for issue vllm-project#241

  - Add missing OpenAI API response fields (system_fingerprint, logprobs, detailed usage)
  - Fix streaming response Content-Type from text/plain to text/event-stream
  - Ensure both static and streaming responses include all compatibility fields
  - Add token_usage alias for better SDK compatibility
  - Apply fixes to both TransformersBackend and VLLMBackend

  Resolves OpenWebUI hanging issue when connecting to llm-katan endpoints.

  Signed-off-by: Yossi Ovadia <[email protected]>

* bump llm-katan version to 0.1.9 for PyPI release

  Published llm-katan v0.1.9 to PyPI with OpenAI API compatibility fixes.

  Signed-off-by: Yossi Ovadia <[email protected]>

* chore: trigger CI re-run to check pre-commit status

  Trigger CI re-run to verify if Black formatting issues are resolved.

  Signed-off-by: Yossi Ovadia <[email protected]>

* trigger pre-commit formatting fix

  Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply black formatting to llm-katan Python files

  Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply black formatting to llm-katan Python files for CI compliance

  Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Replaces manual similarity calculation and query-based retrieval in FindSimilar with Milvus's Search API for more efficient and accurate similarity search. Updates index creation to use the new HNSW index API. Improves cache hit/miss logic and error handling. Signed-off-by: Srinivas A <[email protected]> Co-authored-by: Xunzhuo <[email protected]>
* chore: fix pre-commit failures in vllm-project#353

  Signed-off-by: Huamin Chen <[email protected]>

* chore: fix pre-commit failures in vllm-project#353

  Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
…-project#355) (vllm-project#356)

- Add streaming support to security response functions in response.go
- Update CreateJailbreakViolationResponse() to return SSE format when isStreaming=true
- Update CreatePIIViolationResponse() to return SSE format when isStreaming=true
- Fix header consistency by using RawValue instead of Value for all headers
- Update all call sites in request_handler.go to pass streaming context
- Add comprehensive streaming tests to 05-jailbreak-test.py
- Replace inappropriate test content with professional jailbreak testing patterns
- Add TEST 5: Streaming jailbreak detection with SSE format validation
- Add TEST 6: Streaming vs non-streaming consistency verification

This resolves the issue where streaming clients like OpenWebUI would hang indefinitely when security violations occurred, as they expected SSE format but received JSON responses.

Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: bitliu <[email protected]>
…age (vllm-project#362)

- Create new pkg/headers package to manage all x-* prefixed headers
- Define constants for request, VSR decision tracking, and security headers
- Update security headers to use x-vsr-* prefix for consistency
  - x-pii-violation -> x-vsr-pii-violation
  - x-jailbreak-blocked -> x-vsr-jailbreak-blocked
  - x-jailbreak-type -> x-vsr-jailbreak-type
  - x-jailbreak-confidence -> x-vsr-jailbreak-confidence
- Replace all string literals with constants across codebase
- Fix variable naming conflict in request_handler.go

BREAKING CHANGE: Security header names changed from x-* to x-vsr-* prefix. Clients consuming these headers must update to use new names.

Signed-off-by: bitliu <[email protected]>
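For clients affected by the breaking rename above, the old-to-new mapping can be captured in a small translation table. This is a hypothetical migration helper sketched in Python for illustration only; `migrate_headers` and `OLD_TO_VSR` are not part of the PR or the project's Go codebase.

```python
# Hypothetical helper (not shipped in the PR): translates pre-#362
# security header names to the new x-vsr-* names.
OLD_TO_VSR = {
    "x-pii-violation": "x-vsr-pii-violation",
    "x-jailbreak-blocked": "x-vsr-jailbreak-blocked",
    "x-jailbreak-type": "x-vsr-jailbreak-type",
    "x-jailbreak-confidence": "x-vsr-jailbreak-confidence",
}

def migrate_headers(headers: dict) -> dict:
    """Return a copy of headers with renamed security headers upgraded.

    Unrecognized headers pass through unchanged (HTTP header names are
    case-insensitive, so keys are lowered before lookup)."""
    return {OLD_TO_VSR.get(k.lower(), k): v for k, v in headers.items()}
```

A client that previously read `x-jailbreak-blocked` can run its response headers through this table once instead of scattering the rename across its codebase.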
…llm-project#351)

* refactor obser for Compose and add for Local
  Signed-off-by: JaredforReal <[email protected]>
* fix docker-compose.obs path error
  Signed-off-by: JaredforReal <[email protected]>
* get rid of network config
  Signed-off-by: JaredforReal <[email protected]>
* Renamed observability to o11y-* with backward-compatible obs-* aliases
  Signed-off-by: JaredforReal <[email protected]>
* refine docs, shell, and makefile
  Signed-off-by: JaredforReal <[email protected]>

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
Signed-off-by: bitliu <[email protected]>
* fix: keep memory cache metrics accurate
  Signed-off-by: cryo <[email protected]>
* test: add test for metrics fix for UpdateWithResponse
  Signed-off-by: cryo <[email protected]>

---------

Signed-off-by: cryo <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
* feat: add OpenShift deployment infrastructure with GPU support

  This commit adds comprehensive OpenShift deployment support with GPU-enabled specialist model containers, providing a complete automation solution for deploying the semantic router to OpenShift clusters.

  **Core Deployment:**
  - deployment.yaml: Kubernetes deployment manifest with GPU support
    * 4-container pod: semantic-router, model-a, model-b, envoy-proxy
    * CDI annotations for GPU device injection (gpu=0, gpu=1)
    * GPU node selection and tolerations
    * PVC mounts for models and cache
    * Production log levels (INFO for containers, info for Envoy)
  - deploy-to-openshift.sh: Main deployment automation script (826 lines)
    * Auto-detection of OpenShift server and existing login
    * Enhanced deployment method with llm-katan specialists
    * Alternative methods: kustomize, template
    * Configurable resources, storage, logging
    * Automatic namespace creation
    * Inline Dockerfile build for llm-katan image
    * Service and route creation
    * Optional port forwarding (disabled by default)
    * Displays OpenWebUI endpoint at completion
  - cleanup-openshift.sh: Cleanup automation script (494 lines)
    * Auto-detection of cluster and namespace
    * Graceful cleanup with confirmation
    * Port forwarding cleanup
    * Comprehensive resource deletion

  **Configuration:**
  - config-openshift.yaml: Semantic router config for OpenShift
    * Math-specialist and coding-specialist endpoints
    * Category-to-specialist routing
    * PII and jailbreak detection configuration
  - envoy-openshift.yaml: Envoy proxy configuration
    * HTTP listener on port 8801
    * External processing filter
    * Specialist model routing
    * /v1/models aggregation

  **Container Image:**
  - Dockerfile.llm-katan: GPU-enabled specialist container image
    * Python 3.10-slim base
    * PyTorch with CUDA 12.1 support
    * llm-katan, transformers, accelerate packages
    * HuggingFace caching configuration
    * Health check endpoint

  **Alternative Deployment Methods:**
  - kustomization.yaml: Kustomize deployment option
  - template.yaml: OpenShift template with parameters

  **Documentation & Validation:**
  - README.md: Comprehensive deployment documentation
  - validate-deployment.sh: 12-test validation script
    * Namespace, deployment, container readiness
    * GPU detection in both specialist containers
    * Model loading verification
    * PVC, service, route checks
    * GPU node scheduling confirmation
  - Makefile: Add include for tools/make/openshift.mk
  - tools/make/openshift.mk: Optional make targets for OpenShift operations
    * openshift-deploy, openshift-cleanup, openshift-status
    * openshift-logs, openshift-routes, openshift-test
    * Port forwarding helpers

  1. **GPU Support**: Full NVIDIA GPU support via CDI device injection
  2. **Specialist Models**: Real llm-katan containers for math/coding tasks
  3. **Zero-Touch Deployment**: Auto-detection of cluster, automatic builds
  4. **Production Ready**: Production log levels, proper health checks
  5. **Validation**: Comprehensive 12-test validation suite
  6. **UX Enhancements**: OpenWebUI endpoint display, optional port forwarding
  7. **Clean Separation**: Only touches deploy/openshift/ (plus minimal Makefile)

  ```
  Pod: semantic-router
  ├── semantic-router (main ExtProc service, port 50051)
  ├── model-a (llm-katan math specialist, port 8000, GPU 0)
  ├── model-b (llm-katan coding specialist, port 8001, GPU 1)
  └── envoy-proxy (gateway, port 8801)
  ```

  Validated on OpenShift with NVIDIA L4 GPUs:
  - All 4 containers running
  - GPUs detected in both specialist containers
  - Models loaded on CUDA
  - PVCs bound
  - Services and routes accessible
  - Streaming functionality working

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
  Signed-off-by: Yossi Ovadia <[email protected]>

* fix: correct route URLs to use http instead of https

  Routes are created without TLS termination by default, so URLs should use http:// not https://. This fixes the quick test commands shown at deployment completion.

  Tested and verified:
  - curl http://semantic-router-api.../health works
  - curl -X POST http://semantic-router-api.../api/v1/classify/intent works

  Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
…-project#378)

* fix: resolve semantic cache hit streaming response format issue

  When semantic cache hits occur with streaming requests, the cached response (in chat.completion JSON format) was being returned directly without converting to SSE format (chat.completion.chunk), causing streaming clients to receive malformed responses.

  This fix:
  - Updates CreateCacheHitResponse() to accept isStreaming parameter
  - Converts cached chat.completion to chat.completion.chunk format for streaming
  - Sets appropriate content-type header (text/event-stream vs application/json)
  - Maintains backward compatibility for non-streaming requests
  - Adds comprehensive unit tests for both streaming and non-streaming cases

  Similar to the fix in a0f0581 for jailbreak/PII violations, this ensures consistent response format handling across all direct response scenarios.

  Resolves streaming client hanging issues when cache hits occur.

  Signed-off-by: bitliu <[email protected]>

* fix lint

  Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
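The conversion the fix describes, from a cached chat.completion object to chat.completion.chunk SSE events, can be sketched as follows. This is a minimal Python illustration, not the project's Go implementation; the function name `to_sse_chunks` is hypothetical, and the whole cached message is emitted as a single delta for simplicity.

```python
import json

def to_sse_chunks(cached: dict) -> str:
    """Render a cached chat.completion object as SSE chat.completion.chunk
    events, so streaming clients receive the text/event-stream format they
    expect instead of a bare JSON body."""
    choice = cached["choices"][0]
    chunk = {
        "id": cached["id"],
        "object": "chat.completion.chunk",
        "created": cached["created"],
        "model": cached["model"],
        "choices": [{
            "index": 0,
            # entire cached message sent as one delta
            "delta": {"role": "assistant", "content": choice["message"]["content"]},
            "finish_reason": choice.get("finish_reason", "stop"),
        }],
    }
    # SSE frames: "data: <json>\n\n", terminated by the [DONE] sentinel
    return f"data: {json.dumps(chunk)}\n\ndata: [DONE]\n\n"

cached = {"id": "c1", "created": 0, "model": "model-a",
          "choices": [{"message": {"role": "assistant", "content": "hi"},
                       "finish_reason": "stop"}]}
```

Without this conversion, a streaming client parsing SSE frames never sees a `data:` line or the `[DONE]` sentinel, which matches the hanging behavior described above.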
vllm-project#360)

* feat: enhance CI pipeline with improved caching and multi-arch support
  Signed-off-by: liuhy <[email protected]>
* feat: refactor Dockerfile to use Makefile for Rust library build
  Signed-off-by: liuhy <[email protected]>
* feat: simplify Dockerfile by removing unnecessary directory creation
  Signed-off-by: liuhy <[email protected]>
* feat: enhance Dockerfiles for cross-compilation and optimize CI pipeline
  Signed-off-by: liuhy <[email protected]>
* feat: enhance cross-compilation support in Dockerfile for ARM64 and AMD64
  Signed-off-by: liuhy <[email protected]>
* feat: improve ARM64 cross-compilation process in Dockerfile
  Signed-off-by: liuhy <[email protected]>
* feat: enhance Dockerfile to include detailed build output for Rust library
  Signed-off-by: liuhy <[email protected]>
* feat: enhance Dockerfile to include detailed build output for Rust library
  Signed-off-by: liuhy <[email protected]>
* feat: optimize CI pipeline with disk cleanup steps and improved build process
  Signed-off-by: liuhy <[email protected]>
* feat: update CI configuration for ARM64 builds and improve Dockerfile syntax
  Signed-off-by: liuhy <[email protected]>
* feat: simplify CI configuration by standardizing runner for ARM64 builds
  Signed-off-by: liuhy <[email protected]>
* feat: enhance Dockerfile for ARM64 cross-compilation with OpenSSL configuration
  Signed-off-by: liuhy <[email protected]>
* feat: enhance Dockerfile for ARM64 cross-compilation with OpenSSL configuration
  Signed-off-by: liuhy <[email protected]>
* feat: enhance Dockerfile for ARM64 cross-compilation with OpenSSL configuration
  Signed-off-by: liuhy <[email protected]>
* feat: enhance Dockerfile for ARM64 cross-compilation with OpenSSL configuration
  Signed-off-by: liuhy <[email protected]>
* feat: update CI configuration for multi-architecture Docker image builds
  Signed-off-by: liuhy <[email protected]>
* feat: update CI manifest tagging for pull requests with architecture suffix
  Signed-off-by: liuhy <[email protected]>
* feat: remove architecture suffix from pull request tags in CI manifest
  Signed-off-by: liuhy <[email protected]>
* feat: update CI manifest tagging for pull requests with architecture suffix
  Signed-off-by: liuhy <[email protected]>
* feat: update CI manifest tagging for pull requests with architecture suffix
  Signed-off-by: liuhy <[email protected]>

---------

Signed-off-by: liuhy <[email protected]>
* refactor deploy and tools
  Signed-off-by: JaredforReal <[email protected]>
* add rebuild option
  Signed-off-by: JaredforReal <[email protected]>
* add path
  Signed-off-by: JaredforReal <[email protected]>

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
* feat(openshift): add observability stack (Prometheus + Grafana)

  Add comprehensive observability monitoring for OpenShift deployments including:
  - Prometheus for metrics collection with 15-day retention
  - Grafana with pre-configured LLM Router dashboard
  - Model routing tracking (auto -> Model-A/B selection)
  - PII protection monitoring (violations by type)
  - Jailbreak detection and blocking metrics
  - Performance metrics (TTFT, TPOT, latency, tokens, cost)

  New deployment flags:
  - --with-observability: Deploy observability with semantic-router
  - --observability-only: Deploy only observability stack
  - --cleanup-observability: Remove only observability components

  All manifests under deploy/openshift/observability/ with kustomize support. OpenShift-compatible security contexts (no runAsNonRoot, capabilities dropped).

  Dashboard includes 12 panels tracking:
  - Prompt categories
  - Model routing rate (source -> target)
  - PII/Jailbreak refusal rates by model
  - Token usage, latency percentiles, costs
  - Security effectiveness (combined refusal %)

  Resolves monitoring requirements for model selection visibility and content safety tracking in OpenShift environments.

  Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): simplify for demo - strict PII, 2 categories, better model names

  Changes for cleaner observability demo:

  PII Policy:
  - Both models now strict (allow_by_default: false)
  - Only EMAIL_ADDRESS allowed for both coding-model and general-model
  - Makes PII violations easier to demonstrate consistently

  Model Renaming:
  - Model-A → coding-model (optimized for code/algorithms)
  - Model-B → general-model (general knowledge/business)
  - More intuitive names for demo purposes

  Categories Simplified (15 → 2):
  - coding: routes to coding-model (score 0.95, reasoning enabled)
  - general: routes to general-model (score 0.9)
  - Clearer routing behavior for demonstrations

  This configuration makes it easier to demonstrate:
  1. Model routing based on category classification
  2. PII detection and blocking (both models strict)
  3. Jailbreak protection
  4. Observability metrics in Grafana

  No Go code changes - config-only updates.

  Signed-off-by: Yossi Ovadia <[email protected]>

* feat(grafana): relabel auto→semantic-router and update dashboard title

  - Add label_replace() to all panels to show "auto" as "semantic-router"
  - Update dashboard title to reflect new model names (coding-model, general-model)
  - All metrics now display consistent model naming across panels
  - Fixes confusion between "auto" routing and actual model names

  Affected panels:
  - Token Usage Rate by Model
  - Model Routing Rate (source_model and target_model)
  - Model Completion Latency (p95, p50/p90/p99)
  - TTFT/TPOT by Model
  - Reasoning Rate by Model
  - Model Cost Rate
  - Refusal Rates by Model (PII + Jailbreak)
  - Refusal Rate Percentage
  - Total Cost by Model

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
  Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): restore 15 categories for richer demo experience

  - Reverted from 2 categories back to full 15 categories
  - Kept model name changes: coding-model, general-model (not Model-A/B)
  - Kept strict PII policy for both models (only EMAIL allowed)
  - Categories now route to appropriate models:
    * coding-model: biology, chemistry, history, other, economics, math, physics, computer science, engineering
    * general-model: business, law, psychology, health, philosophy

  This provides a much better demo showing the rich classification capabilities, even though the classifier model needs retraining.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
  Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): revert to Model-A and Model-B naming

  - Changed back from coding-model/general-model to Model-A/Model-B
  - Kept 15 categories for rich demo experience
  - Kept strict PII policy for both models (only EMAIL allowed)
  - Updated Grafana dashboard title to reflect Model-A & Model-B
  - Dashboard label relabeling still shows "semantic-router" for "auto"

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
  Signed-off-by: Yossi Ovadia <[email protected]>

* feat(demo): enhance test script with Model-B traffic and better coverage

  - Moved script to deploy/openshift/ folder
  - Added Model-B prompts (psychology, business, health, philosophy, law)
  - Send 10 jailbreak attempts (better visibility in Grafana)
  - Send 10 PII test prompts (various PII types)
  - Use chat completions instead of just classification (triggers routing)
  - Updated help text to reflect Model-A/Model-B naming
  - All tests now send requests in parallel for better performance

  This ensures both Model-A and Model-B appear in Grafana dashboards.

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
  Signed-off-by: Yossi Ovadia <[email protected]>

* fix(grafana): prevent refusal rate panels from showing stale data

  Issue: Refusal Rates and Refusal Rate Percentage panels kept showing increasing values even when no traffic was present.

  Root cause: rate() returns empty results when no activity in the time window, but Grafana was showing last non-zero values or interpolating.

  Fix:
  - Added 'or vector(0)' to refusal rate queries to explicitly return 0 when no errors in the time window
  - Added 'or vector(1)' to denominator to prevent division by zero
  - Added interval and intervalFactor parameters for better scraping

  Affected panels:
  - Refusal Rates by Model (time series)
  - Refusal Rate Percentage by Model (bar gauge)

  Now panels correctly drop to 0 when traffic stops.

  Signed-off-by: Yossi Ovadia <[email protected]>

* feat: enable observability by default with HTTPS and improved dashboard layout

  - Enable observability (Prometheus + Grafana) by default in deployment
  - Add HTTPS/TLS termination to Grafana and Prometheus routes with auto-redirect
  - Reorganize Grafana dashboard panels by function:
    * Semantic-router features on top (category, routing, refusal, reasoning)
    * Performance metrics in middle (latency, TTFT, TPOT, tokens)
    * Cost metrics at bottom (cost rate, total cost)
  - Update deployment script help text to reflect observability enabled by default
  - Fix dashboard YAML indentation for proper embedding

  Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply pre-commit markdown formatting fixes

  - Fix blank lines around code fences
  - Remove multiple consecutive blank lines
  - Ensure proper spacing around lists

  Signed-off-by: Yossi Ovadia <[email protected]>

* fix: update deployment output URLs to HTTPS and correct demo script path

  Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
* feat: add OpenWebUI OpenShift integration with deployment scripts

  Add complete OpenWebUI deployment for OpenShift integration:
  - OpenWebUI deployment manifests with OpenShift security contexts
  - Automated deployment script with prerequisite validation
  - Safe uninstall script with single confirmation prompt
  - Internal service discovery (no hardcoded URLs)
  - Integration with Envoy proxy for model load balancing
  - Persistent storage for user data and configurations
  - HTTPS external access via OpenShift routes
  - Support for auto, Model-A, and Model-B endpoints

  Files added:
  - deploy/openshift/openwebui/deployment.yaml
  - deploy/openshift/openwebui/service.yaml
  - deploy/openshift/openwebui/route.yaml
  - deploy/openshift/openwebui/pvc.yaml
  - deploy/openshift/openwebui/kustomization.yaml
  - deploy/openshift/openwebui/deploy-openwebui-on-openshift.sh
  - deploy/openshift/openwebui/uninstall-openwebui.sh
  - deploy/openshift/openwebui/README.md

  Features:
  - Zero-config setup with automatic model discovery
  - OpenShift-compatible security contexts
  - Rich user feedback with colored output
  - Complete validation and connectivity testing
  - Safe cleanup with data preservation options

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
  Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply pre-commit markdown formatting fixes

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>
  Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Signed-off-by: bitliu <[email protected]>
Signed-off-by: bitliu <[email protected]>
Signed-off-by: bitliu <[email protected]>
…bug (vllm-project#390) Signed-off-by: yuluo-yx <[email protected]>
…ect#375)

* feat: add out-of-tree and mcp based classification support
  Signed-off-by: Huamin Chen <[email protected]>
* fix unit tests
  Signed-off-by: Huamin Chen <[email protected]>
* update unit test
  Signed-off-by: Huamin Chen <[email protected]>
* review feedback
  Signed-off-by: Huamin Chen <[email protected]>
* review feedback
  Signed-off-by: Huamin Chen <[email protected]>
* review feedback
  Signed-off-by: Huamin Chen <[email protected]>
* review feedback
  Signed-off-by: Huamin Chen <[email protected]>
* review comments
  Signed-off-by: Huamin Chen <[email protected]>
* add example regex based classification mcp server
  Signed-off-by: Huamin Chen <[email protected]>
* review feedback
  Signed-off-by: Huamin Chen <[email protected]>
* review feedback: auto discover mcp tools
  Signed-off-by: Huamin Chen <[email protected]>
* verify fixes are working
  Signed-off-by: Huamin Chen <[email protected]>
* verify fixes are working
  Signed-off-by: Huamin Chen <[email protected]>
* add missing file
  Signed-off-by: Huamin Chen <[email protected]>
* fix lint
  Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
Signed-off-by: bitliu <[email protected]>
Signed-off-by: yuluo-yx <[email protected]> Co-authored-by: Huamin Chen <[email protected]>
…ject#402)

- Add EditModal component with support for multiple field types (text, number, boolean, select, multiselect, textarea, json)
- Implement full CRUD operations for Models and Endpoints (add, edit, delete)
- Add edit functionality for all configuration sections:
  * Prompt Guard (PII Detection, Jailbreak Detection)
  * Similarity Cache (Similarity BERT, Semantic Cache)
  * Intelligent Routing (In-tree/Out-tree Classifiers)
  * Tools Selection
  * Observability (Distributed Tracing)
  * Classification API
- Add edit functionality for Categories (system prompt, reasoning settings)
- Add edit functionality for Reasoning Families
- Add model score management in Categories (add, edit, delete model scores)
- Implement dynamic dropdowns populated from configuration:
  * Reasoning Family selection from configured reasoning families
  * Preferred Endpoints selection from configured endpoints
  * Model selection for category model scores
- Add complete pricing configuration (currency, prompt/completion costs)
- Add complete PII policy configuration (allow/block, 17 PII types)
- Visualize all configuration fields (no JSON input required)
- Fix docker-compose volume mount to allow config file writes (ro -> rw)
- Remove deprecated fields (endpoint models, health_check_path, monitoring dashboard)
- Improve UI/UX with better card layouts, icons, and visual feedback

Backend changes:
- Add POST /api/router/config/update endpoint for configuration updates
- Add JSON to YAML conversion for config file writes

Frontend changes:
- Create reusable EditModal component with field validation
- Update ConfigPage with comprehensive edit buttons and handlers
- Add CSS styles for edit buttons, model scores, and multiselect fields
- Improve endpoint display (separate address, port, weight)

This enables full configuration management through the dashboard UI without manual YAML editing.

Signed-off-by: bitliu <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
* update README
  Signed-off-by: JaredforReal <[email protected]>
* update README & delete network config
  Signed-off-by: JaredforReal <[email protected]>
* add owner
  Signed-off-by: JaredforReal <[email protected]>
* add dashboard demo to k8s
  Signed-off-by: JaredforReal <[email protected]>
* add Xunzhuo to owner
  Signed-off-by: JaredforReal <[email protected]>

---------

Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: bitliu <[email protected]>
* update paper
  Signed-off-by: bitliu <[email protected]>
* more
  Signed-off-by: bitliu <[email protected]>
* more
  Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
* chore: add rootfs and yuluo-yx as website owners
  Signed-off-by: yuluo-yx <[email protected]>
* fix
  Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>
* docs: add missing observability articles to sidebar

  - Add distributed-tracing to observability section
  - Add open-webui-integration to observability section
  - Rename observability.md to metrics.md for better clarity
  - Remove tracing-quickstart.md as it's no longer needed

  This ensures all observability documentation is accessible through the website sidebar navigation.

  Signed-off-by: bitliu <[email protected]>

* more

  Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
…llm-project#414)

* refactor(config): move reasoning fields from Category to ModelScore

  Move ReasoningDescription and ReasoningEffort from Category level to ModelScore level to enable model-specific reasoning configuration.

  **What type of PR is this?**

  Refactoring / Breaking Change

  **What this PR does / why we need it**:

  This PR refactors the reasoning configuration structure by moving the ReasoningDescription and ReasoningEffort fields from the Category level to the ModelScore level. This change enables more granular control over reasoning behavior at the model level rather than the category level.

  **Key Changes:**
  - Remove ReasoningDescription and ReasoningEffort from the Category struct
  - These fields now exist in the ModelScore struct (model-level configuration)
  - Update the lookup to accept the new parameter
  - Update all configuration files to new format (9 files)
  - Update Python training scripts to generate new format
  - Update TypeScript dashboard to read from best model
  - Update all test files

  **Breaking Change:**

  This is a breaking change with no backward compatibility. Old configuration files must be migrated to the new format.

  **Migration:**

  Old format:

  ```yaml
  categories:
    - name: math
      reasoning_description: "..."
      reasoning_effort: high
      model_scores:
        - model: model-a
          use_reasoning: true
  ```

  New format:

  ```yaml
  categories:
    - name: math
      model_scores:
        - model: model-a
          use_reasoning: true
          reasoning_description: "..."
          reasoning_effort: high
  ```

  **Testing:**
  - All Go tests pass
  - All configuration files validated
  - TypeScript type checking passes

  Signed-off-by: Xunzhuo <[email protected]>
  Signed-off-by: bitliu <[email protected]>

* fix(test): move ReasoningEffort to ModelScore in classification test

  Update generic_category_mapping_test.go to use the new structure where reasoning_effort is a field of ModelScore instead of Category.

  Signed-off-by: Xunzhuo <[email protected]>
  Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: Xunzhuo <[email protected]>
Signed-off-by: bitliu <[email protected]>
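The old-to-new migration shown above is mechanical: the two category-level fields move down into every `model_scores` entry. A hypothetical migration helper, sketched in Python over plain dicts (the PR itself ships no such script; `migrate_category` is an illustration only):

```python
# Hypothetical helper: push category-level reasoning fields down into
# each model_scores entry, matching the breaking change in #414.
def migrate_category(cat: dict) -> dict:
    cat = dict(cat)  # shallow copy; the input is left untouched
    desc = cat.pop("reasoning_description", None)
    effort = cat.pop("reasoning_effort", None)
    migrated = []
    for score in cat.get("model_scores", []):
        score = dict(score)
        # setdefault keeps any value already set at the model level
        if desc is not None:
            score.setdefault("reasoning_description", desc)
        if effort is not None:
            score.setdefault("reasoning_effort", effort)
        migrated.append(score)
    cat["model_scores"] = migrated
    return cat

old = {"name": "math", "reasoning_description": "...", "reasoning_effort": "high",
       "model_scores": [{"model": "model-a", "use_reasoning": True}]}
```

Run over a YAML file's parsed `categories` list, this would produce the new format in one pass.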
* infra: add golangci lint check
  Signed-off-by: yuluo-yx <[email protected]>
* infra: add golangci lint check
  Signed-off-by: yuluo-yx <[email protected]>
* fi
  Signed-off-by: yuluo-yx <[email protected]>
* fix
  Signed-off-by: yuluo-yx <[email protected]>
* infra: add golang ci lint for precommit hook
  Signed-off-by: yuluo-yx <[email protected]>
* fix test and build ci error
  Signed-off-by: yuluo-yx <[email protected]>
* fix: fix precommit parallel golangci-lint run error
  Signed-off-by: yuluo-yx <[email protected]>
* fix
  Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
…ct#413)

* refactor(config): remove models field from vLLM endpoints

  Remove the redundant models field from VLLMEndpoint configuration. Model-to-endpoint mapping is now solely determined by the preferred_endpoints field in model_config, eliminating the need for bidirectional association.

  Changes:
  - Remove Models field from VLLMEndpoint struct
  - Update GetEndpointsForModel to use only preferred_endpoints
  - Update GetAllModels to retrieve models from model_config keys
  - Update all configuration files to remove models field
  - Update all tests to reflect the new configuration structure
  - Update TypeScript interface in dashboard frontend

  This simplifies the configuration and removes potential inconsistencies between models and preferred_endpoints.

  Signed-off-by: bitliu <[email protected]>

* more
  Signed-off-by: bitliu <[email protected]>
* more
  Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
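The new single-direction lookup described above can be sketched as follows. The real implementation is Go; this is a minimal Python illustration with an assumed `model_config` shape, showing that endpoints come only from `preferred_endpoints` and the model list comes only from `model_config` keys.

```python
# Illustrative sketch of the post-#413 lookup rules (assumed data shape,
# not the project's actual Go code).
model_config = {
    "model-a": {"preferred_endpoints": ["endpoint1"]},
    "model-b": {"preferred_endpoints": ["endpoint1", "endpoint2"]},
}

def get_endpoints_for_model(model: str) -> list:
    """Endpoints are determined solely by preferred_endpoints; endpoints
    no longer declare which models they serve."""
    return list(model_config.get(model, {}).get("preferred_endpoints", []))

def get_all_models() -> list:
    """The model list is just the keys of model_config."""
    return list(model_config)
```

Because there is only one source of truth, a model cannot end up listed on an endpoint without a matching `preferred_endpoints` entry, which is exactly the inconsistency the refactor removes.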
👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:
@joyful-ii-V-I can you fix the pre-commit errors?
What type of PR is this?
Improve the training data with the ALERT training set, and prevent the wiki API from recursing more than 4 levels deep; this fixes the more obvious criminal-activity jailbreak cases.
What this PR does / why we need it:
The model needs better training data.
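The recursion cap this PR describes for the wiki API can be sketched as follows. This is an illustrative sketch only: `crawl`, `fetch_links`, and `MAX_DEPTH` are hypothetical names, and the real crawler's interface is not shown in this PR page.

```python
MAX_DEPTH = 4  # mirrors the PR's "no more than 4 deep" limit (assumed name)

def crawl(title, fetch_links, depth=0, seen=None):
    """Collect page titles, refusing to recurse past MAX_DEPTH.

    fetch_links is a caller-supplied function standing in for the wiki
    API; it returns the titles linked from a page. A seen-set also
    guards against cycles in the link graph."""
    seen = set() if seen is None else seen
    if depth > MAX_DEPTH or title in seen:
        return []
    seen.add(title)
    pages = [title]
    for link in fetch_links(title):
        pages.extend(crawl(link, fetch_links, depth + 1, seen))
    return pages

# toy link graph: each page links to the next, 10 levels long
links = {f"p{i}": [f"p{i+1}"] for i in range(10)}
print(crawl("p0", lambda t: links.get(t, [])))
# → ['p0', 'p1', 'p2', 'p3', 'p4']
```

Without the depth check the toy crawl would walk all 10 levels; with it, collection stops at depth 4, which bounds both runtime and the amount of off-topic wiki text pulled into the training set.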
Release Notes: No