
Conversation

joyful-ii-V-I
Contributor

What type of PR is this?
Improve training data with an ALERT training set and prevent the wiki API from recursing more than 4 levels deep; this fixes the more obvious jailbreak/criminal-type activities.

What this PR does / why we need it:
The model needs better training data.

Release Notes: No
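
For context on the "recursing more than 4 levels deep" part, a minimal sketch of such a depth cap, assuming a MediaWiki-style API; the function name, query parameters, and MAX_DEPTH constant are illustrative, not the PR's actual code:

```python
# Illustrative only: cap recursive wiki lookups at 4 levels.
import requests

MAX_DEPTH = 4  # stop following linked pages beyond this depth

def fetch_wiki_text(title, depth=0, seen=None):
    seen = seen if seen is not None else set()
    if depth >= MAX_DEPTH or title in seen:
        return []
    seen.add(title)
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={"action": "query", "prop": "extracts|links", "titles": title,
                "format": "json", "explaintext": 1},
        timeout=10,
    )
    texts = []
    for page in resp.json().get("query", {}).get("pages", {}).values():
        texts.append(page.get("extract", ""))
        for link in page.get("links", []):
            texts += fetch_wiki_text(link["title"], depth + 1, seen)
    return texts
```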


netlify bot commented Oct 13, 2025

Deploy Preview for vllm-semantic-router ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | 1956d58 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/vllm-semantic-router/deploys/68ed96dd0063a50008a3e74d |
| 😎 Deploy Preview | https://deploy-preview-416--vllm-semantic-router.netlify.app |

To edit notification comments on pull requests, go to your Netlify project configuration.

@rootfs
Collaborator

rootfs commented Oct 13, 2025

@joyful-ii-V-I thanks for the contribution. can you move these new files to a folder of their own?

rootfs and others added 23 commits October 13, 2025 10:55
* - Add comprehensive_bench.sh, test_all_datasets.py, test_token_comparison.py
- Add new dataset implementations: AQUA-RAT, DROP, GSM8K, MATH, OpenBookQA, SciQ, StrategyQA
- Update router_reason_bench_multi_dataset.py with adaptive max tokens
- Improve answer extraction and evaluation logic for multiple answer formats (see the sketch below)
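
As a rough illustration of what "multiple answer formats" covers in practice (GSM8K's `#### 42` final line, MATH's `\boxed{}`, free-form "the answer is ..."), a hedged sketch, not the benchmark's actual extraction code:

```python
import re

def extract_answer(completion):
    """Return the last final-answer candidate found, or None."""
    patterns = [
        r"####\s*([-+]?[\d,]*\.?\d+)",                             # GSM8K-style final line
        r"\\boxed\{([^{}]+)\}",                                    # MATH-style boxed answer
        r"[Tt]he answer is\s*:?\s*([A-Ea-e]|[-+]?[\d,]*\.?\d+)",   # free-form / MCQ letter
    ]
    for pat in patterns:
        matches = re.findall(pat, completion)
        if matches:
            return matches[-1].replace(",", "").strip()
    return None

print(extract_answer("so 6 * 7 = 42.\n#### 42"))  # -> 42
```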

Signed-off-by: Huamin Chen <[email protected]>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>

* Apply suggestion from @Copilot

Co-authored-by: Copilot <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>

* fix lint

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: Copilot <[email protected]>
…t#241 (vllm-project#354)

* fix: enhance llm-katan OpenAI API compatibility for issue vllm-project#241

- Add missing OpenAI API response fields (system_fingerprint, logprobs, detailed usage)
- Fix streaming response Content-Type from text/plain to text/event-stream
- Ensure both static and streaming responses include all compatibility fields
- Add token_usage alias for better SDK compatibility
- Apply fixes to both TransformersBackend and VLLMBackend

Resolves OpenWebUI hanging issue when connecting to llm-katan endpoints.
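
A hedged sketch of the response shape those bullets describe; the field values are placeholders and this is not llm-katan's actual code. Only the field names (system_fingerprint, logprobs, usage, the token_usage alias) and the text/event-stream content type come from the commit message:

```python
import time
import uuid

def build_chat_completion(model, text, prompt_tokens, completion_tokens):
    usage = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
    return {
        "id": f"chatcmpl-{uuid.uuid4().hex[:12]}",
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "system_fingerprint": "fp_placeholder",   # previously missing field
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": text},
            "logprobs": None,                      # present even when unused
            "finish_reason": "stop",
        }],
        "usage": usage,
        "token_usage": usage,                      # alias for stricter SDKs
    }

# Streaming responses carry the same fields per chunk, but must be served with
# Content-Type: text/event-stream (not text/plain) so SSE clients such as
# OpenWebUI keep reading instead of hanging.
```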

Signed-off-by: Yossi Ovadia <[email protected]>

* bump llm-katan version to 0.1.9 for PyPI release

Published llm-katan v0.1.9 to PyPI with OpenAI API compatibility fixes.

Signed-off-by: Yossi Ovadia <[email protected]>

* chore: trigger CI re-run to check pre-commit status

Trigger CI re-run to verify if Black formatting issues are resolved.

Signed-off-by: Yossi Ovadia <[email protected]>

* trigger pre-commit formatting fix

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply black formatting to llm-katan Python files

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply black formatting to llm-katan Python files for CI compliance

Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Replaces manual similarity calculation and query-based retrieval in FindSimilar with Milvus's Search API for more efficient and accurate similarity search. Updates index creation to use the new HNSW index API. Improves cache hit/miss logic and error handling.
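
The cache backend itself is Go, but for readers unfamiliar with the Milvus APIs being adopted, a rough pymilvus equivalent of "create an HNSW index, then let Search do the similarity ranking"; the collection name, field names, and index parameters are assumptions:

```python
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
col = Collection("semantic_cache")  # hypothetical collection

# HNSW index on the embedding field instead of brute-force comparison
col.create_index(
    field_name="embedding",
    index_params={"index_type": "HNSW", "metric_type": "COSINE",
                  "params": {"M": 16, "efConstruction": 200}},
)
col.load()

def find_similar(query_embedding, threshold):
    # Milvus ranks candidates server-side; no manual cosine loop needed
    hits = col.search(
        data=[query_embedding],
        anns_field="embedding",
        param={"metric_type": "COSINE", "params": {"ef": 64}},
        limit=1,
        output_fields=["response"],
    )
    if hits and len(hits[0]) > 0 and hits[0][0].distance >= threshold:
        return hits[0][0].entity.get("response")   # cache hit
    return None                                    # cache miss
```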

Signed-off-by: Srinivas A <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>
* chore: fix pre-commit failures in vllm-project#353

Signed-off-by: Huamin Chen <[email protected]>

* chore: fix pre-commit failures in vllm-project#353

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
…-project#355) (vllm-project#356)

- Add streaming support to security response functions in response.go
- Update CreateJailbreakViolationResponse() to return SSE format when isStreaming=true
- Update CreatePIIViolationResponse() to return SSE format when isStreaming=true
- Fix header consistency by using RawValue instead of Value for all headers
- Update all call sites in request_handler.go to pass streaming context
- Add comprehensive streaming tests to 05-jailbreak-test.py
- Replace inappropriate test content with professional jailbreak testing patterns
- Add TEST 5: Streaming jailbreak detection with SSE format validation
- Add TEST 6: Streaming vs non-streaming consistency verification

This resolves the issue where streaming clients like OpenWebUI would hang
indefinitely when security violations occurred, as they expected SSE format
but received JSON responses.
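
As an illustration of the format difference only (the real change is in the Go response builders), a Python sketch of the two shapes a blocked request can take; the refusal text is a placeholder:

```python
import json
import time

def jailbreak_refusal(model, is_streaming):
    """Return (content_type, body) for a blocked request."""
    message = "This request was blocked by jailbreak detection."
    if not is_streaming:
        body = json.dumps({
            "object": "chat.completion",
            "created": int(time.time()),
            "model": model,
            "choices": [{"index": 0,
                         "message": {"role": "assistant", "content": message},
                         "finish_reason": "stop"}],
        })
        return "application/json", body
    chunk = json.dumps({
        "object": "chat.completion.chunk",
        "created": int(time.time()),
        "model": model,
        "choices": [{"index": 0,
                     "delta": {"role": "assistant", "content": message},
                     "finish_reason": "stop"}],
    })
    # SSE clients expect "data: <json>\n\n" frames and a terminating [DONE]
    return "text/event-stream", f"data: {chunk}\n\ndata: [DONE]\n\n"
```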

Signed-off-by: Yossi Ovadia <[email protected]>
…age (vllm-project#362)

- Create new pkg/headers package to manage all x-* prefixed headers
- Define constants for request, VSR decision tracking, and security headers
- Update security headers to use x-vsr-* prefix for consistency
  - x-pii-violation -> x-vsr-pii-violation
  - x-jailbreak-blocked -> x-vsr-jailbreak-blocked
  - x-jailbreak-type -> x-vsr-jailbreak-type
  - x-jailbreak-confidence -> x-vsr-jailbreak-confidence
- Replace all string literals with constants across codebase
- Fix variable naming conflict in request_handler.go

BREAKING CHANGE: Security header names changed from x-* to x-vsr-* prefix.
Clients consuming these headers must update to use new names.
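
A minimal sketch of the client-side migration, assuming a plain HTTP client; the router URL is a placeholder and only the header names come from this change:

```python
import requests

resp = requests.post(
    "http://router.example/v1/chat/completions",   # placeholder URL
    json={"model": "auto",
          "messages": [{"role": "user", "content": "hi"}]},
)

# Old names (x-jailbreak-blocked, x-pii-violation, ...) are gone.
if resp.headers.get("x-vsr-jailbreak-blocked"):
    print("blocked:", resp.headers.get("x-vsr-jailbreak-type"),
          "confidence:", resp.headers.get("x-vsr-jailbreak-confidence"))
if resp.headers.get("x-vsr-pii-violation"):
    print("request contained disallowed PII")
```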

Signed-off-by: bitliu <[email protected]>
…llm-project#351)

* refactor obser for Compose and add for Local

Signed-off-by: JaredforReal <[email protected]>

* fix docker-compose.obs path error

Signed-off-by: JaredforReal <[email protected]>

* get rid of network config

Signed-off-by: JaredforReal <[email protected]>

* Renamed observability to o11y-* with backward-compatible obs-* aliases

Signed-off-by: JaredforReal <[email protected]>

* refine docs, shell, and makefile

Signed-off-by: JaredforReal <[email protected]>

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
* fix: keep memory cache metrics accurate

Signed-off-by: cryo <[email protected]>

* test: add test for metrics fix for UpdateWithResponse

Signed-off-by: cryo <[email protected]>

---------

Signed-off-by: cryo <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
* feat: add OpenShift deployment infrastructure with GPU support

This commit adds comprehensive OpenShift deployment support with GPU-enabled
specialist model containers, providing a complete automation solution for
deploying the semantic router to OpenShift clusters.

**Core Deployment:**
- deployment.yaml: Kubernetes deployment manifest with GPU support
  * 4-container pod: semantic-router, model-a, model-b, envoy-proxy
  * CDI annotations for GPU device injection (gpu=0, gpu=1)
  * GPU node selection and tolerations
  * PVC mounts for models and cache
  * Production log levels (INFO for containers, info for Envoy)

- deploy-to-openshift.sh: Main deployment automation script (826 lines)
  * Auto-detection of OpenShift server and existing login
  * Enhanced deployment method with llm-katan specialists
  * Alternative methods: kustomize, template
  * Configurable resources, storage, logging
  * Automatic namespace creation
  * Inline Dockerfile build for llm-katan image
  * Service and route creation
  * Optional port forwarding (disabled by default)
  * Displays OpenWebUI endpoint at completion

- cleanup-openshift.sh: Cleanup automation script (494 lines)
  * Auto-detection of cluster and namespace
  * Graceful cleanup with confirmation
  * Port forwarding cleanup
  * Comprehensive resource deletion

**Configuration:**
- config-openshift.yaml: Semantic router config for OpenShift
  * Math-specialist and coding-specialist endpoints
  * Category-to-specialist routing
  * PII and jailbreak detection configuration

- envoy-openshift.yaml: Envoy proxy configuration
  * HTTP listener on port 8801
  * External processing filter
  * Specialist model routing
  * /v1/models aggregation

**Container Image:**
- Dockerfile.llm-katan: GPU-enabled specialist container image
  * Python 3.10-slim base
  * PyTorch with CUDA 12.1 support
  * llm-katan, transformers, accelerate packages
  * HuggingFace caching configuration
  * Health check endpoint

**Alternative Deployment Methods:**
- kustomization.yaml: Kustomize deployment option
- template.yaml: OpenShift template with parameters

**Documentation & Validation:**
- README.md: Comprehensive deployment documentation
- validate-deployment.sh: 12-test validation script
  * Namespace, deployment, container readiness
  * GPU detection in both specialist containers
  * Model loading verification
  * PVC, service, route checks
  * GPU node scheduling confirmation

- Makefile: Add include for tools/make/openshift.mk
- tools/make/openshift.mk: Optional make targets for OpenShift operations
  * openshift-deploy, openshift-cleanup, openshift-status
  * openshift-logs, openshift-routes, openshift-test
  * Port forwarding helpers

1. **GPU Support**: Full NVIDIA GPU support via CDI device injection
2. **Specialist Models**: Real llm-katan containers for math/coding tasks
3. **Zero-Touch Deployment**: Auto-detection of cluster, automatic builds
4. **Production Ready**: Production log levels, proper health checks
5. **Validation**: Comprehensive 12-test validation suite
6. **UX Enhancements**: OpenWebUI endpoint display, optional port forwarding
7. **Clean Separation**: Only touches deploy/openshift/ (plus minimal Makefile)

```
Pod: semantic-router
├── semantic-router (main ExtProc service, port 50051)
├── model-a (llm-katan math specialist, port 8000, GPU 0)
├── model-b (llm-katan coding specialist, port 8001, GPU 1)
└── envoy-proxy (gateway, port 8801)
```

Validated on OpenShift with NVIDIA L4 GPUs:
- All 4 containers running
- GPUs detected in both specialist containers
- Models loaded on CUDA
- PVCs bound
- Services and routes accessible
- Streaming functionality working

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* fix: correct route URLs to use http instead of https

Routes are created without TLS termination by default, so URLs should
use http:// not https://. This fixes the quick test commands shown at
deployment completion.

Tested and verified:
- curl http://semantic-router-api.../health works
- curl -X POST http://semantic-router-api.../api/v1/classify/intent works

Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
…-project#378)

* fix: resolve semantic cache hit streaming response format issue

When semantic cache hits occur with streaming requests, the cached
response (in chat.completion JSON format) was being returned directly
without converting to SSE format (chat.completion.chunk), causing
streaming clients to receive malformed responses.

This fix:
- Updates CreateCacheHitResponse() to accept isStreaming parameter
- Converts cached chat.completion to chat.completion.chunk format for streaming
- Sets appropriate content-type header (text/event-stream vs application/json)
- Maintains backward compatibility for non-streaming requests
- Adds comprehensive unit tests for both streaming and non-streaming cases

Similar to the fix in a0f0581 for jailbreak/PII violations, this ensures
consistent response format handling across all direct response scenarios.

Resolves streaming client hanging issues when cache hits occur.
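
A rough Python sketch of the conversion this commit describes (the fix itself is in the Go CreateCacheHitResponse() helper); the only transformation shown is chat.completion to chat.completion.chunk plus SSE framing:

```python
import json

def cached_completion_to_sse(cached):
    """Re-emit a stored chat.completion as a single chat.completion.chunk."""
    chunk = {
        "id": cached.get("id"),
        "object": "chat.completion.chunk",
        "created": cached.get("created"),
        "model": cached.get("model"),
        "choices": [{
            "index": c.get("index", 0),
            "delta": c.get("message", {}),        # message -> delta for streaming
            "finish_reason": c.get("finish_reason"),
        } for c in cached.get("choices", [])],
    }
    # Served with Content-Type: text/event-stream instead of application/json
    return f"data: {json.dumps(chunk)}\n\ndata: [DONE]\n\n"
```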

Signed-off-by: bitliu <[email protected]>

* fix lint

Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
vllm-project#360)

* feat: enhance CI pipeline with improved caching and multi-arch support

Signed-off-by: liuhy <[email protected]>

* feat: refactor Dockerfile to use Makefile for Rust library build

Signed-off-by: liuhy <[email protected]>

* feat: simplify Dockerfile by removing unnecessary directory creation

Signed-off-by: liuhy <[email protected]>

* feat: enhance Dockerfiles for cross-compilation and optimize CI pipeline

Signed-off-by: liuhy <[email protected]>

* feat: enhance cross-compilation support in Dockerfile for ARM64 and AMD64

Signed-off-by: liuhy <[email protected]>

* feat: improve ARM64 cross-compilation process in Dockerfile

Signed-off-by: liuhy <[email protected]>

* feat: enhance Dockerfile to include detailed build output for Rust library

Signed-off-by: liuhy <[email protected]>

* feat: enhance Dockerfile to include detailed build output for Rust library

Signed-off-by: liuhy <[email protected]>

* feat: optimize CI pipeline with disk cleanup steps and improved build process

Signed-off-by: liuhy <[email protected]>

* feat: update CI configuration for ARM64 builds and improve Dockerfile syntax

Signed-off-by: liuhy <[email protected]>

* feat: simplify CI configuration by standardizing runner for ARM64 builds

Signed-off-by: liuhy <[email protected]>

* feat: enhance Dockerfile for ARM64 cross-compilation with OpenSSL configuration

Signed-off-by: liuhy <[email protected]>

* feat: enhance Dockerfile for ARM64 cross-compilation with OpenSSL configuration

Signed-off-by: liuhy <[email protected]>

* feat: enhance Dockerfile for ARM64 cross-compilation with OpenSSL configuration

Signed-off-by: liuhy <[email protected]>

* feat: enhance Dockerfile for ARM64 cross-compilation with OpenSSL configuration

Signed-off-by: liuhy <[email protected]>

* feat: update CI configuration for multi-architecture Docker image builds

Signed-off-by: liuhy <[email protected]>

* feat: update CI manifest tagging for pull requests with architecture suffix

Signed-off-by: liuhy <[email protected]>

* feat: remove architecture suffix from pull request tags in CI manifest

Signed-off-by: liuhy <[email protected]>

* feat: update CI manifest tagging for pull requests with architecture suffix

Signed-off-by: liuhy <[email protected]>

* feat: update CI manifest tagging for pull requests with architecture suffix

Signed-off-by: liuhy <[email protected]>

---------

Signed-off-by: liuhy <[email protected]>
* refactor deploy and tools

Signed-off-by: JaredforReal <[email protected]>

* add rebuild option

Signed-off-by: JaredforReal <[email protected]>

* add path

Signed-off-by: JaredforReal <[email protected]>

---------

Signed-off-by: JaredforReal <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
* feat(openshift): add observability stack (Prometheus + Grafana)

Add comprehensive observability monitoring for OpenShift deployments including:
- Prometheus for metrics collection with 15-day retention
- Grafana with pre-configured LLM Router dashboard
- Model routing tracking (auto -> Model-A/B selection)
- PII protection monitoring (violations by type)
- Jailbreak detection and blocking metrics
- Performance metrics (TTFT, TPOT, latency, tokens, cost)

New deployment flags:
- --with-observability: Deploy observability with semantic-router
- --observability-only: Deploy only observability stack
- --cleanup-observability: Remove only observability components

All manifests under deploy/openshift/observability/ with kustomize support.
OpenShift-compatible security contexts (no runAsNonRoot, capabilities dropped).

Dashboard includes 12 panels tracking:
- Prompt categories
- Model routing rate (source -> target)
- PII/Jailbreak refusal rates by model
- Token usage, latency percentiles, costs
- Security effectiveness (combined refusal %)

Resolves monitoring requirements for model selection visibility and
content safety tracking in OpenShift environments.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): simplify for demo - strict PII, 2 categories, better model names

Changes for cleaner observability demo:

PII Policy:
- Both models now strict (allow_by_default: false)
- Only EMAIL_ADDRESS allowed for both coding-model and general-model
- Makes PII violations easier to demonstrate consistently

Model Renaming:
- Model-A → coding-model (optimized for code/algorithms)
- Model-B → general-model (general knowledge/business)
- More intuitive names for demo purposes

Categories Simplified (15 → 2):
- coding: routes to coding-model (score 0.95, reasoning enabled)
- general: routes to general-model (score 0.9)
- Clearer routing behavior for demonstrations

This configuration makes it easier to demonstrate:
1. Model routing based on category classification
2. PII detection and blocking (both models strict)
3. Jailbreak protection
4. Observability metrics in Grafana

No Go code changes - config-only updates.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat(grafana): relabel auto→semantic-router and update dashboard title

- Add label_replace() to all panels to show "auto" as "semantic-router"
- Update dashboard title to reflect new model names (coding-model, general-model)
- All metrics now display consistent model naming across panels
- Fixes confusion between "auto" routing and actual model names

Affected panels:
- Token Usage Rate by Model
- Model Routing Rate (source_model and target_model)
- Model Completion Latency (p95, p50/p90/p99)
- TTFT/TPOT by Model
- Reasoning Rate by Model
- Model Cost Rate
- Refusal Rates by Model (PII + Jailbreak)
- Refusal Rate Percentage
- Total Cost by Model

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): restore 15 categories for richer demo experience

- Reverted from 2 categories back to full 15 categories
- Kept model name changes: coding-model, general-model (not Model-A/B)
- Kept strict PII policy for both models (only EMAIL allowed)
- Categories now route to appropriate models:
  * coding-model: biology, chemistry, history, other, economics, math,
    physics, computer science, engineering
  * general-model: business, law, psychology, health, philosophy

This provides a much better demo showing the rich classification
capabilities, even though the classifier model needs retraining.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(config): revert to Model-A and Model-B naming

- Changed back from coding-model/general-model to Model-A/Model-B
- Kept 15 categories for rich demo experience
- Kept strict PII policy for both models (only EMAIL allowed)
- Updated Grafana dashboard title to reflect Model-A & Model-B
- Dashboard label relabeling still shows "semantic-router" for "auto"

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* feat(demo): enhance test script with Model-B traffic and better coverage

- Moved script to deploy/openshift/ folder
- Added Model-B prompts (psychology, business, health, philosophy, law)
- Send 10 jailbreak attempts (better visibility in Grafana)
- Send 10 PII test prompts (various PII types)
- Use chat completions instead of just classification (triggers routing)
- Updated help text to reflect Model-A/Model-B naming
- All tests now send requests in parallel for better performance

This ensures both Model-A and Model-B appear in Grafana dashboards.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* fix(grafana): prevent refusal rate panels from showing stale data

Issue: Refusal Rates and Refusal Rate Percentage panels kept showing
increasing values even when no traffic was present.

Root cause: rate() returns empty results when no activity in the time
window, but Grafana was showing last non-zero values or interpolating.

Fix:
- Added 'or vector(0)' to refusal rate queries to explicitly return 0
  when no errors in the time window
- Added 'or vector(1)' to denominator to prevent division by zero
- Added interval and intervalFactor parameters for better scraping

Affected panels:
- Refusal Rates by Model (time series)
- Refusal Rate Percentage by Model (bar gauge)

Now panels correctly drop to 0 when traffic stops.

Signed-off-by: Yossi Ovadia <[email protected]>

* feat: enable observability by default with HTTPS and improved dashboard layout

- Enable observability (Prometheus + Grafana) by default in deployment
- Add HTTPS/TLS termination to Grafana and Prometheus routes with auto-redirect
- Reorganize Grafana dashboard panels by function:
  * Semantic-router features on top (category, routing, refusal, reasoning)
  * Performance metrics in middle (latency, TTFT, TPOT, tokens)
  * Cost metrics at bottom (cost rate, total cost)
- Update deployment script help text to reflect observability enabled by default
- Fix dashboard YAML indentation for proper embedding

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply pre-commit markdown formatting fixes

- Fix blank lines around code fences
- Remove multiple consecutive blank lines
- Ensure proper spacing around lists

Signed-off-by: Yossi Ovadia <[email protected]>

* fix: update deployment output URLs to HTTPS and correct demo script path

Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
* feat: add OpenWebUI OpenShift integration with deployment scripts

Add complete OpenWebUI deployment for OpenShift integration:

- OpenWebUI deployment manifests with OpenShift security contexts
- Automated deployment script with prerequisite validation
- Safe uninstall script with single confirmation prompt
- Internal service discovery (no hardcoded URLs)
- Integration with Envoy proxy for model load balancing
- Persistent storage for user data and configurations
- HTTPS external access via OpenShift routes
- Support for auto, Model-A, and Model-B endpoints

Files added:
- deploy/openshift/openwebui/deployment.yaml
- deploy/openshift/openwebui/service.yaml
- deploy/openshift/openwebui/route.yaml
- deploy/openshift/openwebui/pvc.yaml
- deploy/openshift/openwebui/kustomization.yaml
- deploy/openshift/openwebui/deploy-openwebui-on-openshift.sh
- deploy/openshift/openwebui/uninstall-openwebui.sh
- deploy/openshift/openwebui/README.md

Features:
- Zero-config setup with automatic model discovery
- OpenShift-compatible security contexts
- Rich user feedback with colored output
- Complete validation and connectivity testing
- Safe cleanup with data preservation options

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

* fix: apply pre-commit markdown formatting fixes

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>

---------

Signed-off-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
…ect#375)

* feat: add out-of-tree and mcp based classification support

Signed-off-by: Huamin Chen <[email protected]>

* fix unit tests

Signed-off-by: Huamin Chen <[email protected]>

* update unit test

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review comments

Signed-off-by: Huamin Chen <[email protected]>

* add example regex based classification mcp server

Signed-off-by: Huamin Chen <[email protected]>

* review feedback

Signed-off-by: Huamin Chen <[email protected]>

* review feedback: auto discover mcp tools

Signed-off-by: Huamin Chen <[email protected]>

* verify fixes are working

Signed-off-by: Huamin Chen <[email protected]>

* verify fixes are working

Signed-off-by: Huamin Chen <[email protected]>

* add missing file

Signed-off-by: Huamin Chen <[email protected]>

* fix lint

Signed-off-by: Huamin Chen <[email protected]>

---------

Signed-off-by: Huamin Chen <[email protected]>
Xunzhuo and others added 15 commits October 13, 2025 10:55
…ject#402)

- Add EditModal component with support for multiple field types (text, number, boolean, select, multiselect, textarea, json)
- Implement full CRUD operations for Models and Endpoints (add, edit, delete)
- Add edit functionality for all configuration sections:
  * Prompt Guard (PII Detection, Jailbreak Detection)
  * Similarity Cache (Similarity BERT, Semantic Cache)
  * Intelligent Routing (In-tree/Out-tree Classifiers)
  * Tools Selection
  * Observability (Distributed Tracing)
  * Classification API
- Add edit functionality for Categories (system prompt, reasoning settings)
- Add edit functionality for Reasoning Families
- Add model score management in Categories (add, edit, delete model scores)
- Implement dynamic dropdowns populated from configuration:
  * Reasoning Family selection from configured reasoning families
  * Preferred Endpoints selection from configured endpoints
  * Model selection for category model scores
- Add complete pricing configuration (currency, prompt/completion costs)
- Add complete PII policy configuration (allow/block, 17 PII types)
- Visualize all configuration fields (no JSON input required)
- Fix docker-compose volume mount to allow config file writes (ro -> rw)
- Remove deprecated fields (endpoint models, health_check_path, monitoring dashboard)
- Improve UI/UX with better card layouts, icons, and visual feedback

Backend changes:
- Add POST /api/router/config/update endpoint for configuration updates
- Add JSON to YAML conversion for config file writes

Frontend changes:
- Create reusable EditModal component with field validation
- Update ConfigPage with comprehensive edit buttons and handlers
- Add CSS styles for edit buttons, model scores, and multiselect fields
- Improve endpoint display (separate address, port, weight)

This enables full configuration management through the dashboard UI without
manual YAML editing.
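
A hypothetical usage sketch of the new endpoint; only the POST /api/router/config/update path comes from this change, while the host, port, and payload shape are assumptions:

```python
import requests

# Host, port, and payload keys are placeholders for illustration.
update = {"semantic_cache": {"enabled": True, "similarity_threshold": 0.85}}
resp = requests.post("http://localhost:8080/api/router/config/update", json=update)
resp.raise_for_status()
# The backend converts the submitted JSON to YAML and rewrites the config file.
```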

Signed-off-by: bitliu <[email protected]>
* update README

Signed-off-by: JaredforReal <[email protected]>

* update README & delete network config

Signed-off-by: JaredforReal <[email protected]>

* add owner

Signed-off-by: JaredforReal <[email protected]>

* add dashboard demo to k8s

Signed-off-by: JaredforReal <[email protected]>

* add Xunzhuo to owner

Signed-off-by: JaredforReal <[email protected]>

---------

Signed-off-by: JaredforReal <[email protected]>
* update paper

Signed-off-by: bitliu <[email protected]>

* more

Signed-off-by: bitliu <[email protected]>

* more

Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
* chore: add rootfs and yuluo-yx as website owners

Signed-off-by: yuluo-yx <[email protected]>

* fix

Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>
* docs: add missing observability articles to sidebar

- Add distributed-tracing to observability section
- Add open-webui-integration to observability section
- Rename observability.md to metrics.md for better clarity
- Remove tracing-quickstart.md as it's no longer needed

This ensures all observability documentation is accessible through
the website sidebar navigation.

Signed-off-by: bitliu <[email protected]>

* more

Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>
…llm-project#414)

* refactor(config): move reasoning fields from Category to ModelScore

Move ReasoningDescription and ReasoningEffort from Category level to ModelScore level to enable model-specific reasoning configuration.

**What type of PR is this?**

[Refactoring / Breaking Change]

**What this PR does / why we need it**:

This PR refactors the reasoning configuration structure by moving the ReasoningDescription and ReasoningEffort fields from the Category level to the ModelScore level. This change enables more granular control over reasoning behavior at the model level rather than the category level.

**Key Changes:**
- Remove ReasoningDescription and ReasoningEffort from the Category struct
- These fields now exist in the ModelScore struct (model-level configuration)
- Update  to accept  parameter
- Update all configuration files to new format (9 files)
- Update Python training scripts to generate new format
- Update TypeScript dashboard to read from best model
- Update all test files

**Breaking Change:**
This is a breaking change with no backward compatibility. Old configuration files must be migrated to the new format.

**Migration:**
Old format:
```yaml
categories:
  - name: math
    reasoning_description: "..."
    reasoning_effort: high
    model_scores:
      - model: model-a
        use_reasoning: true
```

New format:
```yaml
categories:
  - name: math
    model_scores:
      - model: model-a
        use_reasoning: true
        reasoning_description: "..."
        reasoning_effort: high
```

**Testing:**
- All Go tests pass
- All configuration files validated
- TypeScript type checking passes

Signed-off-by: Xunzhuo <[email protected]>
Signed-off-by: bitliu <[email protected]>

* fix(test): move ReasoningEffort to ModelScore in classification test

Update generic_category_mapping_test.go to use the new structure where
reasoning_effort is a field of ModelScore instead of Category.

Signed-off-by: Xunzhuo <[email protected]>
Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: Xunzhuo <[email protected]>
Signed-off-by: bitliu <[email protected]>
* infra: add golangci lint check

Signed-off-by: yuluo-yx <[email protected]>

* infra: add golangci lint check

Signed-off-by: yuluo-yx <[email protected]>

* fix

Signed-off-by: yuluo-yx <[email protected]>

* fix

Signed-off-by: yuluo-yx <[email protected]>

* infra: add golang ci lint for precommit hook

Signed-off-by: yuluo-yx <[email protected]>

* fix test and build CI error

Signed-off-by: yuluo-yx <[email protected]>

* fix: fix precommit parallel golangci-lint run error

Signed-off-by: yuluo-yx <[email protected]>

* fix

Signed-off-by: yuluo-yx <[email protected]>

---------

Signed-off-by: yuluo-yx <[email protected]>
Co-authored-by: Huamin Chen <[email protected]>
…ct#413)

* refactor(config): remove models field from vLLM endpoints

Remove the redundant models field from VLLMEndpoint configuration.
Model-to-endpoint mapping is now solely determined by the
preferred_endpoints field in model_config, eliminating the need
for bidirectional association.

Changes:
- Remove Models field from VLLMEndpoint struct
- Update GetEndpointsForModel to use only preferred_endpoints
- Update GetAllModels to retrieve models from model_config keys
- Update all configuration files to remove models field
- Update all tests to reflect the new configuration structure
- Update TypeScript interface in dashboard frontend

This simplifies the configuration and removes potential
inconsistencies between models and preferred_endpoints.

Signed-off-by: bitliu <[email protected]>

* more

Signed-off-by: bitliu <[email protected]>

* more

Signed-off-by: bitliu <[email protected]>

---------

Signed-off-by: bitliu <[email protected]>

github-actions bot commented Oct 13, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/training/wiki_filtered_classifier_tuning/multitask_alert_filtered_wiki_classifier_training.py


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs
Collaborator

rootfs commented Oct 13, 2025

@joyful-ii-V-I can you fix the pre-commit errors?
