@Xunzhuo Xunzhuo commented Nov 17, 2025

This commit introduces a decision-based routing system with a flexible plugin architecture, replacing the previous category-based approach. The example below defines an IntelligentPool and an IntelligentRoute that use it.

---
apiVersion: vllm.ai/v1alpha1
kind: IntelligentPool
metadata:
  name: ked-plugin-pool
  namespace: default
spec:
  defaultModel: "qwen-2.5-32b"
  models:
    - name: "qwen-2.5-32b"
      reasoningFamily: "qwen3"
      pricing:
        inputTokenPrice: 0.000002    # $2 per 1M - for standard enterprise queries
        outputTokenPrice: 0.000004   # $4 per 1M
      loras:
        - name: "enterprise-assistant"
          description: "Enterprise assistant"

    - name: "qwen-2.5-72b"
      reasoningFamily: "qwen3"
      pricing:
        inputTokenPrice: 0.000003    # $3 per 1M - for complex enterprise scenarios
        outputTokenPrice: 0.000006   # $6 per 1M
      loras:
        - name: "enterprise-specialist"
          description: "Enterprise domain specialist"
        - name: "compliance-expert"
          description: "Compliance and security expert"

---
apiVersion: vllm.ai/v1alpha1
kind: IntelligentRoute
metadata:
  name: ked-plugin-route
  namespace: default
spec:
  signals:
    keywords:
      - name: "compliance"
        operator: "OR"
        keywords: ["compliance", "regulation", "audit", "policy"]
        caseSensitive: false
      - name: "confidential"
        operator: "OR"
        keywords: ["confidential", "sensitive", "private", "restricted"]
        caseSensitive: false
    
    embeddings:
      - name: "business_analysis"
        threshold: 0.76
        candidates:
          - "I need to analyze business metrics and performance"
          - "Can you help me with strategic business planning?"
          - "We need insights on market trends and competition"
        aggregationMethod: "max"
      - name: "legal_review"
        threshold: 0.80
        candidates:
          - "This document needs legal review and compliance check"
          - "We need to ensure regulatory compliance"
          - "Can you review this for legal implications?"
        aggregationMethod: "mean"
    
    domains:
      - "business"
      - "law"
      - "economics"
  
  decisions:
    - name: "compliance_legal"
      description: "Compliance and legal queries with full protection"
      priority: 100
      signals:
        operator: "AND"
        conditions:
          - type: "keyword"
            name: "compliance"
          - type: "embedding"
            name: "legal_review"
          - type: "domain"
            name: "law"
      modelRefs:
        - model: "qwen-2.5-72b"
          use_reasoning: true
          reasoning_effort: "high"
      plugins:
        - type: "pii"
          configuration:
            enabled: true
            threshold: 0.9
            allow_by_default: false
            pii_types_allowed: ["PERSON", "ORGANIZATION", "EMAIL", "PHONE_NUMBER"]
        - type: "jailbreak"
          configuration:
            enabled: true
            threshold: 0.88
        - type: "system_prompt"
          configuration:
            system_prompt: "You are a legal compliance assistant. Provide accurate information about regulations and compliance requirements. Always remind users to consult legal professionals for specific advice."
        - type: "semantic-cache"
          configuration:
            enabled: true
            similarity_threshold: 0.93
        - type: "header_mutation"
          configuration:
            add:
              - name: "X-Compliance-Level"
                value: "high"
              - name: "X-Audit-Required"
                value: "true"
    
    - name: "confidential_business"
      description: "Confidential business analysis"
      priority: 90
      signals:
        operator: "AND"
        conditions:
          - type: "keyword"
            name: "confidential"
          - type: "embedding"
            name: "business_analysis"
          - type: "domain"
            name: "business"
      modelRefs:
        - model: "qwen-2.5-72b"
          use_reasoning: true
          reasoning_effort: "medium"
      plugins:
        - type: "pii"
          configuration:
            enabled: true
            threshold: 0.85
            allow_by_default: false
            pii_types_allowed: ["PERSON", "ORGANIZATION", "FINANCIAL_DATA"]
        - type: "semantic-cache"
          configuration:
            enabled: true
            similarity_threshold: 0.90
        - type: "header_mutation"
          configuration:
            add:
              - name: "X-Confidentiality"
                value: "high"
    
    - name: "general_business"
      description: "General business and economics queries"
      priority: 50
      signals:
        operator: "OR"
        conditions:
          - type: "embedding"
            name: "business_analysis"
          - type: "domain"
            name: "economics"
      modelRefs:
        - model: "qwen-2.5-72b"
          use_reasoning: false
      plugins:
        - type: "semantic-cache"
          configuration:
            enabled: true
            similarity_threshold: 0.85
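
As a rough illustration of how the `keywords` and `embeddings` signals in the example above might be evaluated, the Go sketch below matches a keyword rule (OR/AND, optionally case-insensitive) and aggregates per-candidate similarity scores with `max` or `mean` before comparing against the threshold. The names `KeywordRule`, `matchKeywords`, and `aggregate` are hypothetical, the similarity scores are made up, and plain substring matching is a simplification; this is not the router's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// KeywordRule is a hypothetical mirror of one entry under spec.signals.keywords.
type KeywordRule struct {
	Name          string
	Operator      string // "OR" (any keyword) or "AND" (all keywords)
	Keywords      []string
	CaseSensitive bool
}

// matchKeywords reports whether text satisfies the rule using substring matching.
func matchKeywords(r KeywordRule, text string) bool {
	if len(r.Keywords) == 0 {
		return false
	}
	if !r.CaseSensitive {
		text = strings.ToLower(text)
	}
	hits := 0
	for _, kw := range r.Keywords {
		if !r.CaseSensitive {
			kw = strings.ToLower(kw)
		}
		if strings.Contains(text, kw) {
			hits++
		}
	}
	if r.Operator == "AND" {
		return hits == len(r.Keywords)
	}
	return hits > 0
}

// aggregate combines per-candidate similarity scores using "mean", or "max" otherwise.
func aggregate(method string, scores []float64) float64 {
	if len(scores) == 0 {
		return 0
	}
	best, sum := scores[0], 0.0
	for _, s := range scores {
		if s > best {
			best = s
		}
		sum += s
	}
	if method == "mean" {
		return sum / float64(len(scores))
	}
	return best
}

func main() {
	rule := KeywordRule{Name: "compliance", Operator: "OR",
		Keywords: []string{"compliance", "regulation", "audit", "policy"}}
	fmt.Println("keyword fired:", matchKeywords(rule, "Please AUDIT this contract"))

	// Made-up similarities between a query and the three legal_review candidates.
	scores := []float64{0.90, 0.60, 0.75}
	fmt.Println("max  >= 0.80:", aggregate("max", scores) >= 0.80)
	fmt.Println("mean >= 0.80:", aggregate("mean", scores) >= 0.80)
}
```

With these scores, `max` clears the 0.80 threshold while `mean` does not, which is why the choice of `aggregationMethod` matters when only one candidate phrase closely resembles the query.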

Core Changes

1. Decision-Based Architecture

  • Replaced Category-based routing with Decision-based routing
  • Decisions combine multiple rules (keyword, embedding, domain) using AND/OR operators
  • Added DecisionEngine for evaluating rule combinations and selecting optimal decisions
  • Support for priority and confidence-based decision selection strategies
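
The rule combination and priority-based selection described above can be sketched in Go as follows. This is a minimal illustration under stated assumptions, not the code in `pkg/decision/engine.go`: the types `Condition` and `Decision` and the functions `matches` and `selectByPriority` are hypothetical, and the set of fired signals is supplied by hand rather than computed by classifiers.

```go
package main

import (
	"fmt"
	"sort"
)

// Condition references a named signal, e.g. {Type: "keyword", Name: "compliance"}.
type Condition struct {
	Type string
	Name string
}

// Decision is a hypothetical, trimmed-down view of one entry under spec.decisions.
type Decision struct {
	Name       string
	Priority   int
	Operator   string // "AND" or "OR"
	Conditions []Condition
}

// matches evaluates a decision's conditions against the set of fired signals.
func matches(d Decision, fired map[Condition]bool) bool {
	if len(d.Conditions) == 0 {
		return false
	}
	for _, c := range d.Conditions {
		if fired[c] && d.Operator == "OR" {
			return true // OR: one fired signal is enough
		}
		if !fired[c] && d.Operator == "AND" {
			return false // AND: one missing signal rules the decision out
		}
	}
	// AND reaches here only if every condition fired; OR only if none did.
	return d.Operator == "AND"
}

// selectByPriority returns the highest-priority matching decision, if any.
func selectByPriority(ds []Decision, fired map[Condition]bool) (Decision, bool) {
	var matched []Decision
	for _, d := range ds {
		if matches(d, fired) {
			matched = append(matched, d)
		}
	}
	if len(matched) == 0 {
		return Decision{}, false
	}
	sort.SliceStable(matched, func(i, j int) bool {
		return matched[i].Priority > matched[j].Priority
	})
	return matched[0], true
}

func main() {
	decisions := []Decision{
		{Name: "general_business", Priority: 50, Operator: "OR",
			Conditions: []Condition{{"embedding", "business_analysis"}, {"domain", "economics"}}},
		{Name: "confidential_business", Priority: 90, Operator: "AND",
			Conditions: []Condition{{"keyword", "confidential"}, {"embedding", "business_analysis"}, {"domain", "business"}}},
	}
	fired := map[Condition]bool{
		{"keyword", "confidential"}:        true,
		{"embedding", "business_analysis"}: true,
		{"domain", "business"}:             true,
	}
	if d, ok := selectByPriority(decisions, fired); ok {
		fmt.Println("selected:", d.Name)
	}
}
```

Here both decisions match, and the one with priority 90 is chosen over the one with priority 50. A confidence-based strategy would replace the sort key; how confidence is computed is not specified in this description, so it is left out of the sketch.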

2. Plugin System

  • Introduced flexible plugin architecture for Decision-level configurations
  • Supported plugin types: semantic-cache, jailbreak, pii, system_prompt, header_mutation
  • Each plugin has type-specific configuration stored as raw JSON
  • Helper methods for type-safe plugin configuration access

3. Model References

  • Renamed ModelScores to ModelRefs, removed score field
  • Currently supports single model per decision (maxItems: 1)
  • Simplified model selection logic based on decision priority

4. Kubernetes CRD Integration

  • Added IntelligentPool and IntelligentRoute CRDs
  • CRD converter translates Kubernetes resources to internal config
  • Kubernetes controller watches and syncs CRD changes
  • Updated CRD schemas to use modelRefs and plugins arrays

Key Components

Decision Engine (pkg/decision/engine.go)

  • Evaluates rule combinations with AND/OR logic
  • Calculates confidence scores for matching decisions
  • Supports priority and confidence selection strategies

Configuration (pkg/config/config.go)

  • Decision structure with Rules, ModelRefs, and Plugins
  • Plugin configuration structs for each plugin type
  • Helper methods for accessing plugin configurations

CRD Types (pkg/apis/vllm.ai/v1alpha1/)

  • IntelligentPool: defines available models and their configurations
  • IntelligentRoute: defines routing decisions and rules
  • ModelRef: model reference without score field
  • DecisionPlugin: plugin configuration with type and raw config

Kubernetes Integration (pkg/k8s/)

  • Controller: watches CRDs and updates internal config
  • Converter: converts CRDs to internal config format
  • Comprehensive test coverage for CRD conversion

  • Make sure the code changes pass the pre-commit checks.
  • Sign off your commits by using -s with git commit.
  • Try to classify PRs for easy understanding of the type of changes, such as [Bugfix], [Feat], and [CI].

Thank you for your contribution to semantic-router! Before submitting the pull request, please ensure the PR meets the following criteria. This helps us maintain the code quality and improve the efficiency of the review process.

PR Title and Classification

Please try to classify PRs for easy understanding of the type of changes. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:

  • [Bugfix] for bug fixes.
  • [CI/Build] for build or continuous integration improvements.
  • [Doc] for documentation fixes and improvements.
  • [Feat] for new features in the cluster (e.g., autoscaling, disaggregated prefill, etc.).
  • [Router] for changes to the vllm_router (e.g., routing algorithm, router observability, etc.).
  • [Misc] for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.

Code Quality

The PR needs to meet the following code quality standards:

  • Pass all linter checks. Please use pre-commit to format your code. See README.md for installation.
  • The code needs to be well-documented so that future contributors can easily understand it.
  • Please include sufficient tests to ensure the change stays correct and robust. This includes both unit tests and integration tests.

DCO and Signed-off-by

When contributing changes to this project, you must agree to the DCO. Commits must include a Signed-off-by: header which certifies agreement with the terms of the DCO.

Using -s with git commit will automatically add this header.



netlify bot commented Nov 17, 2025

Deploy Preview for vllm-semantic-router ready!

Name Link
🔨 Latest commit cfdcd19
🔍 Latest deploy log https://app.netlify.com/projects/vllm-semantic-router/deploys/691b19fdf0198a0008dcde6b
😎 Deploy Preview https://deploy-preview-681--vllm-semantic-router.netlify.app


github-actions bot commented Nov 17, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/integration-test-dynamic-config.yml

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/helm/semantic-router/crds/vllm.ai_intelligentpools.yaml
  • deploy/helm/semantic-router/crds/vllm.ai_intelligentroutes.yaml
  • deploy/helm/semantic-router/templates/clusterrole.yaml
  • deploy/helm/semantic-router/templates/clusterrolebinding.yaml
  • deploy/kubernetes/crds/vllm.ai_intelligentpools.yaml
  • deploy/kubernetes/crds/vllm.ai_intelligentroutes.yaml
  • deploy/helm/semantic-router/README.md
  • deploy/helm/semantic-router/values.yaml
  • deploy/kubernetes/ai-gateway/semantic-router-values/values.yaml
  • deploy/kubernetes/ai-gateway/semantic-router/config.yaml

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/profiles/dynamic-config/crds/intelligentpool.yaml
  • e2e/profiles/dynamic-config/crds/intelligentroute.yaml
  • e2e/profiles/dynamic-config/profile.go
  • e2e/profiles/dynamic-config/values.yaml
  • e2e/cmd/e2e/main.go

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/deploy/helm/semantic-router/templates/vllm.ai_intelligentpools.yaml
  • src/semantic-router/deploy/helm/semantic-router/templates/vllm.ai_intelligentroutes.yaml
  • src/semantic-router/deploy/kubernetes/crds/deploy/vllm.ai_intelligentpools.yaml
  • src/semantic-router/deploy/kubernetes/crds/deploy/vllm.ai_intelligentroutes.yaml
  • src/semantic-router/examples/decision-based-routing.yaml
  • src/semantic-router/pkg/apis/vllm.ai/v1alpha1/types_route.go
  • src/semantic-router/pkg/decision/engine.go
  • src/semantic-router/pkg/decision/engine_test.go
  • src/semantic-router/pkg/extproc/req_filter_header_mutation.go
  • src/semantic-router/pkg/k8s/controller.go
  • src/semantic-router/pkg/k8s/converter.go
  • src/semantic-router/pkg/k8s/converter_test.go
  • src/semantic-router/pkg/k8s/reconciler.go
  • src/semantic-router/pkg/k8s/testdata/README.md
  • src/semantic-router/pkg/k8s/testdata/base-config.yaml
  • src/semantic-router/pkg/k8s/testdata/input/01-basic.yaml
  • src/semantic-router/pkg/k8s/testdata/input/02-keyword-only.yaml
  • src/semantic-router/pkg/k8s/testdata/input/03-embedding-only.yaml
  • src/semantic-router/pkg/k8s/testdata/input/04-domain-only.yaml
  • src/semantic-router/pkg/k8s/testdata/input/05-keyword-embedding.yaml
  • src/semantic-router/pkg/k8s/testdata/input/06-keyword-domain.yaml
  • src/semantic-router/pkg/k8s/testdata/input/07-domain-embedding.yaml
  • src/semantic-router/pkg/k8s/testdata/input/08-keyword-embedding-domain.yaml
  • src/semantic-router/pkg/k8s/testdata/input/09-keyword-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/input/10-embedding-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/input/11-domain-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/input/12-keyword-embedding-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/input/13-keyword-domain-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/input/14-domain-embedding-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/input/15-keyword-embedding-domain-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/input/16-keyword-embedding-domain-no-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/output/01-basic.yaml
  • src/semantic-router/pkg/k8s/testdata/output/02-keyword-only.yaml
  • src/semantic-router/pkg/k8s/testdata/output/03-embedding-only.yaml
  • src/semantic-router/pkg/k8s/testdata/output/04-domain-only.yaml
  • src/semantic-router/pkg/k8s/testdata/output/05-keyword-embedding.yaml
  • src/semantic-router/pkg/k8s/testdata/output/06-keyword-domain.yaml
  • src/semantic-router/pkg/k8s/testdata/output/07-domain-embedding.yaml
  • src/semantic-router/pkg/k8s/testdata/output/08-keyword-embedding-domain.yaml
  • src/semantic-router/pkg/k8s/testdata/output/09-keyword-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/output/10-embedding-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/output/11-domain-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/output/12-keyword-embedding-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/output/13-keyword-domain-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/output/14-domain-embedding-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/output/15-keyword-embedding-domain-plugin.yaml
  • src/semantic-router/pkg/k8s/testdata/output/16-keyword-embedding-domain-no-plugin.yaml
  • src/semantic-router/cmd/main.go
  • src/semantic-router/go.mod
  • src/semantic-router/go.sum
  • src/semantic-router/pkg/apis/vllm.ai/v1alpha1/register.go
  • src/semantic-router/pkg/apis/vllm.ai/v1alpha1/types.go
  • src/semantic-router/pkg/apis/vllm.ai/v1alpha1/zz_generated.deepcopy.go
  • src/semantic-router/pkg/apiserver/route_system_prompt.go
  • src/semantic-router/pkg/apiserver/server.go
  • src/semantic-router/pkg/apiserver/server_test.go
  • src/semantic-router/pkg/classification/classifier.go
  • src/semantic-router/pkg/classification/classifier_test.go
  • src/semantic-router/pkg/classification/embedding_classifier.go
  • src/semantic-router/pkg/classification/keyword_classifier.go
  • src/semantic-router/pkg/classification/keyword_entropy_test.go
  • src/semantic-router/pkg/classification/mcp_classifier.go
  • src/semantic-router/pkg/config/config.go
  • src/semantic-router/pkg/config/config_test.go
  • src/semantic-router/pkg/config/helper.go
  • src/semantic-router/pkg/config/loader.go
  • src/semantic-router/pkg/config/validator.go
  • src/semantic-router/pkg/extproc/extproc_test.go
  • src/semantic-router/pkg/extproc/processor_req_body.go
  • src/semantic-router/pkg/extproc/processor_req_header.go
  • src/semantic-router/pkg/extproc/processor_res_header.go
  • src/semantic-router/pkg/extproc/recorder.go
  • src/semantic-router/pkg/extproc/req_filter_cache.go
  • src/semantic-router/pkg/extproc/req_filter_classification.go
  • src/semantic-router/pkg/extproc/req_filter_jailbreak.go
  • src/semantic-router/pkg/extproc/req_filter_pii.go
  • src/semantic-router/pkg/extproc/req_filter_reason.go
  • src/semantic-router/pkg/extproc/req_filter_sys_prompt.go
  • src/semantic-router/pkg/extproc/router.go
  • src/semantic-router/pkg/extproc/server.go
  • src/semantic-router/pkg/headers/headers.go
  • src/semantic-router/pkg/utils/pii/policy.go

📁 config

Owners: @rootfs, @Xunzhuo
Files changed:

  • config/intelligent-routing/in-tree/embedding.yaml

📁 tools

Owners: @yuluo-yx, @rootfs, @Xunzhuo
Files changed:

  • tools/make/golang.mk

📁 website

Owners: @Xunzhuo, @rootfs, @yuluo-yx
Files changed:

  • website/docs/tutorials/intelligent-route/domain-routing.md
  • website/docs/tutorials/intelligent-route/embedding-routing.md


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@Xunzhuo Xunzhuo force-pushed the feat/decision-based-routing-with-plugins branch from 51c60a3 to eb0d095 Compare November 17, 2025 07:45
@Xunzhuo Xunzhuo changed the title feat: implement decision-based routing with plugin architecture [Feat]: Implement signal/decision-based routing with dynamic plugin architecture Nov 17, 2025
@Xunzhuo Xunzhuo force-pushed the feat/decision-based-routing-with-plugins branch from 25f435e to 51cefd8 Compare November 17, 2025 10:52
@Xunzhuo Xunzhuo changed the title [Feat]: Implement signal/decision-based routing with dynamic plugin architecture [Feat]: Implement Signal Decision-based routing with dynamic plugin architecture Nov 17, 2025
@Xunzhuo Xunzhuo force-pushed the feat/decision-based-routing-with-plugins branch from fa33e47 to d7b2c50 Compare November 17, 2025 12:16