Merged
2 changes: 2 additions & 0 deletions deploy/helm/README.md
@@ -48,6 +48,8 @@ helm install semantic-router ./deploy/helm/semantic-router \
--create-namespace
```

+> Need a registry mirror/proxy (e.g., in China)? Append `--set global.imageRegistry=<your-registry>` to any Helm install/upgrade command.

### Verify Installation

```bash
3 changes: 2 additions & 1 deletion deploy/helm/semantic-router/README.md
@@ -228,6 +228,7 @@ kubectl apply -f deploy/helm/semantic-router/crds/
| env[0].value | string | `"/app/lib"` | |
| fullnameOverride | string | `""` | Override the full name of the chart |
| global.namespace | string | `""` | Namespace for all resources (if not specified, uses Release.Namespace) |
+| global.imageRegistry | string | `""` | Optional registry prefix applied to all images (e.g., mirror registry in China) |
| image.pullPolicy | string | `"IfNotPresent"` | Image pull policy |
| image.repository | string | `"ghcr.io/vllm-project/semantic-router/extproc"` | Image repository |
| image.tag | string | `"latest"` | Image tag (overrides the image tag whose default is the chart appVersion) |
@@ -238,7 +239,7 @@ kubectl apply -f deploy/helm/semantic-router/crds/
| ingress.hosts | list | `[{"host":"semantic-router.local","paths":[{"path":"/","pathType":"Prefix","servicePort":8080}]}]` | Ingress hosts configuration |
| ingress.tls | list | `[]` | Ingress TLS configuration |
| initContainer.enabled | bool | `true` | Enable init container |
-| initContainer.image | string | `"python:3.11-slim"` | Init container image |
+| initContainer.image | object | `{"repository":"ghcr.io/vllm-project/semantic-router/model-downloader","tag":"","pullPolicy":"IfNotPresent"}` | Init container image (an empty `tag` defaults to the chart appVersion) |
| initContainer.models | list | `[{"name":"all-MiniLM-L12-v2","repo":"sentence-transformers/all-MiniLM-L12-v2"},{"name":"category_classifier_modernbert-base_model","repo":"LLM-Semantic-Router/category_classifier_modernbert-base_model"},{"name":"pii_classifier_modernbert-base_model","repo":"LLM-Semantic-Router/pii_classifier_modernbert-base_model"},{"name":"jailbreak_classifier_modernbert-base_model","repo":"LLM-Semantic-Router/jailbreak_classifier_modernbert-base_model"},{"name":"pii_classifier_modernbert-base_presidio_token_model","repo":"LLM-Semantic-Router/pii_classifier_modernbert-base_presidio_token_model"}]` | Models to download |
| initContainer.resources | object | `{"limits":{"cpu":"1000m","memory":"2Gi"},"requests":{"cpu":"500m","memory":"1Gi"}}` | Resource limits for init container |
| livenessProbe.enabled | bool | `true` | Enable liveness probe |
11 changes: 6 additions & 5 deletions deploy/helm/semantic-router/templates/deployment.yaml
@@ -26,22 +26,23 @@ spec:
imagePullSecrets:
{{- toYaml . | nindent 8 }}
{{- end }}
+{{- $registry := trimSuffix "/" (default "" .Values.global.imageRegistry) }}
+{{- $prefix := ternary "" (printf "%s/" $registry) (eq $registry "") }}
serviceAccountName: {{ include "semantic-router.serviceAccountName" . }}
securityContext:
{{- toYaml .Values.podSecurityContext | nindent 8 }}
{{- if .Values.initContainer.enabled }}
initContainers:
- name: model-downloader
-image: {{ .Values.initContainer.image }}
+{{- $initImage := .Values.initContainer.image }}
+image: "{{ $prefix }}{{ $initImage.repository }}:{{ $initImage.tag | default .Chart.AppVersion }}"
+imagePullPolicy: {{ $initImage.pullPolicy | default "IfNotPresent" }}
securityContext:
{{- toYaml .Values.securityContext | nindent 10 }}
command: ["/bin/bash", "-c"]
args:
- |
set -e
-echo "Installing Hugging Face Hub..."
-pip install -U --no-cache-dir "huggingface_hub>=0.19.0"
echo "Downloading models to persistent volume..."
cd /app/models
@@ -79,7 +80,7 @@ spec:
{{- end }}
containers:
- name: {{ .Chart.Name }}
-image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
+image: "{{ $prefix }}{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
imagePullPolicy: {{ .Values.image.pullPolicy }}
{{- with .Values.args }}
args:
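The `$registry` and `$prefix` variables introduced in this template decide how both image references are built. As a rough illustration outside of Helm, the shell sketch below mimics that logic. The function name `image_ref`, the host `registry.example.com`, and the app version `0.0.0` are made up for the example and are not part of the chart.

```bash
#!/usr/bin/env bash
# Sketch only: mimics the template's image-reference logic, it is not chart code.
# Usage: image_ref REGISTRY REPOSITORY TAG APP_VERSION
image_ref() {
  local registry="${1%/}"   # like trimSuffix "/": drop one trailing slash
  local repository="$2"
  local tag="${3:-$4}"      # like `default .Chart.AppVersion`: empty tag falls back
  local prefix=""
  if [ -n "$registry" ]; then
    prefix="${registry}/"   # add the separator only when a registry is set
  fi
  printf '%s%s:%s\n' "$prefix" "$repository" "$tag"
}

# No registry set: the reference is unchanged.
image_ref "" "ghcr.io/vllm-project/semantic-router/extproc" "latest" "0.0.0"
# Hypothetical mirror with a trailing slash and an empty tag.
image_ref "registry.example.com/" \
  "ghcr.io/vllm-project/semantic-router/model-downloader" "" "0.0.0"
```

Because the prefix is prepended to the full default repository, the second call yields `registry.example.com/ghcr.io/vllm-project/semantic-router/model-downloader:0.0.0`. A mirror that does not keep the `ghcr.io/...` path segment would presumably also need `image.repository` and `initContainer.image.repository` overridden.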
8 changes: 7 additions & 1 deletion deploy/helm/semantic-router/values.yaml
@@ -3,9 +3,11 @@
# Declare variables to be passed into your templates.

# Global settings
global:

[CI warning, GitHub Actions / Run Validation Script] 6:1 [document-start] missing document start "---"
# -- Namespace for all resources (if not specified, uses Release.Namespace)
namespace: ""
+# -- Optional registry prefix applied to all images (e.g., mirror in China such as registry.cn-hangzhou.aliyuncs.com)
[CI failure, GitHub Actions / Run Validation Script] 9:81 [line-length] line too long (119 > 80 characters)
+imageRegistry: ""

# -- Number of replicas for the deployment
replicaCount: 1
@@ -47,7 +49,7 @@

# Pod security context
podSecurityContext: {}
# fsGroup: 2000

[CI warning, GitHub Actions / Run Validation Script] 52:3 [comments-indentation] comment not indented like content

# Container security context
securityContext:
@@ -100,7 +102,7 @@
className: ""
# -- Ingress annotations
annotations: {}
# kubernetes.io/ingress.class: nginx

[CI warning, GitHub Actions / Run Validation Script] 105:5 [comments-indentation] comment not indented like content
# kubernetes.io/tls-acme: "true"
# -- Ingress hosts configuration
hosts:
@@ -131,7 +133,11 @@
# -- Enable init container
enabled: true
# -- Init container image
-image: python:3.11-slim
+image:
+repository: ghcr.io/vllm-project/semantic-router/model-downloader
+# Leave empty to default to the chart AppVersion; override with a pinned tag if desired
[CI failure, GitHub Actions / Run Validation Script] 138:81 [line-length] line too long (91 > 80 characters)
+tag: ""
+pullPolicy: IfNotPresent
# -- Resource limits for init container
resources:
limits:
@@ -166,7 +172,7 @@
- name: jailbreak_classifier_modernbert-base_model
repo: LLM-Semantic-Router/jailbreak_classifier_modernbert-base_model
- name: pii_classifier_modernbert-base_presidio_token_model
repo: LLM-Semantic-Router/pii_classifier_modernbert-base_presidio_token_model

[CI failure, GitHub Actions / Run Validation Script] 175:81 [line-length] line too long (83 > 80 characters)
# LoRA PII detector (for auto-detection feature)
- name: lora_pii_detector_bert-base-uncased_model
repo: LLM-Semantic-Router/lora_pii_detector_bert-base-uncased_model
@@ -232,7 +238,7 @@
size: 10Gi
# -- Annotations for PVC
annotations: {}
# -- Existing claim name (if provided, will use existing PVC instead of creating new one)

[CI failure, GitHub Actions / Run Validation Script] 241:81 [line-length] line too long (91 > 80 characters)
existingClaim: ""

# Application configuration
@@ -267,7 +273,7 @@
model_id: "models/jailbreak_classifier_modernbert-base_model"
threshold: 0.7
use_cpu: true
jailbreak_mapping_path: "models/jailbreak_classifier_modernbert-base_model/jailbreak_type_mapping.json"

[CI failure, GitHub Actions / Run Validation Script] 276:81 [line-length] line too long (107 > 80 characters)

# Classifier configuration
classifier:
@@ -276,13 +282,13 @@
use_modernbert: true
threshold: 0.6
use_cpu: true
category_mapping_path: "models/category_classifier_modernbert-base_model/category_mapping.json"

[CI failure, GitHub Actions / Run Validation Script] 285:81 [line-length] line too long (101 > 80 characters)
pii_model:
model_id: "models/pii_classifier_modernbert-base_presidio_token_model"
use_modernbert: true
threshold: 0.7
use_cpu: true
pii_mapping_path: "models/pii_classifier_modernbert-base_presidio_token_model/pii_type_mapping.json"

[CI failure, GitHub Actions / Run Validation Script] 291:81 [line-length] line too long (106 > 80 characters)

# Reasoning families
reasoning_families:
@@ -313,7 +319,7 @@
detailed_goroutine_tracking: true
high_resolution_timing: false
sample_rate: 1.0
duration_buckets: [0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30]

[CI failure, GitHub Actions / Run Validation Script] 322:81 [line-length] line too long (94 > 80 characters)
size_buckets: [1, 2, 5, 10, 20, 50, 100, 200]

# Observability configuration
@@ -351,7 +357,7 @@
enum: ["celsius", "fahrenheit"]
description: "Temperature unit"
required: ["location"]
description: "Get current weather information, temperature, conditions, forecast for any location, city, or place. Check weather today, now, current conditions, temperature, rain, sun, cloudy, hot, cold, storm, snow"

[CI failure, GitHub Actions / Run Validation Script] 360:81 [line-length] line too long (220 > 80 characters)
category: "weather"
tags: ["weather", "temperature", "forecast", "climate"]
- tool:
@@ -370,7 +376,7 @@
description: "Number of results to return"
default: 5
required: ["query"]
description: "Search the internet, web search, find information online, browse web content, lookup, research, google, find answers, discover, investigate"

[CI failure, GitHub Actions / Run Validation Script] 379:81 [line-length] line too long (158 > 80 characters)
category: "search"
tags: ["search", "web", "internet", "information", "browse"]
- tool:
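For anything beyond a single `--set` flag, the settings this file introduces can live in a values override file. A minimal sketch, assuming a hypothetical mirror host `registry.example.com` and a hypothetical file name `mirror-values.yaml`; the keys are the ones added to `values.yaml` in this diff:

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical override file name; any path works with `-f`.
values_file="mirror-values.yaml"

# Write only the keys that differ from the chart defaults.
cat > "$values_file" <<'EOF'
global:
  imageRegistry: "registry.example.com"  # hypothetical mirror host
initContainer:
  image:
    repository: ghcr.io/vllm-project/semantic-router/model-downloader
    tag: ""            # empty: the template falls back to the chart appVersion
    pullPolicy: IfNotPresent
EOF

echo "Wrote override file: $values_file"
# With Helm installed, the file could then be applied with:
#   helm upgrade --install semantic-router ./deploy/helm/semantic-router -f "$values_file"
```

Note that `initContainer.image` is now a map. An existing override that still sets it to a plain string such as `python:3.11-slim` would most likely fail to render, since the template reads `.repository` from it.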
1 change: 1 addition & 0 deletions website/docs/installation/k8s/ai-gateway.md
@@ -103,6 +103,7 @@ Deploy the semantic router service with all required components using Helm:

```bash
# Install with custom values from GHCR OCI registry
+# (Optional) If you use a registry mirror/proxy, append: --set global.imageRegistry=<your-registry>
helm install semantic-router oci://ghcr.io/vllm-project/charts/semantic-router \
--version v0.0.0-latest \
--namespace vllm-semantic-router-system \
1 change: 1 addition & 0 deletions website/docs/installation/k8s/aibrix.md
@@ -56,6 +56,7 @@ Deploy the semantic router service with all required components using Helm:

```bash
# Install with custom values from GHCR OCI registry
+# (Optional) If you use a registry mirror/proxy, append: --set global.imageRegistry=<your-registry>
helm install semantic-router oci://ghcr.io/vllm-project/charts/semantic-router \
--version v0.0.0-latest \
--namespace vllm-semantic-router-system \
1 change: 1 addition & 0 deletions website/docs/installation/k8s/production-stack.md
@@ -99,6 +99,7 @@ Deploy using Helm with custom values:

```bash
# Deploy vLLM Semantic Router with custom values from GHCR OCI registry
+# (Optional) If you use a registry mirror/proxy, append: --set global.imageRegistry=<your-registry>
helm install semantic-router oci://ghcr.io/vllm-project/charts/semantic-router \
--version v0.0.0-latest \
--namespace vllm-semantic-router-system \