Commit 589e03e

Authored by rootfs, carlory, JaredforReal, Xunzhuo, and yuluo-yx
Merge main to candle refactoring (#523)
* Update test description from Math to General (#483)
  Signed-off-by: carlory <[email protected]>

* feat: add HuggingChat support (#477)
  * add chat ui to dashboard and docker compose & refactor dashboard/backend/
    Signed-off-by: JaredforReal <[email protected]>
  * try fix network error
    Signed-off-by: JaredforReal <[email protected]>
  * more
  ---------
  Signed-off-by: JaredforReal <[email protected]>
  Co-authored-by: bitliu <[email protected]>

* project: 2025 Q4 roadmap (#487)
  * project: q4 roadmap
  * project: q4 roadmap
  * project: q4 roadmap
  * more
  * more
  * more
  * more

* feat: add shelleck precommit hook (#488)
  * feat: add shelleck precommit hook
    Signed-off-by: yuluo-yx <[email protected]>
  * feat: add shelleck precommit hook
    Signed-off-by: yuluo-yx <[email protected]>
  * feat: add shelleck precommit hook
    Signed-off-by: yuluo-yx <[email protected]>
  ---------
  Signed-off-by: yuluo-yx <[email protected]>

* project: add q4 roadmap news (#495)

* fix missing shellcheck in pre-commit image (#497)
  Signed-off-by: carlory <[email protected]>

* infra: update tools (#501)
  Signed-off-by: yuluo-yx <[email protected]>

* feat(demo): enhance OpenShift demo scripts with improved UX (#478)
  - Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
  - Add new "Classification Examples" option calling curl-examples.sh
  - Update reasoning examples to avoid cache hits from previous tests
  - Remove benign examples from PII and Jailbreak tests (show only attacks)
  - Enhance live-semantic-router-logs.sh with better color visibility:
    - Fix duplicate "WITH SCORE" text in classification output
    - Fix CACHE HIT background color extending over timestamp
    - Distinguish reasoning enabled vs disabled messages
    - Remove redundant "(standard routing)" text
    - Add background colors for Model-A/Model-B routing display
  These improvements make the live demo clearer and more impactful for presentations and demonstrations.
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Signed-off-by: Yossi Ovadia <[email protected]>
  Co-authored-by: Claude <[email protected]>

* fix: fix precommit Argument list too long error (#502)
  Signed-off-by: yuluo-yx <[email protected]>

* feat: enforce milvus dial timeout if set (#503)
  Signed-off-by: cryo <[email protected]>

* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)
  * Initial plan
  * Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs
    Co-authored-by: rootfs <[email protected]>
  ---------
  Co-authored-by: copilot-swe-agent[bot] <[email protected]>
  Co-authored-by: rootfs <[email protected]>

* Allow semantic cache similarity threshold to be set at the category level (#493)
  * Initial plan
  * Add category-level cache settings: enabled and similarity_threshold
    Co-authored-by: rootfs <[email protected]>
  * Add comprehensive tests for category-level cache settings
    Co-authored-by: rootfs <[email protected]>
  * Update config files and documentation for category-level cache settings
    - Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
    - Added comprehensive documentation section explaining category-level cache configuration
    - Updated semantic cache overview and in-memory cache docs with category-level examples
    - Added best practices for threshold selection and privacy considerations
    Co-authored-by: rootfs <[email protected]>
  * Remove duplicate code in FindSimilar functions
    Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.
    Co-authored-by: rootfs <[email protected]>
  * Update src/semantic-router/pkg/extproc/request_handler.go
    Co-authored-by: Copilot <[email protected]>
  * Revert changes from unsigned commit ae39fe2
    Restored the classificationText empty check that was removed in the previous commit.
    Co-authored-by: rootfs <[email protected]>
  ---------
  Co-authored-by: copilot-swe-agent[bot] <[email protected]>
  Co-authored-by: rootfs <[email protected]>
  Co-authored-by: Huamin Chen <[email protected]>
  Co-authored-by: Copilot <[email protected]>

* Allow jailbreak detection and threshold to be configured at the category level (#508)
  * Initial plan
  * Add category-level jailbreak detection configuration
    Co-authored-by: Xunzhuo <[email protected]>
  * Add documentation for category-level jailbreak settings
    Co-authored-by: Xunzhuo <[email protected]>
  * Update documentation for category-level jailbreak detection
    - Add category-level jailbreak configuration to jailbreak-protection.md
    - Update category configuration docs with jailbreak_enabled parameter
    - Add security-focused configuration example
    - Update global configuration docs with category override notes
    - Update README to mention fine-grained security control
    Co-authored-by: Xunzhuo <[email protected]>
  * Add category-level jailbreak threshold configuration
    - Add JailbreakThreshold field to Category struct
    - Add GetJailbreakThresholdForCategory helper method
    - Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
    - Update performSecurityChecks to use category-specific threshold
    - Add 5 comprehensive tests for threshold configuration
    - Update example configs with threshold tuning examples
    - Update documentation with threshold configuration and tuning guidelines
    - Add threshold tuning guide with recommendations for different category types
    Co-authored-by: Xunzhuo <[email protected]>
  ---------
  Co-authored-by: copilot-swe-agent[bot] <[email protected]>
  Co-authored-by: Xunzhuo <[email protected]>

* Allow PII detection threshold to be set at the category level (#510)
  * Initial plan
  * Add category-level PII threshold support
    Co-authored-by: Xunzhuo <[email protected]>
  * Update documentation with API integration notes
    Co-authored-by: Xunzhuo <[email protected]>
  * Fix markdown linting issues
    Co-authored-by: Xunzhuo <[email protected]>
  ---------
  Co-authored-by: copilot-swe-agent[bot] <[email protected]>
  Co-authored-by: Xunzhuo <[email protected]>

* Fix: The caller information points to the wrapper function instead of the actual call location (#518)
  Signed-off-by: carlory <[email protected]>

* feat: Implement hybrid cache that use in-memory index and milvus based doc store (#504)
  * feat: add HNSW index to inmemory semantic cache and implement hybrid cache that use in-memory index and milvus based doc store
    Signed-off-by: Huamin Chen <[email protected]>
  * chore: run go mod tidy to clean up module dependencies
    Signed-off-by: Huamin Chen <[email protected]>
  * conditionally build candle cuda support
    Signed-off-by: Huamin Chen <[email protected]>
  * rebuild index upon restart
    Signed-off-by: Huamin Chen <[email protected]>
  * precommit fix
    Signed-off-by: Huamin Chen <[email protected]>
  * fix precommit
    Signed-off-by: Huamin Chen <[email protected]>
  * fix precommit
    Signed-off-by: Huamin Chen <[email protected]>
  * fix precommit
    Signed-off-by: Huamin Chen <[email protected]>
  * disable cuda build on ci
    Signed-off-by: Huamin Chen <[email protected]>
  * review feedback
    Signed-off-by: Huamin Chen <[email protected]>
  * review feedback
    Signed-off-by: Huamin Chen <[email protected]>
  * review feedback
    Signed-off-by: Huamin Chen <[email protected]>
  * review feedback
    Signed-off-by: Huamin Chen <[email protected]>
  ---------
  Signed-off-by: Huamin Chen <[email protected]>

---------
Signed-off-by: carlory <[email protected]>
Signed-off-by: JaredforReal <[email protected]>
Signed-off-by: yuluo-yx <[email protected]>
Signed-off-by: Yossi Ovadia <[email protected]>
Signed-off-by: cryo <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
Co-authored-by: 杨朱 · Kiki <[email protected]>
Co-authored-by: Jared <[email protected]>
Co-authored-by: bitliu <[email protected]>
Co-authored-by: shown <[email protected]>
Co-authored-by: Yossi Ovadia <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: cryo <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: rootfs <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Xunzhuo <[email protected]>
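
Two patterns recur in the commit message above: a category-level setting that overrides a global one (cache, jailbreak, and PII thresholds), and FindSimilar() delegating to FindSimilarWithThreshold() with the default threshold. The Go sketch below illustrates both under stated assumptions. The names GetJailbreakThresholdForCategory, FindSimilar, and FindSimilarWithThreshold come from the commit message, but their signatures, the Config, Category, Entry, and Cache types, and the injected Similarity function are hypothetical and not taken from the repository.

```go
package main

import "fmt"

// Category carries optional per-category overrides; a nil pointer means
// "not set, fall back to the global value". Field layout is an assumption.
type Category struct {
	Name               string
	JailbreakThreshold *float32
}

// Config is a hypothetical stand-in for the router configuration.
type Config struct {
	GlobalJailbreakThreshold float32
	Categories               []Category
}

// GetJailbreakThresholdForCategory returns the category override when one is
// set, otherwise the global threshold.
func (c *Config) GetJailbreakThresholdForCategory(name string) float32 {
	for _, cat := range c.Categories {
		if cat.Name == name && cat.JailbreakThreshold != nil {
			return *cat.JailbreakThreshold
		}
	}
	return c.GlobalJailbreakThreshold
}

// Entry is one cached query/response pair.
type Entry struct {
	Query, Response string
}

// Cache is a toy semantic cache. Similarity is injected to keep the sketch
// self-contained; a real cache would compare embeddings.
type Cache struct {
	DefaultThreshold float32
	Entries          []Entry
	Similarity       func(a, b string) float32
}

// FindSimilarWithThreshold returns the best-scoring cached response whose
// similarity to query is at least threshold.
func (c *Cache) FindSimilarWithThreshold(query string, threshold float32) (string, bool) {
	var best string
	var bestScore float32
	found := false
	for _, e := range c.Entries {
		s := c.Similarity(query, e.Query)
		if s >= threshold && (!found || s > bestScore) {
			best, bestScore, found = e.Response, s, true
		}
	}
	return best, found
}

// FindSimilar delegates to FindSimilarWithThreshold with the default
// threshold instead of duplicating the search loop.
func (c *Cache) FindSimilar(query string) (string, bool) {
	return c.FindSimilarWithThreshold(query, c.DefaultThreshold)
}

// exactMatch is a stand-in similarity: 1 for identical strings, else 0.5.
func exactMatch(a, b string) float32 {
	if a == b {
		return 1
	}
	return 0.5
}

func main() {
	strict := float32(0.9)
	cfg := &Config{
		GlobalJailbreakThreshold: 0.7,
		Categories: []Category{
			{Name: "finance", JailbreakThreshold: &strict},
			{Name: "general"},
		},
	}
	fmt.Println("finance:", cfg.GetJailbreakThresholdForCategory("finance"))
	fmt.Println("general:", cfg.GetJailbreakThresholdForCategory("general"))

	cache := &Cache{
		DefaultThreshold: 0.8,
		Entries:          []Entry{{Query: "what is 2+2", Response: "4"}},
		Similarity:       exactMatch,
	}
	resp, ok := cache.FindSimilar("what is two plus two")
	fmt.Println("default threshold:", resp, ok)
	resp, ok = cache.FindSimilarWithThreshold("what is two plus two", 0.4)
	fmt.Println("relaxed threshold:", resp, ok)
}
```

Using a pointer for the override lets "unset" be distinguished from an explicit zero; the actual code may encode this differently.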
1 parent 5ee8a1b commit 589e03e

98 files changed (+9227 -1019 lines changed)

.github/workflows/pre-commit.yml

Lines changed: 2 additions & 0 deletions
@@ -97,6 +97,8 @@ jobs:

  - name: Run pre-commit check
    run: make precommit-check
+   env:
+     CI: true

  - name: Show pre-commit results
    if: failure()

.github/workflows/publish-crate.yml

Lines changed: 6 additions & 6 deletions
@@ -71,17 +71,17 @@ jobs:
        exit 1
      fi

- - name: Run tests
+ - name: Run tests (CPU-only, no CUDA)
    working-directory: candle-binding
-   run: cargo test --verbose
+   run: cargo test --no-default-features --verbose

- - name: Check crate
+ - name: Check crate (CPU-only, no CUDA)
    working-directory: candle-binding
-   run: cargo check --verbose
+   run: cargo check --no-default-features --verbose

- - name: Build crate
+ - name: Build crate (CPU-only, no CUDA)
    working-directory: candle-binding
-   run: cargo build --release --verbose
+   run: cargo build --release --no-default-features --verbose

  - name: Dry run publish
    working-directory: candle-binding

.github/workflows/test-and-build.yml

Lines changed: 3 additions & 2 deletions
@@ -69,8 +69,8 @@ jobs:
  - name: Check go mod tidy
    run: make check-go-mod-tidy

- - name: Build Rust library
-   run: make rust
+ - name: Build Rust library (CPU-only, no CUDA)
+   run: make rust-ci

  - name: Install HuggingFace CLI
    run: |
@@ -86,6 +86,7 @@ jobs:
  - name: Run semantic router tests
    run: make test
    env:
+     CI: true
      CGO_ENABLED: 1
      LD_LIBRARY_PATH: ${{ github.workspace }}/candle-binding/target/release

.pre-commit-config.yaml

Lines changed: 10 additions & 2 deletions
@@ -22,6 +22,14 @@ repos:
        language: system
        files: \.go$

+ - repo: local
+   hooks:
+     - id: shellcheck
+       name: shellcheck
+       entry: make shellcheck
+       language: system
+       files: \.sh$
+
  - repo: local
    hooks:
      - id: golang-lint
@@ -73,7 +81,7 @@ repos:
        pass_filenames: false
      - id: cargo-check
        name: cargo check
-       entry: bash -c 'cd candle-binding && cargo check'
+       entry: bash -c 'cd candle-binding && cargo check --no-default-features'
        language: system
        files: \.rs$
        pass_filenames: false
@@ -87,7 +95,7 @@ repos:
        language_version: python3
        files: \.py$
        exclude: ^(\.venv/|venv/|env/|__pycache__/|\.git/|site-packages/)
-
+
  # Commented out flake8 - only reports issues, doesn't auto-fix
  # - repo: https://github.com/PyCQA/flake8
  # rev: 7.3.0

Dockerfile.extproc

Lines changed: 5 additions & 5 deletions
@@ -30,24 +30,24 @@ COPY candle-binding/Cargo.loc[k] ./candle-binding/
  COPY tools/make/ tools/make/
  COPY Makefile ./

- # Pre-build dependencies to cache them
+ # Pre-build dependencies to cache them (CPU-only, no CUDA)
  RUN cd candle-binding && \
      mkdir -p src && \
      echo "fn main() {}" > src/lib.rs && \
-     cargo build --release && \
+     cargo build --release --no-default-features && \
      rm -rf src

  # Copy source code and build
  COPY candle-binding/src/ ./candle-binding/src/

- # Use Makefile to build the Rust library (rebuild with actual source code)
- RUN echo "Building Rust library with actual source code..." && \
+ # Use Makefile to build the Rust library (rebuild with actual source code, CPU-only, no CUDA)
+ RUN echo "Building Rust library with actual source code (CPU-only, no CUDA)..." && \
      echo "Checking source files:" && \
      ls -la candle-binding/src/ && \
      echo "Forcing clean rebuild..." && \
      cd candle-binding && \
      cargo clean && \
-     cargo build --release && \
+     cargo build --release --no-default-features && \
      echo "Checking built library:" && \
      find target -name "*.so" -type f && \
      ls -la target/release/

Dockerfile.extproc.cross

Lines changed: 10 additions & 10 deletions
@@ -72,29 +72,29 @@ COPY candle-binding/Cargo.loc[k] ./candle-binding/
  COPY tools/make/ tools/make/
  COPY Makefile ./

- # Create a modified Makefile for cross-compilation
+ # Create a modified Makefile for cross-compilation (CPU-only, no CUDA)
  RUN if [ "$TARGETARCH" = "arm64" ]; then \
-         echo "Modifying rust.mk for ARM64 cross-compilation..."; \
-         sed -i 's/cd candle-binding && cargo build --release/cd candle-binding \&\& cargo build --release --target aarch64-unknown-linux-gnu/' tools/make/rust.mk; \
+         echo "Modifying rust.mk for ARM64 cross-compilation (CPU-only, no CUDA)..."; \
+         sed -i 's/cd candle-binding && cargo build --release/cd candle-binding \&\& cargo build --release --no-default-features --target aarch64-unknown-linux-gnu/' tools/make/rust.mk; \
          cat tools/make/rust.mk | grep "cargo build"; \
      fi

- # Pre-build dependencies to cache them
+ # Pre-build dependencies to cache them (CPU-only, no CUDA)
  RUN cd candle-binding && \
      mkdir -p src && \
      echo "fn main() {}" > src/lib.rs && \
      if [ "$TARGETARCH" = "arm64" ]; then \
-         cargo build --release --target aarch64-unknown-linux-gnu; \
+         cargo build --release --no-default-features --target aarch64-unknown-linux-gnu; \
      else \
-         cargo build --release; \
+         cargo build --release --no-default-features; \
      fi && \
      rm -rf src

  # Copy source code and build
  COPY candle-binding/src/ ./candle-binding/src/

- # Build with cross-compilation (rebuild with actual source code)
- RUN echo "Building Rust library with actual source code..." && \
+ # Build with cross-compilation (rebuild with actual source code, CPU-only, no CUDA)
+ RUN echo "Building Rust library with actual source code (CPU-only, no CUDA)..." && \
      echo "Current directory: $(pwd)" && \
      echo "TARGETARCH: $TARGETARCH" && \
      ls -la candle-binding/src/ && \
@@ -107,9 +107,9 @@ RUN echo "Building Rust library with actual source code..." && \
          export CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc; \
          export CXX_aarch64_unknown_linux_gnu=aarch64-linux-gnu-g++; \
          export AR_aarch64_unknown_linux_gnu=aarch64-linux-gnu-ar; \
-         cargo build --release --target aarch64-unknown-linux-gnu; \
+         cargo build --release --no-default-features --target aarch64-unknown-linux-gnu; \
      else \
-         cargo build --release --target x86_64-unknown-linux-gnu; \
+         cargo build --release --no-default-features --target x86_64-unknown-linux-gnu; \
      fi && \
      echo "Checking built library..." && \
      find target -name "*.so" -type f

Dockerfile.precommit

Lines changed: 3 additions & 0 deletions
@@ -30,5 +30,8 @@ RUN pip install --break-system-packages yamllint
  # CodeSpell
  RUN pip install --break-system-packages codespell

+ # Shellcheck
+ RUN pip install --break-system-packages shellcheck-py
+
  # Golangci-lint
  RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/HEAD/install.sh | sh -s -- -b $(go env GOPATH)/bin v2.5.0

README.md

Lines changed: 2 additions & 1 deletion
@@ -16,6 +16,7 @@

  *Latest News* 🔥

+ - [2025/10/21] We announced the [2025 Q4 Roadmap: Journey to Iris](https://vllm-semantic-router.com/blog/q4-roadmap-iris) 📅.
  - [2025/10/16] We established the [vLLM Semantic Router Youtube Channel](https://www.youtube.com/@vLLMSemanticRouter) ✨.
  - [2025/10/15] We announced the [vLLM Semantic Router Dashboard](https://www.youtube.com/watch?v=E2IirN8PsFw) 🚀.
  - [2025/10/12] Our paper [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) accepted by NeurIPS 2025 MLForSys 🧠.
@@ -75,7 +76,7 @@ Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the p

  #### Prompt guard

- Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving.
+ Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving. Can be configured globally or at the category level for fine-grained security control.

  ### Similarity Caching ⚡️

bench/build_and_test.sh

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ echo "=============================================="

  # Clean previous builds
  echo "🧹 Cleaning previous builds..."
- rm -rf build/ dist/ *.egg-info/
+ rm -rf build/ dist/ ./*.egg-info/
  find vllm_semantic_router_bench/ -name "__pycache__" -type d -exec rm -rf {} + 2>/dev/null || true
  find vllm_semantic_router_bench/ -name "*.pyc" -delete 2>/dev/null || true
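
The change from `*.egg-info/` to `./*.egg-info/` is presumably a response to the shellcheck hook added in this commit: a bare glob can expand to a name that begins with `-`, which the receiving command may then parse as an option. The Go sketch below illustrates the same defensive idea for a program that builds an argument list itself. The helper name safeArgs is hypothetical and not part of the repository.

```go
package main

import (
	"fmt"
	"strings"
)

// safeArgs prefixes relative names with "./" so that a file name beginning
// with "-" cannot be mistaken for a command-line option by the tool that
// receives it. Names that are already absolute or dot-relative are kept.
func safeArgs(names []string) []string {
	out := make([]string, 0, len(names))
	for _, n := range names {
		if strings.HasPrefix(n, "/") || strings.HasPrefix(n, "./") || strings.HasPrefix(n, "../") {
			out = append(out, n)
			continue
		}
		out = append(out, "./"+n)
	}
	return out
}

func main() {
	// "-rf.egg-info" would look like a flag if passed to a command as-is.
	fmt.Println(safeArgs([]string{"-rf.egg-info", "pkg.egg-info", "./done.egg-info"}))
}
```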
