Commit 859b40e

Merge upstream/main into feat-candle-refactoring
Resolved merge conflicts by integrating both feature sets:
- Merged embedding model selection (bert/qwen3/gemma) with HNSW indexing
- Combined flash-attn feature with cuda as default in Cargo.toml
- Integrated both embedding_model and HNSW configuration options
- Merged test targets from both branches in rust.mk
- Updated Cargo.lock with merged dependencies

All features from both branches are now available:
- Semantic cache supports both embedding model selection and HNSW indexing
- Rust builds support both CUDA (default) and Flash Attention (optional)
- Test infrastructure includes both Rust unit tests and CI-friendly builds

Signed-off-by: Huamin Chen <[email protected]>
2 parents 5ee8a1b + 7abe96a commit 859b40e
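
The message above describes `cuda` as a default Cargo feature and `flash-attn` as an optional one, while the workflows in this commit pass `--no-default-features` for CPU-only builds. A minimal sketch of how those three variants might map to cargo invocations follows. The helper name `cargo_build_cmd` and the mode names are hypothetical and not part of the commit; the sketch also assumes `flash-attn` is enabled on top of the default features. Only the flags `--release`, `--no-default-features`, and `--features` are standard cargo options.

```shell
#!/bin/sh
# Hypothetical helper (not in the commit): print the cargo command for a
# build mode. Feature names are taken from the commit message.
cargo_build_cmd() {
  case "$1" in
    # CPU-only: drop the default feature set (which includes cuda).
    cpu)        echo "cargo build --release --no-default-features" ;;
    # Default build: cuda is enabled through the default features.
    cuda)       echo "cargo build --release" ;;
    # Assumption: flash-attn is added on top of the defaults.
    flash-attn) echo "cargo build --release --features flash-attn" ;;
    *)          echo "unknown mode: $1" >&2; return 1 ;;
  esac
}
```

For example, `cargo_build_cmd cpu` prints the same command the CI steps in this commit run inside `candle-binding`.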

98 files changed: +9227 -1019 lines changed

.github/workflows/pre-commit.yml

Lines changed: 2 additions & 0 deletions
@@ -97,6 +97,8 @@ jobs:

  - name: Run pre-commit check
    run: make precommit-check
+   env:
+     CI: true

  - name: Show pre-commit results
    if: failure()

.github/workflows/publish-crate.yml

Lines changed: 6 additions & 6 deletions
@@ -71,17 +71,17 @@ jobs:
        exit 1
      fi

- - name: Run tests
+ - name: Run tests (CPU-only, no CUDA)
    working-directory: candle-binding
-   run: cargo test --verbose
+   run: cargo test --no-default-features --verbose

- - name: Check crate
+ - name: Check crate (CPU-only, no CUDA)
    working-directory: candle-binding
-   run: cargo check --verbose
+   run: cargo check --no-default-features --verbose

- - name: Build crate
+ - name: Build crate (CPU-only, no CUDA)
    working-directory: candle-binding
-   run: cargo build --release --verbose
+   run: cargo build --release --no-default-features --verbose

  - name: Dry run publish
    working-directory: candle-binding

.github/workflows/test-and-build.yml

Lines changed: 3 additions & 2 deletions
@@ -69,8 +69,8 @@ jobs:
  - name: Check go mod tidy
    run: make check-go-mod-tidy

- - name: Build Rust library
-   run: make rust
+ - name: Build Rust library (CPU-only, no CUDA)
+   run: make rust-ci

  - name: Install HuggingFace CLI
    run: |
@@ -86,6 +86,7 @@ jobs:
  - name: Run semantic router tests
    run: make test
    env:
+     CI: true
      CGO_ENABLED: 1
      LD_LIBRARY_PATH: ${{ github.workspace }}/candle-binding/target/release

.pre-commit-config.yaml

Lines changed: 10 additions & 2 deletions
@@ -22,6 +22,14 @@ repos:
        language: system
        files: \.go$

+ - repo: local
+   hooks:
+     - id: shellcheck
+       name: shellcheck
+       entry: make shellcheck
+       language: system
+       files: \.sh$
+
  - repo: local
    hooks:
      - id: golang-lint
@@ -73,7 +81,7 @@ repos:
        pass_filenames: false
      - id: cargo-check
        name: cargo check
-       entry: bash -c 'cd candle-binding && cargo check'
+       entry: bash -c 'cd candle-binding && cargo check --no-default-features'
        language: system
        files: \.rs$
        pass_filenames: false
@@ -87,7 +95,7 @@ repos:
        language_version: python3
        files: \.py$
        exclude: ^(\.venv/|venv/|env/|__pycache__/|\.git/|site-packages/)
-
+
  # Commented out flake8 - only reports issues, doesn't auto-fix
  # - repo: https://github.com/PyCQA/flake8
  # rev: 7.3.0

Dockerfile.extproc

Lines changed: 5 additions & 5 deletions
@@ -30,24 +30,24 @@ COPY candle-binding/Cargo.loc[k] ./candle-binding/
  COPY tools/make/ tools/make/
  COPY Makefile ./

- # Pre-build dependencies to cache them
+ # Pre-build dependencies to cache them (CPU-only, no CUDA)
  RUN cd candle-binding && \
      mkdir -p src && \
      echo "fn main() {}" > src/lib.rs && \
-     cargo build --release && \
+     cargo build --release --no-default-features && \
      rm -rf src

  # Copy source code and build
  COPY candle-binding/src/ ./candle-binding/src/

- # Use Makefile to build the Rust library (rebuild with actual source code)
- RUN echo "Building Rust library with actual source code..." && \
+ # Use Makefile to build the Rust library (rebuild with actual source code, CPU-only, no CUDA)
+ RUN echo "Building Rust library with actual source code (CPU-only, no CUDA)..." && \
      echo "Checking source files:" && \
      ls -la candle-binding/src/ && \
      echo "Forcing clean rebuild..." && \
      cd candle-binding && \
      cargo clean && \
-     cargo build --release && \
+     cargo build --release --no-default-features && \
      echo "Checking built library:" && \
      find target -name "*.so" -type f && \
      ls -la target/release/

Dockerfile.extproc.cross

Lines changed: 10 additions & 10 deletions
@@ -72,29 +72,29 @@ COPY candle-binding/Cargo.loc[k] ./candle-binding/
  COPY tools/make/ tools/make/
  COPY Makefile ./

- # Create a modified Makefile for cross-compilation
+ # Create a modified Makefile for cross-compilation (CPU-only, no CUDA)
  RUN if [ "$TARGETARCH" = "arm64" ]; then \
-         echo "Modifying rust.mk for ARM64 cross-compilation..."; \
-         sed -i 's/cd candle-binding && cargo build --release/cd candle-binding \&\& cargo build --release --target aarch64-unknown-linux-gnu/' tools/make/rust.mk; \
+         echo "Modifying rust.mk for ARM64 cross-compilation (CPU-only, no CUDA)..."; \
+         sed -i 's/cd candle-binding && cargo build --release/cd candle-binding \&\& cargo build --release --no-default-features --target aarch64-unknown-linux-gnu/' tools/make/rust.mk; \
          cat tools/make/rust.mk | grep "cargo build"; \
      fi

- # Pre-build dependencies to cache them
+ # Pre-build dependencies to cache them (CPU-only, no CUDA)
  RUN cd candle-binding && \
      mkdir -p src && \
      echo "fn main() {}" > src/lib.rs && \
      if [ "$TARGETARCH" = "arm64" ]; then \
-         cargo build --release --target aarch64-unknown-linux-gnu; \
+         cargo build --release --no-default-features --target aarch64-unknown-linux-gnu; \
      else \
-         cargo build --release; \
+         cargo build --release --no-default-features; \
      fi && \
      rm -rf src

  # Copy source code and build
  COPY candle-binding/src/ ./candle-binding/src/

- # Build with cross-compilation (rebuild with actual source code)
- RUN echo "Building Rust library with actual source code..." && \
+ # Build with cross-compilation (rebuild with actual source code, CPU-only, no CUDA)
+ RUN echo "Building Rust library with actual source code (CPU-only, no CUDA)..." && \
      echo "Current directory: $(pwd)" && \
      echo "TARGETARCH: $TARGETARCH" && \
      ls -la candle-binding/src/ && \
@@ -107,9 +107,9 @@ RUN echo "Building Rust library with actual source code..." && \
          export CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc; \
          export CXX_aarch64_unknown_linux_gnu=aarch64-linux-gnu-g++; \
          export AR_aarch64_unknown_linux_gnu=aarch64-linux-gnu-ar; \
-         cargo build --release --target aarch64-unknown-linux-gnu; \
+         cargo build --release --no-default-features --target aarch64-unknown-linux-gnu; \
      else \
-         cargo build --release --target x86_64-unknown-linux-gnu; \
+         cargo build --release --no-default-features --target x86_64-unknown-linux-gnu; \
      fi && \
      echo "Checking built library..." && \
      find target -name "*.so" -type f
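
The final build step above picks a Rust target triple from `TARGETARCH`: `arm64` maps to `aarch64-unknown-linux-gnu`, anything else to `x86_64-unknown-linux-gnu`. A minimal sketch of that branch as reusable shell functions is shown here. The function names `rust_target_for_arch` and `cross_build_cmd` are hypothetical and do not appear in the Dockerfile; the triples and cargo flags are copied from the diff.

```shell
#!/bin/sh
# Hypothetical helper: map Docker's TARGETARCH value to the Rust target
# triple used by the final build step in Dockerfile.extproc.cross.
rust_target_for_arch() {
  if [ "$1" = "arm64" ]; then
    echo "aarch64-unknown-linux-gnu"
  else
    echo "x86_64-unknown-linux-gnu"
  fi
}

# Hypothetical helper: print the CPU-only cross-build command for an arch.
cross_build_cmd() {
  echo "cargo build --release --no-default-features --target $(rust_target_for_arch "$1")"
}
```

Keeping the mapping in one place would avoid repeating the triple in each `if` branch, though the Dockerfile itself inlines it.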

Dockerfile.precommit

Lines changed: 3 additions & 0 deletions
@@ -30,5 +30,8 @@ RUN pip install --break-system-packages yamllint
  # CodeSpell
  RUN pip install --break-system-packages codespell

+ # Shellcheck
+ RUN pip install --break-system-packages shellcheck-py
+
  # Golangci-lint
  RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/HEAD/install.sh | sh -s -- -b $(go env GOPATH)/bin v2.5.0

README.md

Lines changed: 2 additions & 1 deletion
@@ -16,6 +16,7 @@

  *Latest News* 🔥

+ - [2025/10/21] We announced the [2025 Q4 Roadmap: Journey to Iris](https://vllm-semantic-router.com/blog/q4-roadmap-iris) 📅.
  - [2025/10/16] We established the [vLLM Semantic Router Youtube Channel](https://www.youtube.com/@vLLMSemanticRouter) ✨.
  - [2025/10/15] We announced the [vLLM Semantic Router Dashboard](https://www.youtube.com/watch?v=E2IirN8PsFw) 🚀.
  - [2025/10/12] Our paper [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) accepted by NeurIPS 2025 MLForSys 🧠.
@@ -75,7 +76,7 @@ Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the p

  #### Prompt guard

- Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving.
+ Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving. Can be configured globally or at the category level for fine-grained security control.

  ### Similarity Caching ⚡️

bench/build_and_test.sh

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ echo "=============================================="

  # Clean previous builds
  echo "🧹 Cleaning previous builds..."
- rm -rf build/ dist/ *.egg-info/
+ rm -rf build/ dist/ ./*.egg-info/
  find vllm_semantic_router_bench/ -name "__pycache__" -type d -exec rm -rf {} + 2>/dev/null || true
  find vllm_semantic_router_bench/ -name "*.pyc" -delete 2>/dev/null || true
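
The change from `*.egg-info/` to `./*.egg-info/` matches the pattern that shellcheck reports as SC2035. The commit does not state its reason, so that link is an assumption, prompted by the shellcheck hook added in the same commit. A minimal sketch of the underlying issue: a bare glob can expand to a name that begins with a dash, which a command may then parse as an option, while a `./` prefix keeps it a path. The directory name `-rf.egg-info` is made up for the demonstration.

```shell
#!/bin/sh
# Create a scratch directory holding one entry whose name starts with a dash.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/-rf.egg-info"

# A bare glob expands to the raw name, which looks like an option.
bare_first=$(cd "$demo_dir" && set -- *.egg-info && printf '%s' "$1")

# Prefixing the glob with ./ makes every match unambiguously a path.
prefixed_first=$(cd "$demo_dir" && set -- ./*.egg-info && printf '%s' "$1")

rm -rf "$demo_dir"

echo "bare:     $bare_first"      # prints "bare:     -rf.egg-info"
echo "prefixed: $prefixed_first"  # prints "prefixed: ./-rf.egg-info"
```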
