Commit f44f7c6

Merge upstream/main into feat-candle-refactoring
Resolved conflicts by merging both feature sets:

- Kept CUDA as default feature in Cargo.toml (from upstream/main)
- Preserved flash-attn feature (from feat-candle-refactoring)
- Merged embedding_model and HNSW configurations in config files
- Updated cache implementation to support both embedding models and HNSW indexing
- Combined test targets and CI support in rust.mk
- Regenerated Cargo.lock to reflect merged dependencies

This merge brings the latest changes from main while preserving the candle refactoring work, enabling both embedding model selection and HNSW indexing features to work together.

2 parents 5ee8a1b + 7abe96a commit f44f7c6
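
Most of the hunks below switch CI and Docker builds from the default feature set (CUDA, per the commit message) to `--no-default-features`. The following is a minimal sketch of that choice, not the project's actual build logic: the helper name `cargo_build_cmd` and the mode names `cpu` and `default` are hypothetical, and how `rust.mk` really reacts to `CI=true` is not visible in this diff. Only the cargo flags themselves (`--release`, `--no-default-features`, `--features`) are real.

```shell
# Hypothetical helper: prints the cargo command for a given build mode.
# The function name and mode names are illustrative assumptions.
cargo_build_cmd() {
  mode=${1:-default}
  case "$mode" in
    # CPU-only: drop the crate's default features (CUDA, per the commit message).
    cpu)        echo "cargo build --release --no-default-features" ;;
    # Opt in to the flash-attn feature on top of the defaults.
    flash-attn) echo "cargo build --release --features flash-attn" ;;
    # Default feature set.
    *)          echo "cargo build --release" ;;
  esac
}

# Assumption: CI runners have no CUDA toolkit, so CI=true selects the CPU-only build.
if [ "${CI:-}" = "true" ]; then
  cargo_build_cmd cpu
else
  cargo_build_cmd default
fi
```

The script only prints the command it would run, so it can be tried without a Rust toolchain installed.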

98 files changed: +9227 -1019 lines changed
.github/workflows/pre-commit.yml

Lines changed: 2 additions & 0 deletions
@@ -97,6 +97,8 @@ jobs:

       - name: Run pre-commit check
         run: make precommit-check
+        env:
+          CI: true

       - name: Show pre-commit results
         if: failure()

.github/workflows/publish-crate.yml

Lines changed: 6 additions & 6 deletions
@@ -71,17 +71,17 @@ jobs:
             exit 1
           fi

-      - name: Run tests
+      - name: Run tests (CPU-only, no CUDA)
         working-directory: candle-binding
-        run: cargo test --verbose
+        run: cargo test --no-default-features --verbose

-      - name: Check crate
+      - name: Check crate (CPU-only, no CUDA)
         working-directory: candle-binding
-        run: cargo check --verbose
+        run: cargo check --no-default-features --verbose

-      - name: Build crate
+      - name: Build crate (CPU-only, no CUDA)
         working-directory: candle-binding
-        run: cargo build --release --verbose
+        run: cargo build --release --no-default-features --verbose

       - name: Dry run publish
         working-directory: candle-binding

.github/workflows/test-and-build.yml

Lines changed: 3 additions & 2 deletions
@@ -69,8 +69,8 @@ jobs:
       - name: Check go mod tidy
         run: make check-go-mod-tidy

-      - name: Build Rust library
-        run: make rust
+      - name: Build Rust library (CPU-only, no CUDA)
+        run: make rust-ci

       - name: Install HuggingFace CLI
         run: |
@@ -86,6 +86,7 @@ jobs:
       - name: Run semantic router tests
         run: make test
         env:
+          CI: true
           CGO_ENABLED: 1
           LD_LIBRARY_PATH: ${{ github.workspace }}/candle-binding/target/release


.pre-commit-config.yaml

Lines changed: 10 additions & 2 deletions
@@ -22,6 +22,14 @@ repos:
         language: system
         files: \.go$

+  - repo: local
+    hooks:
+      - id: shellcheck
+        name: shellcheck
+        entry: make shellcheck
+        language: system
+        files: \.sh$
+
   - repo: local
     hooks:
       - id: golang-lint
@@ -73,7 +81,7 @@ repos:
         pass_filenames: false
       - id: cargo-check
         name: cargo check
-        entry: bash -c 'cd candle-binding && cargo check'
+        entry: bash -c 'cd candle-binding && cargo check --no-default-features'
         language: system
         files: \.rs$
         pass_filenames: false
@@ -87,7 +95,7 @@ repos:
         language_version: python3
         files: \.py$
         exclude: ^(\.venv/|venv/|env/|__pycache__/|\.git/|site-packages/)
-
+
   # Commented out flake8 - only reports issues, doesn't auto-fix
   # - repo: https://github.com/PyCQA/flake8
   # rev: 7.3.0

Dockerfile.extproc

Lines changed: 5 additions & 5 deletions
@@ -30,24 +30,24 @@ COPY candle-binding/Cargo.loc[k] ./candle-binding/
 COPY tools/make/ tools/make/
 COPY Makefile ./

-# Pre-build dependencies to cache them
+# Pre-build dependencies to cache them (CPU-only, no CUDA)
 RUN cd candle-binding && \
     mkdir -p src && \
     echo "fn main() {}" > src/lib.rs && \
-    cargo build --release && \
+    cargo build --release --no-default-features && \
     rm -rf src

 # Copy source code and build
 COPY candle-binding/src/ ./candle-binding/src/

-# Use Makefile to build the Rust library (rebuild with actual source code)
-RUN echo "Building Rust library with actual source code..." && \
+# Use Makefile to build the Rust library (rebuild with actual source code, CPU-only, no CUDA)
+RUN echo "Building Rust library with actual source code (CPU-only, no CUDA)..." && \
     echo "Checking source files:" && \
     ls -la candle-binding/src/ && \
     echo "Forcing clean rebuild..." && \
     cd candle-binding && \
     cargo clean && \
-    cargo build --release && \
+    cargo build --release --no-default-features && \
     echo "Checking built library:" && \
     find target -name "*.so" -type f && \
     ls -la target/release/

Dockerfile.extproc.cross

Lines changed: 10 additions & 10 deletions
@@ -72,29 +72,29 @@ COPY candle-binding/Cargo.loc[k] ./candle-binding/
 COPY tools/make/ tools/make/
 COPY Makefile ./

-# Create a modified Makefile for cross-compilation
+# Create a modified Makefile for cross-compilation (CPU-only, no CUDA)
 RUN if [ "$TARGETARCH" = "arm64" ]; then \
-        echo "Modifying rust.mk for ARM64 cross-compilation..."; \
-        sed -i 's/cd candle-binding && cargo build --release/cd candle-binding \&\& cargo build --release --target aarch64-unknown-linux-gnu/' tools/make/rust.mk; \
+        echo "Modifying rust.mk for ARM64 cross-compilation (CPU-only, no CUDA)..."; \
+        sed -i 's/cd candle-binding && cargo build --release/cd candle-binding \&\& cargo build --release --no-default-features --target aarch64-unknown-linux-gnu/' tools/make/rust.mk; \
         cat tools/make/rust.mk | grep "cargo build"; \
     fi

-# Pre-build dependencies to cache them
+# Pre-build dependencies to cache them (CPU-only, no CUDA)
 RUN cd candle-binding && \
     mkdir -p src && \
     echo "fn main() {}" > src/lib.rs && \
     if [ "$TARGETARCH" = "arm64" ]; then \
-        cargo build --release --target aarch64-unknown-linux-gnu; \
+        cargo build --release --no-default-features --target aarch64-unknown-linux-gnu; \
     else \
-        cargo build --release; \
+        cargo build --release --no-default-features; \
     fi && \
     rm -rf src

 # Copy source code and build
 COPY candle-binding/src/ ./candle-binding/src/

-# Build with cross-compilation (rebuild with actual source code)
-RUN echo "Building Rust library with actual source code..." && \
+# Build with cross-compilation (rebuild with actual source code, CPU-only, no CUDA)
+RUN echo "Building Rust library with actual source code (CPU-only, no CUDA)..." && \
     echo "Current directory: $(pwd)" && \
     echo "TARGETARCH: $TARGETARCH" && \
     ls -la candle-binding/src/ && \
@@ -107,9 +107,9 @@ RUN echo "Building Rust library with actual source code..." && \
         export CC_aarch64_unknown_linux_gnu=aarch64-linux-gnu-gcc; \
         export CXX_aarch64_unknown_linux_gnu=aarch64-linux-gnu-g++; \
         export AR_aarch64_unknown_linux_gnu=aarch64-linux-gnu-ar; \
-        cargo build --release --target aarch64-unknown-linux-gnu; \
+        cargo build --release --no-default-features --target aarch64-unknown-linux-gnu; \
     else \
-        cargo build --release --target x86_64-unknown-linux-gnu; \
+        cargo build --release --no-default-features --target x86_64-unknown-linux-gnu; \
     fi && \
     echo "Checking built library..." && \
     find target -name "*.so" -type f
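
The `sed` rewrite of `tools/make/rust.mk` in the hunk above can be checked in isolation. This is a minimal sketch, assuming the recipe line in `rust.mk` has the shape shown in `line` (the actual file is not part of the visible diff); the function name `rewrite_for_arm64` is hypothetical, and it reads from stdin instead of editing in place with `-i`.

```shell
# Assumed shape of the build recipe in tools/make/rust.mk (not shown in this diff).
line='cd candle-binding && cargo build --release'

# Same substitution as in Dockerfile.extproc.cross, applied to stdin.
# In the pattern, "&" is an ordinary character. In the replacement, a bare "&"
# would expand to the whole match, so each literal ampersand is written as "\&".
rewrite_for_arm64() {
  sed 's/cd candle-binding && cargo build --release/cd candle-binding \&\& cargo build --release --no-default-features --target aarch64-unknown-linux-gnu/'
}

printf '%s\n' "$line" | rewrite_for_arm64
```

The pattern is not anchored at the end of the line, so any flags that already follow `--release` in the recipe stay in place after the inserted ones.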

Dockerfile.precommit

Lines changed: 3 additions & 0 deletions
@@ -30,5 +30,8 @@ RUN pip install --break-system-packages yamllint
 # CodeSpell
 RUN pip install --break-system-packages codespell

+# Shellcheck
+RUN pip install --break-system-packages shellcheck-py
+
 # Golangci-lint
 RUN curl -sSfL https://raw.githubusercontent.com/golangci/golangci-lint/HEAD/install.sh | sh -s -- -b $(go env GOPATH)/bin v2.5.0

README.md

Lines changed: 2 additions & 1 deletion
@@ -16,6 +16,7 @@

 *Latest News* 🔥

+- [2025/10/21] We announced the [2025 Q4 Roadmap: Journey to Iris](https://vllm-semantic-router.com/blog/q4-roadmap-iris) 📅.
 - [2025/10/16] We established the [vLLM Semantic Router Youtube Channel](https://www.youtube.com/@vLLMSemanticRouter) ✨.
 - [2025/10/15] We announced the [vLLM Semantic Router Dashboard](https://www.youtube.com/watch?v=E2IirN8PsFw) 🚀.
 - [2025/10/12] Our paper [When to Reason: Semantic Router for vLLM](https://arxiv.org/abs/2510.08731) accepted by NeurIPS 2025 MLForSys 🧠.
@@ -75,7 +76,7 @@ Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the p

 #### Prompt guard

-Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving.
+Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving. Can be configured globally or at the category level for fine-grained security control.

 ### Similarity Caching ⚡️


bench/build_and_test.sh

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ echo "=============================================="

 # Clean previous builds
 echo "🧹 Cleaning previous builds..."
-rm -rf build/ dist/ *.egg-info/
+rm -rf build/ dist/ ./*.egg-info/
 find vllm_semantic_router_bench/ -name "__pycache__" -type d -exec rm -rf {} + 2>/dev/null || true
 find vllm_semantic_router_bench/ -name "*.pyc" -delete 2>/dev/null || true

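
The change from `*.egg-info/` to `./*.egg-info/` matches the pattern ShellCheck reports as SC2035: a bare glob can expand to a name that begins with `-` and is then read as an option. The sketch below illustrates this in a throwaway directory. The directory names are made up for the demonstration, and it is an assumption that this warning, surfaced by the newly added shellcheck hook, is what motivated the edit.

```shell
# Work in a throwaway directory so nothing real is touched.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# One ordinary match and one whose name begins with a dash (illustrative names).
mkdir pkg.egg-info ./-rf.egg-info

# Bare glob: one expanded word starts with "-" and could be parsed as an option
# by a command that receives it as an argument.
bare=$(printf '%s\n' *.egg-info/)

# Prefixed glob: every expanded word starts with "./", so it is always an operand.
safe=$(printf '%s\n' ./*.egg-info/)

printf 'bare glob expands to:\n%s\n' "$bare"
printf 'prefixed glob expands to:\n%s\n' "$safe"

# Clean up the throwaway directory.
cd / && rm -rf "$tmp"
```

Writing `rm -rf -- *.egg-info/` is another common way to stop option parsing; the commit uses the `./` prefix instead.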
