
Commit db19e8d

OneZero-Y authored and rootfs committed
feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) (vllm-project#453)

Signed-off-by: OneZero-Y <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
1 parent ec6813d · commit db19e8d


46 files changed: +35,923 −107 lines

candle-binding/Cargo.toml (10 additions, 0 deletions)
```diff
@@ -9,11 +9,21 @@ license = "MIT OR Apache-2.0"
 name = "candle_semantic_router"
 crate-type = ["staticlib", "cdylib"]
 
+[features]
+default = []
+# Flash Attention 2 support (requires CUDA and compatible GPU)
+# Enable with: cargo build --features flash-attn
+# Note: Requires CUDA Compute Capability >= 8.0 (Ampere or newer)
+flash-attn = ["candle-flash-attn"]
+
 [dependencies]
 anyhow = { version = "1", features = ["backtrace"] }
 candle-core = "0.8.4"
 candle-nn = "0.8.4"
 candle-transformers = "0.8.4"
+# Flash Attention 2 (optional, requires CUDA)
+# Reference: https://github.com/huggingface/candle/tree/main/candle-flash-attn
+candle-flash-attn = { version = "0.8.4", optional = true }
 tokenizers = { version = "0.21.0", features = ["http"] }
 hf-hub = "0.4.1"
 safetensors = "0.4.1"
```
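
The new `flash-attn` Cargo feature is purely additive: default builds are unchanged, and the CUDA-only `candle-flash-attn` crate is compiled only when the feature is requested. Below is a minimal sketch of how such a feature gate is typically consumed on the Rust side; it is not code from this commit. The function name `scaled_attention` and the fallback path are illustrative assumptions, and the `candle_flash_attn::flash_attn` call should be checked against the candle-flash-attn docs for the pinned 0.8.4 version.

```rust
use candle_core::{Result, Tensor, D};

// Illustrative feature gate (not from this commit): use the fused
// Flash Attention 2 CUDA kernel only when built with
// `cargo build --features flash-attn` on a GPU with CC >= 8.0.
#[cfg(feature = "flash-attn")]
pub fn scaled_attention(q: &Tensor, k: &Tensor, v: &Tensor, scale: f32) -> Result<Tensor> {
    // Assumed signature: (q, k, v, softmax_scale, causal); verify against
    // the candle-flash-attn crate documentation.
    candle_flash_attn::flash_attn(q, k, v, scale, /* causal = */ false)
}

// Portable fallback: plain softmax(Q K^T * scale) V, runs on any device.
#[cfg(not(feature = "flash-attn"))]
pub fn scaled_attention(q: &Tensor, k: &Tensor, v: &Tensor, scale: f32) -> Result<Tensor> {
    let scores = (q.matmul(&k.transpose(D::Minus2, D::Minus1)?)? * scale as f64)?;
    let weights = candle_nn::ops::softmax_last_dim(&scores)?;
    weights.matmul(v)
}
```

With this layout, a plain `cargo build` keeps the existing attention path, while `cargo build --features flash-attn` pulls in the optional dependency, matching the comments added to the Cargo.toml above.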
