
Commit 2bd777f

OneZero-Y authored and rootfs committed
feat:support for two long-context embedding models (Qwen3-Embedding-0.6B and EmbeddingGemma-300M) (#453)
Signed-off-by: OneZero-Y <[email protected]>
Signed-off-by: Huamin Chen <[email protected]>
1 parent 2b411f2 commit 2bd777f


46 files changed (+35920 / -107 lines)

candle-binding/Cargo.toml

Lines changed: 7 additions & 0 deletions
```diff
@@ -12,12 +12,19 @@ crate-type = ["staticlib", "cdylib"]
 [features]
 default = ["cuda"]
 cuda = ["candle-core/cuda", "candle-nn/cuda", "candle-transformers/cuda"]
+# Flash Attention 2 support (requires CUDA and compatible GPU)
+# Enable with: cargo build --features flash-attn
+# Note: Requires CUDA Compute Capability >= 8.0 (Ampere or newer)
+flash-attn = ["candle-flash-attn"]
 
 [dependencies]
 anyhow = { version = "1", features = ["backtrace"] }
 candle-core = "0.8.4"
 candle-nn = "0.8.4"
 candle-transformers = "0.8.4"
+# Flash Attention 2 (optional, requires CUDA)
+# Reference: https://github.com/huggingface/candle/tree/main/candle-flash-attn
+candle-flash-attn = { version = "0.8.4", optional = true }
 tokenizers = { version = "0.21.0", features = ["http"] }
 hf-hub = "0.4.1"
 safetensors = "0.4.1"
```
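The new `flash-attn` feature is opt-in: `candle-flash-attn` is only compiled when the crate is built with `cargo build --features flash-attn`, and the comment notes it needs compute capability 8.0 or newer. The sketch below shows one way Rust code could branch on such a feature. It is hypothetical: `AttentionBackend`, `compiled_backend`, `supports_flash_attn2`, and `select_backend` are illustrative names, not candle-binding's actual API, and the runtime capability check is an assumption drawn from the Cargo.toml comment. The code does not query a GPU; the caller supplies the compute capability.

```rust
// Hypothetical sketch: branching on the `flash-attn` Cargo feature.
// None of these names come from candle-binding; they are illustrative only.

/// Which attention implementation a build/device combination would use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AttentionBackend {
    /// Flash Attention 2 kernels (only compiled with `--features flash-attn`).
    FlashAttn2,
    /// Plain matmul + softmax attention, always available.
    Standard,
}

/// Compile-time choice: `cfg!` is true only when Cargo enables the feature,
/// i.e. `cargo build --features flash-attn`.
pub fn compiled_backend() -> AttentionBackend {
    if cfg!(feature = "flash-attn") {
        AttentionBackend::FlashAttn2
    } else {
        AttentionBackend::Standard
    }
}

/// Assumed runtime gate: Flash Attention 2 needs compute capability >= 8.0
/// (Ampere or newer). Tuples compare lexicographically: (major, minor).
pub fn supports_flash_attn2(major: u32, minor: u32) -> bool {
    (major, minor) >= (8, 0)
}

/// Use Flash Attention 2 only if it was compiled in AND the device supports it;
/// otherwise fall back to the standard implementation.
pub fn select_backend(major: u32, minor: u32) -> AttentionBackend {
    match compiled_backend() {
        AttentionBackend::FlashAttn2 if supports_flash_attn2(major, minor) => {
            AttentionBackend::FlashAttn2
        }
        _ => AttentionBackend::Standard,
    }
}

fn main() {
    // An A100 reports compute capability 8.0; a T4 reports 7.5.
    for (major, minor) in [(8, 0), (7, 5)] {
        println!("sm_{}{} -> {:?}", major, minor, select_backend(major, minor));
    }
}
```

Keeping the compile-time check (`cfg!`) separate from the runtime check means a single `flash-attn` build can still fall back on pre-Ampere GPUs instead of failing, while default builds never reference the optional dependency at all.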
