
Commit 78fe00a

rootfs, carlory, JaredforReal, Xunzhuo, yuluo-yx
committed
Merge main to candle refactoring (#523)
* Update test description from Math to General (#483)
  Signed-off-by: carlory
* feat: add HuggingChat support (#477)
  * add chat ui to dashboard and docker compose & refactor dashboard/backend/
    Signed-off-by: JaredforReal
  * try fix network error
    Signed-off-by: JaredforReal
  * more
  ---------
  Signed-off-by: JaredforReal
  Co-authored-by: bitliu
* project: 2025 Q4 roadmap (#487)
  * project: q4 roadmap
  * project: q4 roadmap
  * project: q4 roadmap
  * more
  * more
  * more
  * more
* feat: add shellcheck precommit hook (#488)
  * feat: add shellcheck precommit hook
    Signed-off-by: yuluo-yx
  * feat: add shellcheck precommit hook
    Signed-off-by: yuluo-yx
  * feat: add shellcheck precommit hook
    Signed-off-by: yuluo-yx
  ---------
  Signed-off-by: yuluo-yx
* project: add q4 roadmap news (#495)
* fix missing shellcheck in pre-commit image (#497)
  Signed-off-by: carlory
* infra: update tools (#501)
  Signed-off-by: yuluo-yx
* feat(demo): enhance OpenShift demo scripts with improved UX (#478)
  - Reduce model selection test to 4 categories (2×Model-A, 2×Model-B)
  - Add new "Classification Examples" option calling curl-examples.sh
  - Update reasoning examples to avoid cache hits from previous tests
  - Remove benign examples from PII and Jailbreak tests (show only attacks)
  - Enhance live-semantic-router-logs.sh with better color visibility:
    - Fix duplicate "WITH SCORE" text in classification output
    - Fix CACHE HIT background color extending over timestamp
    - Distinguish reasoning enabled vs disabled messages
    - Remove redundant "(standard routing)" text
    - Add background colors for Model-A/Model-B routing display
  These improvements make the live demo clearer and more impactful for presentations and demonstrations.
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Signed-off-by: Yossi Ovadia
  Co-authored-by: Claude
* fix: fix precommit Argument list too long error (#502)
  Signed-off-by: yuluo-yx
* feat: enforce milvus dial timeout if set (#503)
  Signed-off-by: cryo
* Add IETF draft publication: Multi-Provider Extensions for Agentic AI Inference APIs (#506)
  * Initial plan
  * Add new IETF draft publication for Multi-Provider Extensions for Agentic AI Inference APIs
    Co-authored-by: rootfs
  ---------
  Co-authored-by: copilot-swe-agent[bot]
  Co-authored-by: rootfs
* Allow semantic cache similarity threshold to be set at the category level (#493)
  * Initial plan
  * Add category-level cache settings: enabled and similarity_threshold
    Co-authored-by: rootfs
  * Add comprehensive tests for category-level cache settings
    Co-authored-by: rootfs
  * Update config files and documentation for category-level cache settings
    - Updated 7 config YAML files (development, production, testing, e2e, and 3 recipes) with commented examples of category-level cache settings
    - Added comprehensive documentation section explaining category-level cache configuration
    - Updated semantic cache overview and in-memory cache docs with category-level examples
    - Added best practices for threshold selection and privacy considerations
    Co-authored-by: rootfs
  * Remove duplicate code in FindSimilar functions
    Refactored FindSimilar() to delegate to FindSimilarWithThreshold() with default threshold instead of duplicating the entire implementation. This eliminates 226 lines of duplicate code across inmemory_cache.go and milvus_cache.go.
    Co-authored-by: rootfs
  * Update src/semantic-router/pkg/extproc/request_handler.go
    Co-authored-by: Copilot
  * Revert changes from unsigned commit ae39fe2
    Restored the classificationText empty check that was removed in the previous commit.
    Co-authored-by: rootfs
  ---------
  Co-authored-by: copilot-swe-agent[bot]
  Co-authored-by: rootfs
  Co-authored-by: Huamin Chen
  Co-authored-by: Copilot
* Allow jailbreak detection and threshold to be configured at the category level (#508)
  * Initial plan
  * Add category-level jailbreak detection configuration
    Co-authored-by: Xunzhuo
  * Add documentation for category-level jailbreak settings
    Co-authored-by: Xunzhuo
  * Update documentation for category-level jailbreak detection
    - Add category-level jailbreak configuration to jailbreak-protection.md
    - Update category configuration docs with jailbreak_enabled parameter
    - Add security-focused configuration example
    - Update global configuration docs with category override notes
    - Update README to mention fine-grained security control
    Co-authored-by: Xunzhuo
  * Add category-level jailbreak threshold configuration
    - Add JailbreakThreshold field to Category struct
    - Add GetJailbreakThresholdForCategory helper method
    - Create CheckForJailbreakWithThreshold and AnalyzeContentForJailbreakWithThreshold methods
    - Update performSecurityChecks to use category-specific threshold
    - Add 5 comprehensive tests for threshold configuration
    - Update example configs with threshold tuning examples
    - Update documentation with threshold configuration and tuning guidelines
    - Add threshold tuning guide with recommendations for different category types
    Co-authored-by: Xunzhuo
  ---------
  Co-authored-by: copilot-swe-agent[bot]
  Co-authored-by: Xunzhuo
* Allow PII detection threshold to be set at the category level (#510)
  * Initial plan
  * Add category-level PII threshold support
    Co-authored-by: Xunzhuo
  * Update documentation with API integration notes
    Co-authored-by: Xunzhuo
  * Fix markdown linting issues
    Co-authored-by: Xunzhuo
  ---------
  Co-authored-by: copilot-swe-agent[bot]
  Co-authored-by: Xunzhuo
* Fix: The caller information points to the wrapper function instead of the actual call location (#518)
  Signed-off-by: carlory
* feat: Implement hybrid cache that uses in-memory index and milvus based doc store (#504)
  * feat: add HNSW index to inmemory semantic cache and implement hybrid cache that uses in-memory index and milvus based doc store
    Signed-off-by: Huamin Chen
  * chore: run go mod tidy to clean up module dependencies
    Signed-off-by: Huamin Chen
  * conditionally build candle cuda support
    Signed-off-by: Huamin Chen
  * rebuild index upon restart
    Signed-off-by: Huamin Chen
  * precommit fix
    Signed-off-by: Huamin Chen
  * fix precommit
    Signed-off-by: Huamin Chen
  * fix precommit
    Signed-off-by: Huamin Chen
  * fix precommit
    Signed-off-by: Huamin Chen
  * disable cuda build on ci
    Signed-off-by: Huamin Chen
  * review feedback
    Signed-off-by: Huamin Chen
  * review feedback
    Signed-off-by: Huamin Chen
  * review feedback
    Signed-off-by: Huamin Chen
  * review feedback
    Signed-off-by: Huamin Chen
  ---------
  Signed-off-by: Huamin Chen

---------

Signed-off-by: carlory
Signed-off-by: JaredforReal
Signed-off-by: yuluo-yx
Signed-off-by: Yossi Ovadia
Signed-off-by: cryo
Signed-off-by: Huamin Chen
Co-authored-by: 杨朱 · Kiki
Co-authored-by: Jared
Co-authored-by: bitliu
Co-authored-by: shown
Co-authored-by: Yossi Ovadia
Co-authored-by: Claude
Co-authored-by: cryo
Co-authored-by: Copilot
Co-authored-by: rootfs
Co-authored-by: Copilot
Co-authored-by: Xunzhuo
1 parent 2b72f27 commit 78fe00a
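Several of the merged PRs (#493, #508, #510) let a category override a global threshold, and #508 mentions a `GetJailbreakThresholdForCategory` helper whose signature is not shown in this diff. The sketch below illustrates the override-or-default lookup under stated assumptions: `Category`, `RouterConfig`, `JailbreakThresholdFor`, and `SimilarityThresholdFor` are hypothetical names, and pointer fields are an assumed way to tell "not set" apart from an explicit zero.

```go
package main

import "fmt"

// Category is a hypothetical, trimmed-down category config. Pointer fields
// distinguish "not set" (nil) from an explicit zero threshold.
type Category struct {
	Name                string
	SimilarityThreshold *float32
	JailbreakThreshold  *float32
}

// RouterConfig holds hypothetical global defaults plus per-category overrides.
type RouterConfig struct {
	DefaultSimilarityThreshold float32
	DefaultJailbreakThreshold  float32
	Categories                 []Category
}

// findCategory returns the named category, or nil if it is not configured.
func (c *RouterConfig) findCategory(name string) *Category {
	for i := range c.Categories {
		if c.Categories[i].Name == name {
			return &c.Categories[i]
		}
	}
	return nil
}

// JailbreakThresholdFor returns the category override when one is set and
// falls back to the global default otherwise.
func (c *RouterConfig) JailbreakThresholdFor(category string) float32 {
	if cat := c.findCategory(category); cat != nil && cat.JailbreakThreshold != nil {
		return *cat.JailbreakThreshold
	}
	return c.DefaultJailbreakThreshold
}

// SimilarityThresholdFor applies the same rule to the semantic cache threshold.
func (c *RouterConfig) SimilarityThresholdFor(category string) float32 {
	if cat := c.findCategory(category); cat != nil && cat.SimilarityThreshold != nil {
		return *cat.SimilarityThreshold
	}
	return c.DefaultSimilarityThreshold
}

// f32 is a small helper for taking the address of a float32 literal.
func f32(v float32) *float32 { return &v }

func main() {
	cfg := RouterConfig{
		DefaultSimilarityThreshold: 0.8,
		DefaultJailbreakThreshold:  0.7,
		Categories: []Category{
			{Name: "health", SimilarityThreshold: f32(0.95), JailbreakThreshold: f32(0.5)},
			{Name: "general"},
		},
	}
	fmt.Println(cfg.JailbreakThresholdFor("health"))
	fmt.Println(cfg.JailbreakThresholdFor("general"))
	fmt.Println(cfg.SimilarityThresholdFor("unknown"))
}
```

The same fallback shape would work for a PII threshold; only the field read changes.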


7 files changed: 86 additions, 38 deletions


candle-binding/Cargo.lock

Lines changed: 35 additions & 35 deletions
Some generated files are not rendered by default.

candle-binding/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ name = "candle_semantic_router"
 crate-type = ["staticlib", "cdylib"]
 
 [features]
-default = []
+default = ["cuda"]
 # CUDA support (enables GPU acceleration)
 cuda = ["candle-core/cuda", "candle-nn/cuda", "candle-transformers/cuda"]
 # Flash Attention 2 support (requires CUDA and compatible GPU)

config/config.yaml

Lines changed: 9 additions & 0 deletions
@@ -24,6 +24,15 @@ semantic_cache:
   # Options: "bert" (fast, 384-dim), "qwen3" (high quality, 1024-dim, 32K context), "gemma" (balanced, 768-dim, 8K context)
   # Default: "bert" (fastest, lowest memory)
   embedding_model: "bert"
+  # HNSW index configuration (for memory backend only)
+  use_hnsw: true # Enable HNSW index for faster similarity search
+  hnsw_m: 16 # Number of bi-directional links (higher = better recall, more memory)
+  hnsw_ef_construction: 200 # Construction parameter (higher = better quality, slower build)
+
+  # Hybrid cache configuration (when backend_type: "hybrid")
+  # Combines in-memory HNSW for fast search with Milvus for scalable storage
+  # max_memory_entries: 100000 # Max entries in HNSW index (default: 100,000)
+  # backend_config_path: "config/milvus.yaml" # Path to Milvus config
 
 tools:
   enabled: true
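The config comment describes the hybrid backend as in-memory HNSW for search plus Milvus for storage. A minimal sketch of that split follows, assuming the index holds only IDs and embeddings while responses live in a separate store keyed by ID. It swaps in a brute-force cosine scan for HNSW and a Go map for Milvus; `DocStore`, `vectorIndex`, and `HybridCache` are hypothetical, and refusing inserts once `max` is reached is an assumed stand-in for whatever eviction the real backend performs.

```go
package main

import (
	"fmt"
	"math"
)

// DocStore is a hypothetical stand-in for the Milvus-backed document store.
type DocStore interface {
	Get(id string) (string, bool)
	Put(id, response string)
}

// mapStore keeps documents in a plain map so the sketch runs without Milvus.
type mapStore map[string]string

func (m mapStore) Get(id string) (string, bool) {
	r, ok := m[id]
	return r, ok
}

func (m mapStore) Put(id, response string) { m[id] = response }

// vectorIndex is a brute-force stand-in for the HNSW index: it keeps only
// IDs and embeddings, never the cached responses.
type vectorIndex struct {
	ids  []string
	vecs [][]float32
	max  int // plays the role of max_memory_entries (assumed semantics)
}

// cosine returns the cosine similarity of two equal-length vectors, or 0 if
// either vector is all zeros.
func cosine(a, b []float32) float32 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// Add indexes an embedding and refuses new entries once the index is full.
func (ix *vectorIndex) Add(id string, v []float32) bool {
	if len(ix.ids) >= ix.max {
		return false
	}
	ix.ids = append(ix.ids, id)
	ix.vecs = append(ix.vecs, v)
	return true
}

// Nearest returns the ID of the most similar embedding and its similarity.
func (ix *vectorIndex) Nearest(q []float32) (string, float32) {
	bestID, bestSim := "", float32(-2)
	for i, v := range ix.vecs {
		if s := cosine(q, v); s > bestSim {
			bestID, bestSim = ix.ids[i], s
		}
	}
	return bestID, bestSim
}

// HybridCache searches the in-memory index, then fetches the response by ID.
type HybridCache struct {
	index     *vectorIndex
	store     DocStore
	threshold float32
}

// Add stores the response only if its embedding could be indexed.
func (h *HybridCache) Add(id string, emb []float32, response string) bool {
	if !h.index.Add(id, emb) {
		return false
	}
	h.store.Put(id, response)
	return true
}

// Lookup returns a cached response when the nearest neighbour is similar enough.
func (h *HybridCache) Lookup(q []float32) (string, bool) {
	id, sim := h.index.Nearest(q)
	if id == "" || sim < h.threshold {
		return "", false
	}
	return h.store.Get(id)
}

func main() {
	h := &HybridCache{index: &vectorIndex{max: 2}, store: mapStore{}, threshold: 0.9}
	h.Add("q1", []float32{1, 0}, "cached answer 1")
	h.Add("q2", []float32{0, 1}, "cached answer 2")
	fmt.Println(h.Lookup([]float32{0.99, 0.05}))
}
```

Keeping responses out of the index means a memory cap only has to bound embeddings and IDs, while the store can grow independently.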

src/semantic-router/pkg/cache/cache_factory.go

Lines changed: 5 additions & 2 deletions
@@ -24,8 +24,8 @@ func NewCacheBackend(config CacheConfig) (CacheBackend, error) {
 	switch config.BackendType {
 	case InMemoryCacheType, "":
 		// Use in-memory cache as the default backend
-		observability.Debugf("Creating in-memory cache backend - MaxEntries: %d, TTL: %ds, Threshold: %.3f, UseHNSW: %t, EmbeddingModel: %s",
-			config.MaxEntries, config.TTLSeconds, config.SimilarityThreshold, config.UseHNSW, config.EmbeddingModel)
+		observability.Debugf("Creating in-memory cache backend - MaxEntries: %d, TTL: %ds, Threshold: %.3f, EmbeddingModel: %s, UseHNSW: %t",
+			config.MaxEntries, config.TTLSeconds, config.SimilarityThreshold, config.EmbeddingModel, config.UseHNSW)
 
 		options := InMemoryCacheOptions{
 			Enabled: config.Enabled,
@@ -37,6 +37,9 @@ func NewCacheBackend(config CacheConfig) (CacheBackend, error) {
 			HNSWM: config.HNSWM,
 			HNSWEfConstruction: config.HNSWEfConstruction,
 			EmbeddingModel: config.EmbeddingModel,
+			UseHNSW: config.UseHNSW,
+			HNSWM: config.HNSWM,
+			HNSWEfConstruction: config.HNSWEfConstruction,
 		}
 		return NewInMemoryCache(options), nil
 
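In the second hunk of cache_factory.go, `HNSWM` and `HNSWEfConstruction` appear both as context lines and as added lines of the same `InMemoryCacheOptions` literal. Go rejects a composite literal that specifies the same field more than once, and likewise requires field names within a struct type to be unique, which also affects the repeated `useHNSW`, `hnswEfSearch`, `HNSWEfConstruction`, and `HNSWEfSearch` declarations in the inmemory_cache.go hunk. The sketch below shows the mapping with each field named once; both types are hypothetical trimmed copies limited to fields visible in this diff.

```go
package main

import "fmt"

// CacheConfig is a hypothetical, trimmed copy limited to fields visible in the diff.
type CacheConfig struct {
	Enabled            bool
	EmbeddingModel     string
	UseHNSW            bool
	HNSWM              int
	HNSWEfConstruction int
}

// InMemoryCacheOptions is likewise a hypothetical, trimmed copy.
type InMemoryCacheOptions struct {
	Enabled            bool
	EmbeddingModel     string
	UseHNSW            bool
	HNSWM              int
	HNSWEfConstruction int
}

// toInMemoryOptions names every field exactly once. Adding a second
// "HNSWM: c.HNSWM," line to this literal would be a compile-time error.
func toInMemoryOptions(c CacheConfig) InMemoryCacheOptions {
	return InMemoryCacheOptions{
		Enabled:            c.Enabled,
		EmbeddingModel:     c.EmbeddingModel,
		UseHNSW:            c.UseHNSW,
		HNSWM:              c.HNSWM,
		HNSWEfConstruction: c.HNSWEfConstruction,
	}
}

func main() {
	opts := toInMemoryOptions(CacheConfig{
		Enabled:            true,
		EmbeddingModel:     "bert",
		UseHNSW:            true,
		HNSWM:              16,
		HNSWEfConstruction: 200,
	})
	fmt.Printf("%+v\n", opts)
}
```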

src/semantic-router/pkg/cache/cache_interface.go

Lines changed: 12 additions & 0 deletions
@@ -120,4 +120,16 @@ type CacheConfig struct {
 	// EmbeddingModel specifies which embedding model to use
 	// Options: "bert" (default), "qwen3", "gemma"
 	EmbeddingModel string `yaml:"embedding_model,omitempty"`
+
+	// UseHNSW enables HNSW index for faster search in memory backend
+	UseHNSW bool `yaml:"use_hnsw,omitempty"`
+
+	// HNSWM is the number of bi-directional links per node (default: 16)
+	HNSWM int `yaml:"hnsw_m,omitempty"`
+
+	// HNSWEfConstruction is the size of dynamic candidate list during construction (default: 200)
+	HNSWEfConstruction int `yaml:"hnsw_ef_construction,omitempty"`
+
+	// Hybrid cache specific settings
+	MaxMemoryEntries int `yaml:"max_memory_entries,omitempty"` // Max entries in HNSW for hybrid cache
 }

src/semantic-router/pkg/cache/inmemory_cache.go

Lines changed: 7 additions & 0 deletions
@@ -52,6 +52,9 @@ type InMemoryCache struct {
 	useHNSW bool
 	hnswEfSearch int // Search-time ef parameter
 	embeddingModel string // "bert", "qwen3", or "gemma"
+	hnswIndex *HNSWIndex
+	useHNSW bool
+	hnswEfSearch int // Search-time ef parameter
 }
 
 // InMemoryCacheOptions contains configuration parameters for the in-memory cache
@@ -66,6 +69,10 @@ type InMemoryCacheOptions struct {
 	HNSWEfConstruction int // Size of dynamic candidate list during construction (default: 200)
 	HNSWEfSearch int // Size of dynamic candidate list during search (default: 50)
 	EmbeddingModel string // "bert", "qwen3", or "gemma"
+	UseHNSW bool // Enable HNSW index for faster search
+	HNSWM int // Number of bi-directional links (default: 16)
+	HNSWEfConstruction int // Size of dynamic candidate list during construction (default: 200)
+	HNSWEfSearch int // Size of dynamic candidate list during search (default: 50)
 }
 
 // NewInMemoryCache initializes a new in-memory semantic cache instance
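The merged #493 notes that `FindSimilar()` now delegates to `FindSimilarWithThreshold()` with the default threshold instead of duplicating the search. The sketch below shows that delegation pattern over a plain linear scan, which is roughly what a cache must do when no HNSW index is built (`useHNSW` false); whether the real cache falls back this way is an assumption, and `entry`, `linearCache`, and `cosineSim` are hypothetical names.

```go
package main

import (
	"fmt"
	"math"
)

// entry is a hypothetical cache record: a query embedding plus its response.
type entry struct {
	embedding []float32
	response  string
}

// linearCache scans every entry; no index is used.
type linearCache struct {
	entries          []entry
	defaultThreshold float32
}

// cosineSim returns cosine similarity, or 0 for mismatched or zero vectors.
func cosineSim(a, b []float32) float32 {
	if len(a) != len(b) {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return float32(dot / (math.Sqrt(na) * math.Sqrt(nb)))
}

// FindSimilarWithThreshold returns the best match if its similarity reaches
// threshold. On ties the earliest entry wins.
func (c *linearCache) FindSimilarWithThreshold(query []float32, threshold float32) (string, bool) {
	best, bestSim := -1, float32(-2)
	for i, e := range c.entries {
		if s := cosineSim(query, e.embedding); s > bestSim {
			best, bestSim = i, s
		}
	}
	if best < 0 || bestSim < threshold {
		return "", false
	}
	return c.entries[best].response, true
}

// FindSimilar delegates to FindSimilarWithThreshold with the cache-wide default.
func (c *linearCache) FindSimilar(query []float32) (string, bool) {
	return c.FindSimilarWithThreshold(query, c.defaultThreshold)
}

func main() {
	c := &linearCache{defaultThreshold: 0.9, entries: []entry{
		{embedding: []float32{1, 0}, response: "a"},
		{embedding: []float32{0, 1}, response: "b"},
	}}
	fmt.Println(c.FindSimilar([]float32{1, 1}))
	fmt.Println(c.FindSimilarWithThreshold([]float32{1, 1}, 0.7))
}
```

With one search routine, a per-category threshold only has to be passed in as the argument; there is no second copy of the scan to keep in sync.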

tools/make/rust.mk

Lines changed: 17 additions & 0 deletions
@@ -118,3 +118,20 @@ rust-flash-attn: ## Build Rust library with Flash Attention 2 (requires CUDA env
 		exit 1; \
 	fi
 	@cd candle-binding && cargo build --release --features flash-attn
+
+# Build the Rust library without CUDA (for CI/CD environments)
+rust-ci: ## Build the Rust library without CUDA support (for GitHub Actions/CI)
+	@$(LOG_TARGET)
+	@bash -c 'if ! command -v rustc >/dev/null 2>&1; then \
+		echo "rustc not found, installing..."; \
+		curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y; \
+	fi && \
+	if [ -f "$$HOME/.cargo/env" ]; then \
+		echo "Loading Rust environment from $$HOME/.cargo/env..." && \
+		. $$HOME/.cargo/env; \
+	fi && \
+	if ! command -v cargo >/dev/null 2>&1; then \
+		echo "Error: cargo not found in PATH" && exit 1; \
+	fi && \
+	echo "Building Rust library without CUDA (CPU-only)..." && \
+	cd candle-binding && cargo build --release --no-default-features'
