refactor: Implement modular candle-binding architecture #207
Conversation
```rust
Ok(Self {
    bert_classifier: classifier,
    confidence_threshold: 0.7,
```
Would the threshold come from config? See https://github.com/vllm-project/semantic-router/blob/main/config/config.yaml#L3
Agreed, the threshold should be read from config.yaml instead of being hardcoded.
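A minimal sketch of that change, with hypothetical names standing in for the real config loader: the threshold comes from the parsed config value, and the hardcoded 0.7 becomes only a documented fallback for a missing key.

```rust
/// Sketch (hypothetical names): the confidence threshold comes from a value
/// parsed out of config.yaml instead of being hardcoded at construction time.
struct ClassifierConfig {
    confidence_threshold: f32,
}

impl ClassifierConfig {
    /// `parsed` stands in for the deserialized config.yaml field.
    fn from_parsed(parsed: Option<f32>) -> Self {
        Self {
            // The config value wins; 0.7 applies only when the key is absent.
            confidence_threshold: parsed.unwrap_or(0.7),
        }
    }
}

fn main() {
    assert_eq!(ClassifierConfig::from_parsed(Some(0.95)).confidence_threshold, 0.95);
    assert_eq!(ClassifierConfig::from_parsed(None).confidence_threshold, 0.7);
}
```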
```rust
let intent = if predicted_class < self.intent_labels.len() {
    self.intent_labels[predicted_class].clone()
} else {
    format!("UNKNOWN_{}", predicted_class)
```
Should this just fail if no class is found?
Yes, it should fail when no class is found rather than returning a default value. This ensures the caller knows that classification failed.
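A sketch of the agreed change, with illustrative names: an out-of-range class index becomes an error the caller must handle, rather than a fabricated `UNKNOWN_*` label.

```rust
/// Sketch (illustrative signature): fail on an out-of-range class index
/// instead of returning a placeholder intent string.
fn resolve_intent(intent_labels: &[&str], predicted_class: usize) -> Result<String, String> {
    intent_labels
        .get(predicted_class)
        .map(|label| label.to_string())
        .ok_or_else(|| format!("classification failed: class index {} has no label", predicted_class))
}

fn main() {
    let labels = ["greeting", "search"];
    assert_eq!(resolve_intent(&labels, 1), Ok("search".to_string()));
    // An out-of-range index now surfaces as an error, not a default value.
    assert!(resolve_intent(&labels, 7).is_err());
}
```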
```rust
let intent = if predicted_class < self.intent_labels.len() {
    self.intent_labels[predicted_class].clone()
} else {
    format!("UNKNOWN_{}", predicted_class)
```
fail here too?
I will keep error handling consistent here as well: when the classification result cannot be determined, it should return an error.
```rust
// Skip "O" (Outside) labels - class 0 typically means no PII
if class_idx > 0 && class_idx < self.pii_types.len() {
    has_pii = true;
    max_confidence = max_confidence.max(confidence);
```
For PII, we need a confidence value for each occurrence; taking the max inflates the apparent significance of the others.
Agreed: each detected PII item should carry its own independent confidence, not a value influenced by the max.
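One way to sketch that, with illustrative types: collect an occurrence record per detection, each keeping its own confidence, instead of folding everything into a single `max_confidence`.

```rust
/// Sketch (illustrative types): one record per PII occurrence, each with its
/// own confidence value.
#[derive(Debug)]
struct PiiOccurrence {
    class_idx: usize,
    confidence: f32,
}

fn collect_pii(predictions: &[(usize, f32)], num_pii_types: usize) -> Vec<PiiOccurrence> {
    predictions
        .iter()
        // Skip "O" (Outside) labels: class 0 means no PII.
        .filter(|(class_idx, _)| *class_idx > 0 && *class_idx < num_pii_types)
        .map(|&(class_idx, confidence)| PiiOccurrence { class_idx, confidence })
        .collect()
}

fn main() {
    let preds = [(0, 0.99), (2, 0.61), (3, 0.94)];
    let found = collect_pii(&preds, 5);
    // Each occurrence keeps its own confidence; 0.61 is not inflated to 0.94.
    assert_eq!(found.len(), 2);
    assert_eq!(found[0].confidence, 0.61);
    assert_eq!(found[1].confidence, 0.94);
}
```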
```rust
// Calculate severity score based on confidence and threat type
let severity_score = if is_threat {
    confidence * 0.9 // High severity for detected threats
```
Why not just return the raw confidence?
You're right; this should return the raw confidence instead of a processed severity_score.
```rust
})?;

// Determine if threat is detected based on predicted class
let is_threat = predicted_class > 0; // Assuming class 0 is "benign" or "safe"
```
Can you use the class label or class number instead of hardcoding it?
Yes, I will use class labels or class numbers instead of hardcoded values.
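A sketch of that lookup (the label strings here are assumptions, not the model's actual id2label mapping): the threat decision comes from the class label, and an unknown index is an error rather than silently counted as a threat.

```rust
/// Sketch (assumed label names): derive `is_threat` from an id2label-style
/// mapping instead of assuming class 0 is benign.
fn is_threat(class_labels: &[&str], predicted_class: usize) -> Result<bool, String> {
    class_labels
        .get(predicted_class)
        // Anything other than an explicitly safe label counts as a threat.
        .map(|label| !matches!(*label, "benign" | "safe"))
        .ok_or_else(|| format!("unknown class index {}", predicted_class))
}

fn main() {
    let labels = ["benign", "jailbreak", "prompt_injection"];
    assert_eq!(is_threat(&labels, 0), Ok(false));
    assert_eq!(is_threat(&labels, 2), Ok(true));
    assert!(is_threat(&labels, 9).is_err());
}
```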
```rust
for word in &words {
    // Generate contextual embedding for each word
    // This simulates BERT's contextualized embeddings based on word content
```
Can you apply the real BERT embedding instead of the simulation?
Yes, I will use real BERT embeddings instead of simulated data.
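One common way to obtain word-level vectors from real BERT output is to mean-pool the hidden states of a word's subword tokens; the sketch below illustrates only the pooling step, with nested `Vec`s standing in for the tensor rows a forward pass would return.

```rust
/// Sketch: mean-pool subword hidden states into one word-level vector.
/// The real implementation would read these rows from the model's output tensor.
fn pool_word_embedding(subword_states: &[Vec<f32>]) -> Option<Vec<f32>> {
    let dim = subword_states.first()?.len();
    let mut pooled = vec![0.0f32; dim];
    for state in subword_states {
        for (acc, v) in pooled.iter_mut().zip(state) {
            *acc += v;
        }
    }
    let n = subword_states.len() as f32;
    for acc in pooled.iter_mut() {
        *acc /= n;
    }
    Some(pooled)
}

fn main() {
    // Two subword states for one word, averaged element-wise.
    let states = vec![vec![1.0, 3.0], vec![3.0, 5.0]];
    assert_eq!(pool_word_embedding(&states), Some(vec![2.0, 4.0]));
    // No subwords means no embedding, rather than a zero vector.
    assert_eq!(pool_word_embedding(&[]), None);
}
```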
```rust
}

/// Generate contextual embedding based on word content
fn generate_contextual_embedding(&self, word: &str) -> Result<Tensor> {
```
This should use the model's embedding.
Yes, I will use actual model embeddings.
great! I am a bit tied up these days, will come back to review soon-ish.
```rust
pub fn forward(&self, _input: &Tensor) -> Result<Tensor> {
    // Simplified placeholder implementation
    Tensor::zeros(&[1, 768], candle_core::DType::F32, &self.device)
```
Can you add a comment? 768 is the ModernBERT-base hidden size; the large model has a hidden size of 1024.
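A small sketch of how that could be made size-agnostic (the struct and function names are illustrative): the placeholder shape follows the loaded model's configured hidden size instead of a baked-in 768.

```rust
/// Sketch (illustrative names): derive the placeholder shape from the model
/// config so both ModernBERT-base (768) and ModernBERT-large (1024) work.
struct ModelConfig {
    hidden_size: usize,
}

fn placeholder_shape(config: &ModelConfig) -> [usize; 2] {
    // Shape follows the loaded checkpoint, not a hardcoded constant.
    [1, config.hidden_size]
}

fn main() {
    assert_eq!(placeholder_shape(&ModelConfig { hidden_size: 768 }), [1, 768]);
    assert_eq!(placeholder_shape(&ModelConfig { hidden_size: 1024 }), [1, 1024]);
}
```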
```rust
/// Task type
pub task: ClassificationTask,
/// Classification result
pub result: String,
```
Are the probabilities in ClassificationResult missing?
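A sketch of what carrying the probabilities could look like (field and function names are hypothetical, not the PR's actual struct): the result keeps the full softmax distribution alongside the winning label.

```rust
/// Sketch (hypothetical fields): expose the full per-class distribution so
/// callers can inspect more than the single winning label.
#[derive(Debug)]
struct ClassificationResult {
    result: String,
    confidence: f32,
    probabilities: Vec<f32>, // one softmax entry per class
}

fn from_probabilities(labels: &[&str], probabilities: Vec<f32>) -> Option<ClassificationResult> {
    // Argmax over the distribution picks the winning class.
    let (best, &confidence) = probabilities
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())?;
    Some(ClassificationResult {
        result: labels.get(best)?.to_string(),
        confidence,
        probabilities,
    })
}

fn main() {
    let r = from_probabilities(&["intent_a", "intent_b"], vec![0.2, 0.8]).unwrap();
    assert_eq!(r.result, "intent_b");
    assert_eq!(r.confidence, 0.8);
    assert_eq!(r.probabilities.len(), 2);
}
```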
```rust
/// This method computes embeddings for each candidate individually,
/// which is suitable for small candidate lists. For large lists,
/// consider batch processing.
pub fn find_most_similar(
```
Just a thought for enhancement: maybe this could return top_k.
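A sketch of that enhancement (the similarity scores are assumed precomputed by the embedding comparison): rank candidates by score and return the k best (index, score) pairs instead of only the single best match.

```rust
/// Sketch: return the k best (candidate index, similarity) pairs,
/// assuming `similarities` was already computed against each candidate.
fn top_k(similarities: &[f32], k: usize) -> Vec<(usize, f32)> {
    let mut ranked: Vec<(usize, f32)> = similarities.iter().copied().enumerate().collect();
    // Sort descending by score, then keep only the top k entries.
    ranked.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    ranked.truncate(k);
    ranked
}

fn main() {
    let sims = [0.11, 0.87, 0.42];
    assert_eq!(top_k(&sims, 2), vec![(1, 0.87), (2, 0.42)]);
    // k larger than the candidate list just returns everything, ranked.
    assert_eq!(top_k(&sims, 10).len(), 3);
}
```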
```yaml
    use_reasoning: false

# Router Configuration for Dual-Path Selection
router:
```
Can this go into a config file of its own and be loaded by the master config.yaml?
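One possible shape for that split, sketched below; the `router_config_file` key and file path are hypothetical, not part of the project's current schema, and the master loader would need to resolve the referenced file.

```yaml
# config.yaml (master) - hypothetical key resolved by the config loader
router_config_file: config/router.yaml

# config/router.yaml - Router Configuration for Dual-Path Selection
router:
  use_reasoning: false
```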
@OneZero-Y this is quite a major refactoring; would it be ok if we merge to a dev branch first and add test cases to validate before merging to the main branch?
@rootfs
@OneZero-Y great! I created the branch https://github.com/vllm-project/semantic-router/tree/feat-candle-refactoring
refactor: Implement modular candle-binding architecture

- Restructure codebase into modular layers (core/, ffi/, model_architectures/, classifiers/)
- Add unified error handling and configuration loading systems
- Implement dual-path architecture for traditional and LoRA models
- Add comprehensive FFI layer with memory safety

Maintains backward compatibility while enabling future model integrations.

Signed-off-by: OneZero-Y <[email protected]>
@rootfs I have created a new PR.
What type of PR is this?
refactor: Implement modular candle-binding architecture
What this PR does / why we need it:
Which issue(s) this PR fixes:
part of #59
Detailed Module Structure
1. FFI Interface Layer (ffi/)
C-compatible external interface layer with 10 modules:
Core Features:
2. Core Business Logic Layer (core/)
Pure business logic core with 5 modules:
Unified Error Handling:
- `UnifiedError`: Structured error types
- `ConfigErrorType`: Configuration-related error subtypes
- `ValidationError`: Validation error types
- `config_errors!`, `validation_error!`: Convenience macros

Unified Configuration Loading:
- `UnifiedConfigLoader`: Centralized configuration loading
- `IntentConfigLoader`, `PIIConfigLoader`, `SecurityConfigLoader`
- `TokenConfigLoader`, `LoRAConfigLoader`, `ModernBertConfigLoader`
3. Model Architectures Layer (model_architectures/)
Dual-path model architecture layer with 13 files:
Core Trait Architecture:
- `CoreModel`: Basic model operations (model_type, forward, get_config)
- `PathSpecialization`: Path-specific optimizations (supports_parallel, confidence_threshold)
- `ConfigurableModel`: Configuration-driven loading (load)
- `LoRACapable`: LoRA capabilities (inherits CoreModel)
- `TraditionalModel`: Traditional model capabilities (inherits CoreModel)

4. Classifier Systems Layer (classifiers/)
Classifier systems layer with 11 files:
Multi-Task Parallel Processing:
5. Utilities Layer (utils/)
Intelligent memory management utilities layer with 2 files: