feat(classifier): enable LoRA auto-detection for intent classification #726
base: main
Conversation
https://github.com/vllm-project/semantic-router/actions/runs/19620268281?pr=726 can you check the report? I saw the accuracy drop to 40%, any ideas?
It seems that the LoRA intent classifier has 40.71% accuracy vs ModernBERT's ~63.83% accuracy.
@yossiovadia thanks, so I think we should hold this PR for a while until LoRA reaches almost the same accuracy as the full-parameter model.
I think that lora_config.json wasn't downloaded from HF; probably there's a .downloaded file preventing re-download (if the cache exists, the model download is skipped).
Yes, don't approve/merge yet; let me see if we can get at least the same result as ModernBERT.
Hey @Xunzhuo - if you can clean the .downloaded marker and re-kick this CI test, it would be great (I'll only get back to this tomorrow, it's getting late here).
Force-pushed from 0ef04b1 to d4abd32
I'm back, testing the lora_config theory.
Seems it's still not fetching the lora_config; digging in.
Investigation Update: Accuracy Regression Confirmed

I've investigated the accuracy drop and can confirm there is a measurable regression when switching from ModernBERT to LoRA for intent classification.

Findings

Baseline (main branch with ModernBERT):
Current PR (with LoRA):

Analysis

Next Steps

Holding this PR until we understand why the LoRA model underperforms ModernBERT on this dataset. Possible areas to investigate:

cc @Xunzhuo - confirming your observation about the accuracy drop. The auto-detection mechanism works correctly; the issue is with the LoRA model's performance on this specific dataset.
Note to self (and others):
…kens

This commit fixes LoRA tokenization errors that occurred when processing inputs exceeding 512 tokens, which caused "index-select invalid index 512 with dim size 512" errors and resulted in empty predictions.

Changes:
- Added explicit truncation configuration to BertLoRAClassifier tokenizer
- Added safety check in UnifiedTokenizer::tokenize_for_lora()
- Ensures all inputs are properly truncated to BERT's 512 token limit

Test results:
- LoRA accuracy improved from ~40% (with empty predictions) to 80.36%
- 0 tokenization errors on 280 MMLU-Pro test cases
- 0 empty predictions

Fixes the accuracy regression reported in vllm-project#726

Signed-off-by: Yossi Ovadia <[email protected]>
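The safety check described in the commit can be sketched as a simple clamp on the token sequence. This is an illustration in Go (the real check lives in the Rust `UnifiedTokenizer::tokenize_for_lora()`); the function name `truncateForBert` is hypothetical.

```go
package main

import "fmt"

const maxSeqLen = 512 // BERT's positional-embedding limit

// truncateForBert clamps a token sequence to the model's 512-position
// limit before it is fed to the classifier. Without this guard, position
// id 512 indexes past the embedding table, producing the
// "index-select invalid index 512 with dim size 512" error and an empty
// prediction for any over-long input.
func truncateForBert(tokenIDs []int) []int {
	if len(tokenIDs) > maxSeqLen {
		return tokenIDs[:maxSeqLen]
	}
	return tokenIDs
}

func main() {
	long := make([]int, 700) // oversized input, e.g. a long MMLU-Pro question
	fmt.Println(len(truncateForBert(long)))           // 512
	fmt.Println(len(truncateForBert([]int{1, 2, 3}))) // 3
}
```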
PR #709 (mine...) enabled LoRA-based PII detection, which has much higher sensitivity (90% success rate vs 27% with ModernBERT). While this fixed real PII detection issues (#647), it introduced a new problem: the LoRA PII model detects numbers in academic questions as PII. This impacts the domain classification tests. I'm checking whether adjusting the PII threshold helps now.
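The threshold adjustment explored here amounts to confidence gating on detections. A minimal sketch, assuming a hypothetical `Entity` result type and `filterPII` helper (the real detector's types live in the semantic-router codebase):

```go
package main

import "fmt"

// Entity is a hypothetical PII detection result for illustration.
type Entity struct {
	Text  string
	Score float64
}

// filterPII keeps only detections at or above the confidence threshold.
// Raising the threshold (e.g. 0.7 -> 0.9, as the later commit does)
// suppresses low-confidence hits such as numbers in academic questions
// being flagged as PII, at the cost of possibly missing borderline PII.
func filterPII(detected []Entity, threshold float64) []Entity {
	var kept []Entity
	for _, e := range detected {
		if e.Score >= threshold {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	detected := []Entity{
		{"555-867-5309", 0.97}, // phone-number-like string, high confidence
		{"1847", 0.78},         // a year in a history question, false positive
	}
	fmt.Println(len(filterPII(detected, 0.7))) // 2: false positive slips through
	fmt.Println(len(filterPII(detected, 0.9))) // 1: only the high-confidence hit
}
```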
Force-pushed from 755804a to 83728cc
Force-pushed from 800f9fa to 04c8c7a
…ect#724)

This commit implements automatic detection of LoRA (Low-Rank Adaptation) models based on the presence of lora_config.json in the model directory.

Changes:
- Add LoRA auto-detection logic in Rust candle-binding layer
- Implement fallback to BERT base model when LoRA config is not found
- Add comprehensive test coverage for auto-detection mechanism
- Update default Helm values to use LoRA intent classification model
- Update ABrix deployment values to use LoRA models

The auto-detection mechanism checks for lora_config.json during model initialization and automatically switches between LoRA and base BERT models without requiring explicit configuration changes.

Signed-off-by: Yossi Ovadia <[email protected]>
This commit fixes two critical issues affecting classification accuracy:

1. Fixed IsCategoryEnabled() to check correct config field path:
   - Changed from c.Config.CategoryMappingPath (non-existent)
   - To c.Config.CategoryModel.CategoryMappingPath (correct)
   - This bug prevented LoRA classification from running in e2e tests

2. Optimized PII detection threshold from 0.7 to 0.9:
   - Reduces false positives from aggressive LoRA PII model (PR vllm-project#709)
   - Improves domain classification accuracy from 40.71% to 52.50%
   - Beats ModernBERT baseline of ~50%

Updated e2e test configurations to use LoRA models with optimized thresholds across ai-gateway and dynamic-config profiles.

Signed-off-by: Yossi Ovadia <[email protected]>
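The first fix is a nested-field-path bug. A minimal sketch with stand-in structs (field names follow the commit message; everything else is illustrative):

```go
package main

import "fmt"

// Minimal stand-ins for the config structs referenced in the commit.
type CategoryModel struct {
	CategoryMappingPath string
}

type Config struct {
	CategoryModel CategoryModel
}

type Classifier struct {
	Config *Config
}

// IsCategoryEnabled sketches the corrected check: the mapping path lives
// on the nested CategoryModel struct, not directly on Config. Reading a
// non-existent (always empty) top-level field made this effectively
// always false, silently disabling LoRA classification in e2e tests.
func (c *Classifier) IsCategoryEnabled() bool {
	return c.Config != nil && c.Config.CategoryModel.CategoryMappingPath != ""
}

func main() {
	c := &Classifier{Config: &Config{
		CategoryModel: CategoryModel{CategoryMappingPath: "config/category_mapping.json"},
	}}
	fmt.Println(c.IsCategoryEnabled()) // true

	empty := &Classifier{Config: &Config{}}
	fmt.Println(empty.IsCategoryEnabled()) // false
}
```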
Increment cache version from v15 to v16 to ensure CI downloads the updated LoRA models that include lora_config.json files needed for auto-detection. Signed-off-by: Yossi Ovadia <[email protected]>
…olds

Update default configuration to use LoRA-based classification:
- Intent classification: lora_intent_classifier_bert-base-uncased_model
- PII detection: lora_pii_detector_bert-base-uncased_model with threshold 0.9

This aligns the default config with e2e test configurations for consistency across all environments.

Signed-off-by: Yossi Ovadia <[email protected]>
Force-pushed from 04c8c7a to de3790a
Maybe we should separately test domain classification and PII/jailbreak detection.

Summary
Enable LoRA auto-detection for intent/category classification, following the pattern from PII detection (PR #709).
Fixes #724
Changes
- `CategoryInitializer` auto-detects LoRA models via `lora_config.json`
- Removed hardcoded `useModernBERT` logic; now auto-detects model type

Technical Details
Go Layer (`classifier.go`):
- `CategoryInitializerImpl` tries `InitCandleBertClassifier()` first (checks for `lora_config.json`)

Rust Layer (`init.rs`, `classify.rs`):
- Added `LORA_INTENT_CLASSIFIER` static variable
- `init_candle_bert_classifier` with model type detection
- Uses `IntentLoRAClassifier::new()` for LoRA models

Testing
- Verified with the LoRA intent model (`lora_intent_classifier_bert-base-uncased_model`)
- Added an auto-detection test (`TestIntentClassificationLoRAAutoDetection`)
No configuration changes needed - auto-detection works with existing E2E config:
```yaml
classifier:
category_model:
model_id: "models/lora_intent_classifier_bert-base-uncased_model"
use_cpu: true
```
Related