
Conversation

@yossiovadia
Collaborator

Summary

Enable LoRA auto-detection for intent/category classification, following the pattern from PII detection (PR #709).

Fixes #724

Changes

  • Updated CategoryInitializer to auto-detect LoRA models via lora_config.json
  • Updated Rust FFI layer to support intent LoRA classification
  • Removed hardcoded useModernBERT logic - now auto-detects model type
  • Maintains backward compatibility with Traditional BERT and ModernBERT fallback
  • Net reduction: 31 additions, 40 deletions

Technical Details

Go Layer (classifier.go):

  • Replaced separate initializers with unified CategoryInitializerImpl
  • Auto-detection tries InitCandleBertClassifier() first (checks for lora_config.json)
  • Falls back to ModernBERT if LoRA initialization fails

Rust Layer (init.rs, classify.rs):

  • Added LORA_INTENT_CLASSIFIER static variable
  • Updated init_candle_bert_classifier with intelligent model type detection
  • Routes to IntentLoRAClassifier::new() for LoRA models
  • Proper fallback chain: LoRA → Traditional BERT → ModernBERT

Testing

  • Local testing with LoRA model (lora_intent_classifier_bert-base-uncased_model)
  • Auto-detection test passes (TestIntentClassificationLoRAAutoDetection)
  • E2E classification tests pass (03-classification-api-test.py)
  • Backward compatibility maintained
  • Pre-commit checks pass

Configuration

No configuration changes needed - auto-detection works with existing E2E config:
```yaml
classifier:
  category_model:
    model_id: "models/lora_intent_classifier_bert-base-uncased_model"
    use_cpu: true
```

Related

@netlify

netlify bot commented Nov 23, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: de3790a
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/692523e0a1c48c00080b4cbc
😎 Deploy Preview: https://deploy-preview-726--vllm-semantic-router.netlify.app

@github-actions

github-actions bot commented Nov 23, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/classification/lora_auto_detection_test.go
  • src/semantic-router/pkg/classification/classifier.go

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/test-and-build.yml

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/src/classifiers/lora/intent_lora.rs
  • candle-binding/src/core/tokenization.rs
  • candle-binding/src/ffi/classify.rs
  • candle-binding/src/ffi/init.rs
  • candle-binding/src/model_architectures/lora/bert_lora.rs

📁 config

Owners: @rootfs, @Xunzhuo
Files changed:

  • config/config.yaml

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/helm/semantic-router/values.yaml
  • deploy/kubernetes/aibrix/semantic-router-values/values.yaml

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/profiles/ai-gateway/values.yaml
  • e2e/profiles/dynamic-config/values.yaml


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@github-actions github-actions bot deleted a comment from codecov-commenter Nov 23, 2025
@github-actions github-actions bot deleted a comment from codecov-commenter Nov 24, 2025
@Xunzhuo
Member

Xunzhuo commented Nov 24, 2025

Can you check the report at https://github.com/vllm-project/semantic-router/actions/runs/19620268281?pr=726? I saw the accuracy drop to 40%; any ideas?

@yossiovadia
Collaborator Author

yossiovadia commented Nov 24, 2025

It seems the LoRA intent classifier has 40.71% accuracy vs. ModernBERT's ~63.83%.
Hopefully we can retrain the LoRA model; I'll investigate further tomorrow.

@Xunzhuo
Member

Xunzhuo commented Nov 24, 2025

@yossiovadia thanks. I think we should hold this PR for a while until LoRA reaches roughly the same accuracy as the full-parameter model.

@yossiovadia
Collaborator Author

I think lora_config.json wasn't downloaded from HF; there's probably a .downloaded marker file preventing re-download (if the cache exists, the model download is skipped).
The lora_config.json is on HF (uploaded ~2 weeks ago).
However, testing locally, the result improved; it's now 70% (higher than ModernBERT).
Still looking, but we need to clear the cache regardless.

@yossiovadia
Collaborator Author

Yes, don't approve/merge yet; let me see if we can get at least the same result as ModernBERT.

@yossiovadia
Collaborator Author

yossiovadia commented Nov 24, 2025

Hey @Xunzhuo -
Looking further: when lora_config.json exists, the LoRA model is actually BETTER than the base ModernBERT (70% vs 64%).
If you can clean the .downloaded marker and re-kick this CI test, that would be great. (I'll only get back to this tomorrow; it's getting late here.)

@yossiovadia yossiovadia force-pushed the feat/intent-classification-lora-auto-detection branch 2 times, most recently from 0ef04b1 to d4abd32 Compare November 24, 2025 14:51
@yossiovadia
Collaborator Author

I'm back; testing the lora_config.json theory.

@github-actions github-actions bot deleted a comment from codecov-commenter Nov 24, 2025
@yossiovadia
Collaborator Author

Seems it's still not fetching lora_config.json; digging in.

@yossiovadia
Collaborator Author

Investigation Update: Accuracy Regression Confirmed

I've investigated the accuracy drop and can confirm there is a measurable regression when switching from ModernBERT to LoRA for intent classification.

Findings

Baseline (main branch with ModernBERT): ~50% accuracy

Current PR (with LoRA): 40.71% accuracy

Analysis

  1. ~10% accuracy drop when switching from ModernBERT to LoRA (50% → 40%)
  2. Not caused by missing lora_config.json - verified the file is downloaded correctly and LoRA auto-detection works as intended
  3. The LoRA model itself performs worse on the MMLU-Pro academic question dataset used in E2E tests

Next Steps

Holding this PR until we understand why the LoRA model underperforms ModernBERT on this dataset. Possible areas to investigate:

  • Training data mismatch (LoRA training set vs. MMLU-Pro test set)
  • Model capacity differences
  • Hyperparameter tuning for LoRA
  • Category mapping alignment

cc @Xunzhuo - confirming your observation about the accuracy drop. The auto-detection mechanism works correctly; the issue is with the LoRA model's performance on this specific dataset.

@yossiovadia
Collaborator Author

Note to self (and others): a useful command to fetch the accuracy from a specific run:

```shell
gh run view 19590181379 --repo vllm-project/semantic-router --log 2>&1 | grep -E "Accuracy Rate|Domain classification test completed" | head -5
```

yossiovadia added a commit to yossiovadia/semantic-router that referenced this pull request Nov 24, 2025
…kens

This commit fixes LoRA tokenization errors that occurred when processing
inputs exceeding 512 tokens, which caused "index-select invalid index 512
with dim size 512" errors and resulted in empty predictions.

Changes:
- Added explicit truncation configuration to BertLoRAClassifier tokenizer
- Added safety check in UnifiedTokenizer::tokenize_for_lora()
- Ensures all inputs are properly truncated to BERT's 512 token limit

Test results:
- LoRA accuracy improved from ~40% (with empty predictions) to 80.36%
- 0 tokenization errors on 280 MMLU-Pro test cases
- 0 empty predictions

Fixes the accuracy regression reported in vllm-project#726

Signed-off-by: Yossi Ovadia <[email protected]>
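
The actual fix lives in the Rust tokenizer layer; conceptually it is just a clamp to the model's 512-position limit before the embedding lookup. A minimal Go sketch of that safety check (the constant and function name are illustrative, not the real code):

```go
package main

import "fmt"

// maxSeqLen is BERT's position-embedding limit; position indices >= 512 trigger
// the "index-select invalid index 512 with dim size 512" error described above.
const maxSeqLen = 512

// truncateTokens drops token IDs beyond the model's maximum sequence length,
// so every position index stays within the embedding table.
func truncateTokens(ids []int) []int {
	if len(ids) > maxSeqLen {
		return ids[:maxSeqLen]
	}
	return ids
}

func main() {
	long := make([]int, 600) // an over-length input
	fmt.Println(len(truncateTokens(long))) // 512
}
```

Without the clamp, over-length inputs produced empty predictions, which the accuracy script counted as misclassifications; with it, the same test set scores 80.36%.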
@github-actions github-actions bot deleted a comment from codecov-commenter Nov 24, 2025
@yossiovadia
Collaborator Author

Hey @Xunzhuo and @rootfs,

PR #709 (mine...) enabled LoRA-based PII detection, which has much higher sensitivity (90% success rate vs 27% with ModernBERT). While this fixed real PII detection issues (#647), it introduced a new problem: the LoRA PII model detects numbers in academic questions as PII:

  • "$1650" → B-ZIP_CODE (0.865 confidence)
  • "4111-1111-1111-1111" → B-US_SSN (0.867 confidence)
  • Various numbers → phone numbers, credit cards, etc.

Impact on the domain classification tests:

  • The LoRA intent classifier achieves ~80% accuracy (verified via direct testing)
  • But ~60% of domain test requests are blocked by PII false positives
  • Blocked requests don't return the x-vsr-selected-category header
  • The test interprets missing headers as wrong classifications
  • Measured accuracy: 40.71% (should be ~80%)

A few examples:

  • Business question: "A used car worth $1650 was purchased..." Detected: B-ZIP_CODE ('1650') with 0.865 confidence. This is a price, not a ZIP code.
  • Credit card patterns in questions: "4111-1111-1111-1111" detected as B-US_SSN with 0.867 confidence.
  • Phone number patterns: any sequence like "555-1234" detected as B-PHONE_NUMBER.

I'm checking whether a higher PII confidence threshold helps.
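
A minimal sketch of the threshold idea, assuming each detection carries a confidence score; the `piiEntity` type and `filterByThreshold` function are hypothetical, and the 0.9 cutoff matches the value discussed in this thread:

```go
package main

import "fmt"

// piiEntity is a hypothetical detection record: an entity label, the matched
// text span, and the model's confidence for that span.
type piiEntity struct {
	Label      string
	Text       string
	Confidence float64
}

// filterByThreshold keeps only detections at or above the confidence cutoff.
// Raising the cutoff from 0.7 to 0.9 drops borderline false positives such as
// "$1650" → B-ZIP_CODE (0.865) while retaining high-confidence detections.
func filterByThreshold(entities []piiEntity, threshold float64) []piiEntity {
	kept := make([]piiEntity, 0, len(entities))
	for _, e := range entities {
		if e.Confidence >= threshold {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	detections := []piiEntity{
		{"B-ZIP_CODE", "1650", 0.865},               // false positive from a price
		{"B-US_SSN", "4111-1111-1111-1111", 0.867},  // false positive from a test card number
		{"B-EMAIL", "[email protected]", 0.98}, // genuine PII
	}
	fmt.Println(len(filterByThreshold(detections, 0.9))) // 1
}
```

This is a post-filter, not a model change: both example false positives sit just below 0.9, which is why moving the cutoff above their confidence recovers the blocked requests.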

@yossiovadia yossiovadia force-pushed the feat/intent-classification-lora-auto-detection branch 2 times, most recently from 755804a to 83728cc Compare November 25, 2025 03:25
yossiovadia added a commit to yossiovadia/semantic-router that referenced this pull request Nov 25, 2025
@yossiovadia yossiovadia force-pushed the feat/intent-classification-lora-auto-detection branch from 800f9fa to 04c8c7a Compare November 25, 2025 03:28
…ect#724)

This commit implements automatic detection of LoRA (Low-Rank Adaptation)
models based on the presence of lora_config.json in the model directory.

Changes:
- Add LoRA auto-detection logic in Rust candle-binding layer
- Implement fallback to BERT base model when LoRA config is not found
- Add comprehensive test coverage for auto-detection mechanism
- Update default Helm values to use LoRA intent classification model
- Update AIBrix deployment values to use LoRA models

The auto-detection mechanism checks for lora_config.json during model
initialization and automatically switches between LoRA and base BERT
models without requiring explicit configuration changes.

Signed-off-by: Yossi Ovadia <[email protected]>
This commit fixes two critical issues affecting classification accuracy:

1. Fixed IsCategoryEnabled() to check correct config field path:
   - Changed from c.Config.CategoryMappingPath (non-existent)
   - To c.Config.CategoryModel.CategoryMappingPath (correct)
   - This bug prevented LoRA classification from running in e2e tests

2. Optimized PII detection threshold from 0.7 to 0.9:
   - Reduces false positives from aggressive LoRA PII model (PR vllm-project#709)
   - Improves domain classification accuracy from 40.71% to 52.50%
   - Beats ModernBERT baseline of ~50%

Updated e2e test configurations to use LoRA models with optimized
thresholds across ai-gateway and dynamic-config profiles.

Signed-off-by: Yossi Ovadia <[email protected]>
Increment cache version from v15 to v16 to ensure CI downloads the
updated LoRA models that include lora_config.json files needed for
auto-detection.

Signed-off-by: Yossi Ovadia <[email protected]>
…olds

Update default configuration to use LoRA-based classification:
- Intent classification: lora_intent_classifier_bert-base-uncased_model
- PII detection: lora_pii_detector_bert-base-uncased_model with threshold 0.9

This aligns the default config with e2e test configurations for
consistency across all environments.

Signed-off-by: Yossi Ovadia <[email protected]>
@yossiovadia yossiovadia force-pushed the feat/intent-classification-lora-auto-detection branch from 04c8c7a to de3790a Compare November 25, 2025 03:34
@github-actions github-actions bot deleted a comment from codecov-commenter Nov 25, 2025
@Xunzhuo
Member

Xunzhuo commented Nov 25, 2025

Maybe we should test domain classification and PII/jailbreak separately.
