
Conversation

@yossiovadia
Collaborator

Summary

Enable LoRA auto-detection for intent/category classification, following the pattern from PII detection (PR #709).

Fixes #724

Changes

  • Updated CategoryInitializer to auto-detect LoRA models via lora_config.json
  • Updated Rust FFI layer to support intent LoRA classification
  • Removed hardcoded useModernBERT logic - now auto-detects model type
  • Maintains backward compatibility with Traditional BERT and ModernBERT fallback
  • Net reduction: 31 additions, 40 deletions

Technical Details

Go Layer (classifier.go):

  • Replaced separate initializers with unified CategoryInitializerImpl
  • Auto-detection tries InitCandleBertClassifier() first (checks for lora_config.json)
  • Falls back to ModernBERT if LoRA initialization fails

Rust Layer (init.rs, classify.rs):

  • Added LORA_INTENT_CLASSIFIER static variable
  • Updated init_candle_bert_classifier with intelligent model type detection
  • Routes to IntentLoRAClassifier::new() for LoRA models
  • Proper fallback chain: LoRA → Traditional BERT → ModernBERT

Testing

  • Local testing with LoRA model (lora_intent_classifier_bert-base-uncased_model)
  • Auto-detection test passes (TestIntentClassificationLoRAAutoDetection)
  • E2E classification tests pass (03-classification-api-test.py)
  • Backward compatibility maintained
  • Pre-commit checks pass

Configuration

No configuration changes needed - auto-detection works with existing E2E config:
```yaml
classifier:
  category_model:
    model_id: "models/lora_intent_classifier_bert-base-uncased_model"
    use_cpu: true
```

Related

@netlify

netlify bot commented Nov 23, 2025

Deploy Preview for vllm-semantic-router ready!

🔨 Latest commit: de3790a
🔍 Latest deploy log: https://app.netlify.com/projects/vllm-semantic-router/deploys/692523e0a1c48c00080b4cbc
😎 Deploy Preview: https://deploy-preview-726--vllm-semantic-router.netlify.app

@github-actions

github-actions bot commented Nov 23, 2025

👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 src

Owners: @rootfs, @Xunzhuo, @wangchen615
Files changed:

  • src/semantic-router/pkg/classification/lora_auto_detection_test.go
  • src/semantic-router/pkg/classification/classifier.go

📁 Root Directory

Owners: @rootfs, @Xunzhuo
Files changed:

  • .github/workflows/test-and-build.yml

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/src/classifiers/lora/intent_lora.rs
  • candle-binding/src/core/tokenization.rs
  • candle-binding/src/ffi/classify.rs
  • candle-binding/src/ffi/init.rs
  • candle-binding/src/model_architectures/lora/bert_lora.rs

📁 config

Owners: @rootfs, @Xunzhuo
Files changed:

  • config/config.yaml

📁 deploy

Owners: @rootfs, @Xunzhuo
Files changed:

  • deploy/helm/semantic-router/values.yaml
  • deploy/kubernetes/aibrix/semantic-router-values/values.yaml

📁 e2e

Owners: @Xunzhuo
Files changed:

  • e2e/profiles/ai-gateway/values.yaml
  • e2e/profiles/dynamic-config/values.yaml


🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@github-actions github-actions bot deleted a comment from codecov-commenter Nov 23, 2025
@github-actions github-actions bot deleted a comment from codecov-commenter Nov 24, 2025
@Xunzhuo
Member

Xunzhuo commented Nov 24, 2025

Can you check the report at https://github.com/vllm-project/semantic-router/actions/runs/19620268281?pr=726? I saw the accuracy drop to 40%; any ideas?

@yossiovadia
Collaborator Author

yossiovadia commented Nov 24, 2025

It seems the LoRA intent classifier has 40.71% accuracy vs. ModernBERT's ~63.83%.
Hopefully we can retrain the LoRA model; I'll investigate further tomorrow.

@Xunzhuo
Member

Xunzhuo commented Nov 24, 2025

@yossiovadia thanks. I think we should hold this PR for a while until LoRA reaches roughly the same accuracy as the full-parameter model.

@yossiovadia
Collaborator Author

I think lora_config.json wasn't downloaded from HF; there's probably a .downloaded marker file preventing re-download (if the cache exists, the model download is skipped).
The lora_config.json is on HF (uploaded ~2 weeks ago).
However, testing locally, the result improved; it's now 70% (higher than ModernBERT).
Still looking, but we need to clear the cache regardless.

@yossiovadia
Collaborator Author

Yes, don't approve/merge yet; let me see if we can get at least the same result as ModernBERT.

@yossiovadia
Collaborator Author

yossiovadia commented Nov 24, 2025

Hey @Xunzhuo -
Looking further: when lora_config.json exists, the LoRA model is actually BETTER than the base ModernBERT (70% vs 64%).
If you can clean the .downloaded marker and re-kick this CI test, that would be great. (I'll only get back to this tomorrow; it's getting late here.)

@yossiovadia yossiovadia force-pushed the feat/intent-classification-lora-auto-detection branch 2 times, most recently from 0ef04b1 to d4abd32 Compare November 24, 2025 14:51
@yossiovadia
Collaborator Author

I'm back; testing the lora_config.json theory.

@github-actions github-actions bot deleted a comment from codecov-commenter Nov 24, 2025
@yossiovadia
Collaborator Author

Seems it's still not fetching lora_config.json; digging in.

@yossiovadia
Collaborator Author

Investigation Update: Accuracy Regression Confirmed

I've investigated the accuracy drop and can confirm there is a measurable regression when switching from ModernBERT to LoRA for intent classification.

Findings

Baseline (main branch with ModernBERT): ~50% accuracy

Current PR (with LoRA): 40.71% accuracy

Analysis

  1. ~10% accuracy drop when switching from ModernBERT to LoRA (50% → 40%)
  2. Not caused by missing lora_config.json - verified the file is downloaded correctly and LoRA auto-detection works as intended
  3. The LoRA model itself performs worse on the MMLU-Pro academic question dataset used in E2E tests

Next Steps

Holding this PR until we understand why the LoRA model underperforms ModernBERT on this dataset. Possible areas to investigate:

  • Training data mismatch (LoRA training set vs. MMLU-Pro test set)
  • Model capacity differences
  • Hyperparameter tuning for LoRA
  • Category mapping alignment

cc @Xunzhuo - confirming your observation about the accuracy drop. The auto-detection mechanism works correctly; the issue is with the LoRA model's performance on this specific dataset.

@yossiovadia
Collaborator Author

Note to self (and others): a useful command to fetch the accuracy from a specific run:

```shell
gh run view 19590181379 --repo vllm-project/semantic-router --log 2>&1 | grep -E "Accuracy Rate|Domain classification test completed" | head -5
```

yossiovadia added a commit to yossiovadia/semantic-router that referenced this pull request Nov 24, 2025
…kens

This commit fixes LoRA tokenization errors that occurred when processing
inputs exceeding 512 tokens, which caused "index-select invalid index 512
with dim size 512" errors and resulted in empty predictions.

Changes:
- Added explicit truncation configuration to BertLoRAClassifier tokenizer
- Added safety check in UnifiedTokenizer::tokenize_for_lora()
- Ensures all inputs are properly truncated to BERT's 512 token limit

Test results:
- LoRA accuracy improved from ~40% (with empty predictions) to 80.36%
- 0 tokenization errors on 280 MMLU-Pro test cases
- 0 empty predictions

Fixes the accuracy regression reported in vllm-project#726

Signed-off-by: Yossi Ovadia <[email protected]>
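
The actual fix lives in the Rust tokenizer layer; conceptually it is just a clamp to the model's 512-position limit before the embedding lookup. A minimal Go sketch of that safety check (the constant and function name are illustrative, not the real code):

```go
package main

import "fmt"

// maxSeqLen is BERT's position-embedding limit; position indices >= 512 trigger
// the "index-select invalid index 512 with dim size 512" error described above.
const maxSeqLen = 512

// truncateTokens drops token IDs beyond the model's maximum sequence length,
// so every position index stays within the embedding table.
func truncateTokens(ids []int) []int {
	if len(ids) > maxSeqLen {
		return ids[:maxSeqLen]
	}
	return ids
}

func main() {
	long := make([]int, 600) // an over-length input
	fmt.Println(len(truncateTokens(long))) // 512
}
```

Without the clamp, over-length inputs produced empty predictions, which the accuracy script counted as misclassifications; with it, the same test set scores 80.36%.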
@github-actions github-actions bot deleted a comment from codecov-commenter Nov 24, 2025
@yossiovadia
Collaborator Author

Hey @Xunzhuo and @rootfs,

PR #709 (mine...) enabled LoRA-based PII detection, which has much higher sensitivity (90% success rate vs 27% with ModernBERT). While this fixed real PII detection issues (#647), it introduced a new problem: the LoRA PII model detects numbers in academic questions as PII:

  • "$1650" → B-ZIP_CODE (0.865 confidence)
  • "4111-1111-1111-1111" → B-US_SSN (0.867 confidence)
  • Various numbers → phone numbers, credit cards, etc.

Impact on the domain classification tests:

  • The LoRA intent classifier achieves ~80% accuracy (verified via direct testing)
  • But ~60% of domain test requests are blocked by PII false positives
  • Blocked requests don't return the x-vsr-selected-category header
  • The test interprets missing headers as wrong classifications
  • Measured accuracy: 40.71% (should be ~80%)

A few examples:

  • Business question: "A used car worth $1650 was purchased..." Detected: B-ZIP_CODE ('1650') with 0.865 confidence. This is a price, not a ZIP code.
  • Credit card patterns in questions: "4111-1111-1111-1111" detected as B-US_SSN with 0.867 confidence.
  • Phone number patterns: any sequence like "555-1234" detected as B-PHONE_NUMBER.

I'm checking whether a higher PII confidence threshold helps.
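
A minimal sketch of the threshold idea, assuming each detection carries a confidence score; the `piiEntity` type and `filterByThreshold` function are hypothetical, and the 0.9 cutoff matches the value discussed in this thread:

```go
package main

import "fmt"

// piiEntity is a hypothetical detection record: an entity label, the matched
// text span, and the model's confidence for that span.
type piiEntity struct {
	Label      string
	Text       string
	Confidence float64
}

// filterByThreshold keeps only detections at or above the confidence cutoff.
// Raising the cutoff from 0.7 to 0.9 drops borderline false positives such as
// "$1650" → B-ZIP_CODE (0.865) while retaining high-confidence detections.
func filterByThreshold(entities []piiEntity, threshold float64) []piiEntity {
	kept := make([]piiEntity, 0, len(entities))
	for _, e := range entities {
		if e.Confidence >= threshold {
			kept = append(kept, e)
		}
	}
	return kept
}

func main() {
	detections := []piiEntity{
		{"B-ZIP_CODE", "1650", 0.865},               // false positive from a price
		{"B-US_SSN", "4111-1111-1111-1111", 0.867},  // false positive from a test card number
		{"B-EMAIL", "[email protected]", 0.98}, // genuine PII
	}
	fmt.Println(len(filterByThreshold(detections, 0.9))) // 1
}
```

This is a post-filter, not a model change: both example false positives sit just below 0.9, which is why moving the cutoff above their confidence recovers the blocked requests.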

@yossiovadia yossiovadia force-pushed the feat/intent-classification-lora-auto-detection branch 2 times, most recently from 755804a to 83728cc Compare November 25, 2025 03:25
yossiovadia added a commit to yossiovadia/semantic-router that referenced this pull request Nov 25, 2025
@yossiovadia yossiovadia force-pushed the feat/intent-classification-lora-auto-detection branch from 800f9fa to 04c8c7a Compare November 25, 2025 03:28
…ect#724)

This commit implements automatic detection of LoRA (Low-Rank Adaptation)
models based on the presence of lora_config.json in the model directory.

Changes:
- Add LoRA auto-detection logic in Rust candle-binding layer
- Implement fallback to BERT base model when LoRA config is not found
- Add comprehensive test coverage for auto-detection mechanism
- Update default Helm values to use LoRA intent classification model
- Update AIBrix deployment values to use LoRA models

The auto-detection mechanism checks for lora_config.json during model
initialization and automatically switches between LoRA and base BERT
models without requiring explicit configuration changes.

Signed-off-by: Yossi Ovadia <[email protected]>
This commit fixes two critical issues affecting classification accuracy:

1. Fixed IsCategoryEnabled() to check correct config field path:
   - Changed from c.Config.CategoryMappingPath (non-existent)
   - To c.Config.CategoryModel.CategoryMappingPath (correct)
   - This bug prevented LoRA classification from running in e2e tests

2. Optimized PII detection threshold from 0.7 to 0.9:
   - Reduces false positives from aggressive LoRA PII model (PR vllm-project#709)
   - Improves domain classification accuracy from 40.71% to 52.50%
   - Beats ModernBERT baseline of ~50%

Updated e2e test configurations to use LoRA models with optimized
thresholds across ai-gateway and dynamic-config profiles.

Signed-off-by: Yossi Ovadia <[email protected]>
Increment cache version from v15 to v16 to ensure CI downloads the
updated LoRA models that include lora_config.json files needed for
auto-detection.

Signed-off-by: Yossi Ovadia <[email protected]>
…olds

Update default configuration to use LoRA-based classification:
- Intent classification: lora_intent_classifier_bert-base-uncased_model
- PII detection: lora_pii_detector_bert-base-uncased_model with threshold 0.9

This aligns the default config with e2e test configurations for
consistency across all environments.

Signed-off-by: Yossi Ovadia <[email protected]>
@yossiovadia yossiovadia force-pushed the feat/intent-classification-lora-auto-detection branch from 04c8c7a to de3790a Compare November 25, 2025 03:34
@github-actions github-actions bot deleted a comment from codecov-commenter Nov 25, 2025
@Xunzhuo
Member

Xunzhuo commented Nov 25, 2025

Maybe we should test domain classification and PII/jailbreak separately.
