Commit cf8691b
fix(lora): add explicit tokenizer truncation to handle inputs >512 tokens
This commit fixes LoRA tokenization errors that occurred when processing
inputs exceeding 512 tokens, which caused "index-select invalid index 512
with dim size 512" errors and resulted in empty predictions.
Changes:
- Added explicit truncation configuration to BertLoRAClassifier tokenizer
- Added safety check in UnifiedTokenizer::tokenize_for_lora()
- Ensures all inputs are properly truncated to BERT's 512 token limit
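The safety check described above can be sketched as a small, self-contained Rust snippet. Note this is an illustrative assumption, not the repository's actual code: the helper name `truncate_token_ids` and the hard-coded BERT `[SEP]` id (102) are hypothetical, and the real fix configures truncation on the tokenizer itself.

```rust
/// BERT's position-embedding table has 512 slots, so any sequence longer
/// than 512 token ids triggers the "index-select invalid index 512 with
/// dim size 512" error described above.
const MAX_SEQ_LEN: usize = 512;

/// Illustrative safety check: cap a token-id sequence at MAX_SEQ_LEN,
/// keeping a trailing [SEP] token so the sequence stays well-formed.
fn truncate_token_ids(mut ids: Vec<u32>, sep_id: u32) -> Vec<u32> {
    if ids.len() > MAX_SEQ_LEN {
        ids.truncate(MAX_SEQ_LEN - 1); // leave room for [SEP]
        ids.push(sep_id);
    }
    ids
}

fn main() {
    // 600 ids would overflow BERT's 512 positions without the check.
    let ids: Vec<u32> = (0..600).collect();
    let out = truncate_token_ids(ids, 102); // 102 = BERT [SEP] id (assumption)
    assert_eq!(out.len(), 512);
    assert_eq!(*out.last().unwrap(), 102);

    // Short inputs pass through unchanged.
    assert_eq!(truncate_token_ids(vec![1, 2, 3], 102), vec![1, 2, 3]);
}
```

In practice the same effect is achieved by enabling truncation with `max_length = 512` on the HuggingFace `tokenizers` tokenizer, so inputs are clipped before they ever reach the model.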
Test results:
- LoRA accuracy improved from ~40% (with empty predictions) to 80.36%
- 0 tokenization errors on 280 MMLU-Pro test cases
- 0 empty predictions
Fixes the accuracy regression reported in vllm-project#726
Signed-off-by: Yossi Ovadia <[email protected]>

Parent: e62acbf
File tree (2 files changed: +33 −3):
- candle-binding/src/core
- candle-binding/src/model_architectures/lora
Hunk in first changed file: −1/+13 lines around line 390 (diff text not captured in this view).
Hunks in second changed file: −1/+10 around line 502 and −1/+10 around line 693 (diff text not captured in this view).