Conversation

OneZero-Y
Contributor

What type of PR is this?

What's Changed

1. Simplified parallel_engine.rs Thread Handling

File: candle-binding/src/classifiers/lora/parallel_engine.rs

Before (complex manual threading):

let intent_results = Arc::new(Mutex::new(Vec::new()));
let pii_results = Arc::new(Mutex::new(Vec::new()));
let security_results = Arc::new(Mutex::new(Vec::new()));

let handles = vec![
    self.spawn_intent_task(texts_owned.clone(), Arc::clone(&intent_results)),
    self.spawn_pii_task(texts_owned.clone(), Arc::clone(&pii_results)),
    self.spawn_security_task(texts_owned.clone(), Arc::clone(&security_results)),
];

for handle in handles {
    handle.join().unwrap();
}

After (clean rayon parallelism):

use rayon::prelude::*;

let ((intent_results, pii_results), security_results) = rayon::join(
    || rayon::join(
        || self.intent_classifier.batch_classify(texts),
        || self.pii_classifier.batch_detect(texts),
    ),
    || self.security_classifier.batch_detect(texts),
);
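The destructuring above follows directly from rayon::join's return shape: each call returns a tuple of its two closures' results, so nesting one join inside another yields ((intent, pii), security). Below is a minimal, self-contained sketch of the same pattern, with placeholder closures standing in for the real classifiers (names and logic are illustrative, not from the PR):

use rayon::join;

fn main() {
    let texts = vec!["a", "bb", "ccc"];

    // join runs both closures, potentially on separate rayon worker threads,
    // and returns their results as a tuple; nesting gives ((A, B), C).
    let ((intent, pii), security) = join(
        || join(
            || texts.iter().map(|t| t.len()).collect::<Vec<_>>(), // stand-in for batch_classify
            || texts.iter().filter(|t| t.len() > 1).count(),      // stand-in for batch_detect
        ),
        || texts.iter().any(|t| t.contains('c')),                 // stand-in for the security detector
    );

    println!("{:?} {} {}", intent, pii, security);
}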

2. Parallelized Batch Processing in LoRA Classifiers

Files:

  • candle-binding/src/classifiers/lora/pii_lora.rs
  • candle-binding/src/classifiers/lora/intent_lora.rs
  • candle-binding/src/classifiers/lora/security_lora.rs

Change (1 line per file):

// Before
texts.iter().map(|text| self.detect(text)).collect()

// After  
texts.par_iter().map(|text| self.detect(text)).collect()
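Two details make this one-line swap work and are easy to miss in review: par_iter() is only available once rayon::prelude::* is in scope, and a parallel iterator of Result items can still be collected into a single Result<Vec<_>, _>, which is Err if any element fails and otherwise preserves input order. A small standalone sketch (detect here is a dummy stand-in, not the real PII detector):

use rayon::prelude::*;

// Dummy fallible per-item function standing in for self.detect(text).
fn detect(text: &str) -> Result<usize, String> {
    if text.is_empty() {
        Err("empty input".to_string())
    } else {
        Ok(text.len())
    }
}

fn main() {
    let texts = vec!["alice", "bob", "carol"];
    let results: Result<Vec<usize>, String> =
        texts.par_iter().map(|text| detect(text)).collect();
    // Prints Ok([5, 3, 5]) -- results keep the original input order
    // even though items are processed on multiple threads.
    println!("{:?}", results);
}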

3. Multi-Task Batch Classification Parallelization

File: candle-binding/src/model_architectures/lora/bert_lora.rs

Function: classify_batch_multi_task()

Before:

pub fn classify_batch_multi_task(&self, texts: &[&str]) -> Result<Vec<LoRAMultiTaskResult>> {
    // For now, process sequentially. In future, implement true batch processing
    texts.iter().map(|text| self.classify_multi_task(text)).collect()
}

After:

pub fn classify_batch_multi_task(&self, texts: &[&str]) -> Result<Vec<LoRAMultiTaskResult>> {
    texts.par_iter().map(|text| self.classify_multi_task(text)).collect()
}

4. Traditional Model Batch Forward Pass Parallelization

File: candle-binding/src/model_architectures/traditional/base_model.rs

Function: forward_batch()

Before:

pub fn forward_batch(&self, input_batch: &[Tensor], attention_batch: &[Tensor]) -> Result<Vec<Tensor>> {
    let mut results = Vec::with_capacity(input_batch.len());
    for (input_ids, attention_mask) in input_batch.iter().zip(attention_batch.iter()) {
        let output = self.forward(input_ids, attention_mask)?;
        results.push(output);
    }
    Ok(results)
}

After:

pub fn forward_batch(&self, input_batch: &[Tensor], attention_batch: &[Tensor]) -> Result<Vec<Tensor>> {
    input_batch
        .par_iter()
        .zip(attention_batch.par_iter())
        .map(|(input_ids, attention_mask)| self.forward(input_ids, attention_mask))
        .collect()
}
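One property worth noting, stated here as a hedged reading of the change since it assumes the model's forward pass is safe to call from several threads at once: zipping two indexed parallel iterators pairs elements positionally, so each input tensor still meets its own attention mask, the collected outputs come back in input order, and any per-item error surfaces as the overall Err. A sketch with plain integers standing in for Tensors:

use rayon::prelude::*;

// Dummy stand-in for self.forward(input_ids, attention_mask).
fn forward(input: &u32, mask: &u32) -> Result<u32, String> {
    Ok(input * mask)
}

fn main() {
    let inputs = vec![1u32, 2, 3, 4];
    let masks = vec![10u32, 10, 10, 10];

    let outputs: Result<Vec<u32>, String> = inputs
        .par_iter()
        .zip(masks.par_iter())
        .map(|(input, mask)| forward(input, mask))
        .collect();

    // Pairing is positional and output order matches input order.
    assert_eq!(outputs, Ok(vec![10, 20, 30, 40]));
}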

Testing

New Test

  • Added: candle-binding/src/classifiers/lora/parallel_engine_test.rs
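The contents of that test file are not reproduced in this PR description. As a hypothetical sketch of the kind of invariant worth asserting for this refactor (names and logic below are illustrative, not the actual test), the rayon-based batch path should produce the same results, in the same order, as the old sequential path:

#[cfg(test)]
mod parallel_equivalence_sketch {
    use rayon::prelude::*;

    // Dummy per-text function standing in for a classifier call.
    fn detect(text: &str) -> usize {
        text.chars().filter(|c| c.is_numeric()).count()
    }

    #[test]
    fn parallel_batch_matches_sequential_batch() {
        let texts = vec!["order 66", "no digits here", "pi is 3.14"];

        let sequential: Vec<usize> = texts.iter().map(|t| detect(t)).collect();
        let parallel: Vec<usize> = texts.par_iter().map(|t| detect(t)).collect();

        assert_eq!(sequential, parallel);
    }
}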

Which issue(s) this PR fixes:

part of #266

Release Notes: Yes/No


👥 vLLM Semantic Team Notification

The following members have been identified for the changed files in this PR and have been automatically assigned:

📁 candle-binding

Owners: @rootfs
Files changed:

  • candle-binding/src/classifiers/lora/parallel_engine_test.rs
  • candle-binding/Cargo.toml
  • candle-binding/src/classifiers/lora/intent_lora.rs
  • candle-binding/src/classifiers/lora/mod.rs
  • candle-binding/src/classifiers/lora/parallel_engine.rs
  • candle-binding/src/classifiers/lora/pii_lora.rs
  • candle-binding/src/classifiers/lora/security_lora.rs
  • candle-binding/src/model_architectures/embedding/qwen3_embedding_test.rs
  • candle-binding/src/model_architectures/lora/bert_lora.rs
  • candle-binding/src/model_architectures/traditional/base_model.rs

vLLM

🎉 Thanks for your contributions!

This comment was automatically generated based on the OWNER files in the repository.

@rootfs
Collaborator

rootfs commented Oct 17, 2025

@OneZero-Y thanks! I'll test this branch soonish.

rootfs merged commit e34204c into vllm-project:feat-candle-refactoring on Oct 17, 2025
3 of 4 checks passed
OneZero-Y deleted the feat/testing-for-candle-refactoring branch on October 18, 2025 06:56