[Refactor] optimize sample_recover method in reject_sampler #3727
lio1226 wants to merge 4 commits into vllm-project:main from
Conversation
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
Code Review
The pull request optimizes the sample_recovered_tokens_pytorch method in rejection_sampler.py to improve the performance of EAGLE-3. The optimization replaces the nested per-token loops with vectorized torch operations, which should reduce execution time. I have identified a potential issue related to the indexing of q_values.
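For context, here is a minimal sketch of the vectorized pattern the review describes. This is not the PR's actual code; the function name and signature are assumptions. It shows the batched form of the exponential-race (Gumbel-max) sampling that the `argmax(prob / q_values)` expression quoted below implements:

```python
import torch

def sample_recovered_tokens_vectorized(probs: torch.Tensor) -> torch.Tensor:
    """probs: [num_tokens, vocab_size] adjusted distributions."""
    # One Exponential(1) draw per (token, vocab) entry, replacing the
    # per-token Python loop with a single batched operation.
    q = torch.empty_like(probs).exponential_(1.0)
    # argmax(p / q) over the vocab dim samples index i with probability p_i
    # (the exponential-race formulation of the Gumbel-max trick).
    return torch.argmax(probs / q, dim=-1)
```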
```python
recovered_id = torch.argmax(prob / q_values).item()
output_token_ids[token_idx] = recovered_id
q_values[:vocab_size] = q_value_new[token_positions, :vocab_size]
```
The indexing q_values[:vocab_size] might lead to incorrect behavior. Both q_values and q_value_new have the shape (num_tokens, vocab_size), so the slice q_values[:vocab_size] selects the first vocab_size rows (token positions), not columns. Assigning q_value_new[token_positions, :vocab_size] to it therefore updates only the first vocab_size rows, while the remaining rows stay at -inf. This is likely not the intended behavior, as it skews the sampling for tokens beyond the first vocab_size positions.
To fix this, assign the entire q_value_new to q_values without slicing. This ensures that all token positions have the correct q-values for the subsequent argmax operation.
Severity: critical
```diff
- q_values[:vocab_size] = q_value_new[token_positions, :vocab_size]
+ q_values = q_value_new
```
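A tiny illustration of the point above, with hypothetical sizes (the tensor names mirror the snippet; the values are made up): slicing with [:vocab_size] selects rows, so only the first vocab_size token positions are updated.

```python
import torch

num_tokens, vocab_size = 8, 4  # hypothetical, num_tokens > vocab_size
q_values = torch.full((num_tokens, vocab_size), float("-inf"))
q_value_new = torch.rand(num_tokens, vocab_size)

# Sliced assignment updates rows 0..vocab_size-1 only.
q_values[:vocab_size] = q_value_new[:vocab_size]
assert torch.isinf(q_values[vocab_size:]).all()  # rows 4..7 are still -inf

# Full assignment, as suggested, covers every token position.
q_values = q_value_new
assert torch.isfinite(q_values).all()
```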
Signed-off-by: lio <1983142975@qq.com>
c3edfda to f128fd5
This pull request has conflicts; please resolve them before we can evaluate the pull request.
Any progress? If this PR is still alive, please rebase onto main and make CI happy. Thanks
The main branch now includes the work done in this pull request, so this PR can be closed.
What this PR does / why we need it?
We optimized the sample_recovered_tokens_pytorch method in the rejection sampler to improve the performance of EAGLE-3.
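For contrast, a hedged sketch of the per-token loop pattern this PR replaces, reconstructed from the snippet quoted in the review above (not the actual vllm-ascend source; the function name is hypothetical):

```python
import torch

def sample_recovered_tokens_loop(probs: torch.Tensor) -> torch.Tensor:
    """probs: [num_tokens, vocab_size]; sequential baseline for contrast."""
    num_tokens = probs.shape[0]
    output_token_ids = torch.empty(num_tokens, dtype=torch.long)
    for token_idx in range(num_tokens):
        prob = probs[token_idx]
        # Fresh Exponential(1) noise per token, then the same argmax trick.
        q_values = torch.empty_like(prob).exponential_(1.0)
        recovered_id = torch.argmax(prob / q_values).item()
        output_token_ids[token_idx] = recovered_id
    return output_token_ids
```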
Does this PR introduce any user-facing change?
How was this patch tested?
None
Co-authored-by: QilaiZhang <245706640@qq.com>
vLLM version: v0.11.0rc3
vLLM main: https://github.com/vllm-project/vllm/commit/v0.11.0