test vllm-ascend triton ops #4418
base: main
Conversation
ok
Code Review
This pull request introduces several Triton kernels to optimize the rejection sampling process in vllm-ascend, replacing existing PyTorch implementations. The changes aim to improve performance by leveraging GPU-specific optimizations. The PR also includes refactoring of helper functions for better vectorization and clarity. My review identified a critical bug in the new rejection_greedy_sample_kernel where a condition to filter non-greedy requests is incorrect, leading to wrong behavior. I also found a high-severity issue in the refactored sample_recovered_tokens_pytorch function, where an incorrect tensor slicing could cause runtime errors. I've provided suggestions to fix both issues.
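For background, these kernels implement the standard speculative-decoding rejection rule: a draft token x is accepted with probability min(1, p(x) / q(x)), where p is the target-model distribution and q the draft-model distribution. A generic sketch of that acceptance check (not code from this PR; the function and argument names are illustrative):

```python
import torch


def accept_draft_token(p: torch.Tensor, q: torch.Tensor, draft_id: int) -> bool:
    """Accept draft_id with probability min(1, p[draft_id] / q[draft_id])."""
    ratio = (p[draft_id] / q[draft_id]).clamp(max=1.0)
    return bool(torch.rand(()) < ratio)
```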
```python
if is_greedy is None:
    # Early exit for non-greedy sampling requests.
    return
```
The condition is_greedy is None is incorrect for a Triton kernel. The is_greedy variable, being a Triton boolean scalar (tl.int1), can never be Python's None. This bug prevents the intended early exit for non-greedy sampling requests, causing them to be processed incorrectly by this kernel designed for greedy sampling. The check should be if not is_greedy: to correctly handle non-greedy cases.
Suggested change:
```diff
-if is_greedy is None:
-    # Early exit for non-greedy sampling requests.
-    return
+if not is_greedy:
+    # Early exit for non-greedy sampling requests.
+    return
```
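For illustration, a minimal self-contained sketch of this early-exit pattern on a loaded per-request flag. This is not the PR's actual kernel; the kernel name, argument layout, and launch parameters are assumptions:

```python
import torch
import triton
import triton.language as tl


@triton.jit
def greedy_accept_sketch_kernel(
    output_token_ids_ptr,  # [num_reqs] int64 output buffer
    target_argmax_ptr,     # [num_reqs] argmax of the target-model logits
    is_greedy_ptr,         # [num_reqs] per-request greedy flags (bool)
):
    req_idx = tl.program_id(0)
    is_greedy = tl.load(is_greedy_ptr + req_idx)
    if not is_greedy:
        # A loaded tl.int1 value is never Python's None; test the value itself.
        return
    token_id = tl.load(target_argmax_ptr + req_idx)
    tl.store(output_token_ids_ptr + req_idx, token_id)


# Example launch: one program per request (device string assumed here;
# vllm-ascend would target an NPU device instead).
num_reqs = 4
out = torch.empty(num_reqs, dtype=torch.int64, device="cuda")
argmax = torch.randint(0, 32000, (num_reqs,), dtype=torch.int64, device="cuda")
flags = torch.tensor([True, False, True, True], device="cuda")
greedy_accept_sketch_kernel[(num_reqs,)](out, argmax, flags)
```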
```python
q_values = torch.full((num_tokens, vocab_size),
                      float('-inf'),
                      device=q.device)
q_values[:vocab_size] = q[req_idx, :vocab_size]
...
recovered_id = torch.argmax(prob / q_values).item()
output_token_ids[token_idx] = recovered_id
q_values[:vocab_size] = q_value_new[token_positions, :vocab_size]
```
The current initialization and population of q_values is incorrect and overly complex. The row-wise slice q_values[:vocab_size] can cause a shape mismatch error if num_tokens > vocab_size. This block can be simplified by directly assigning q_value_new to q_values, which also removes the redundant torch.full initialization.
Suggested change:
```python
q_values = q_value_new
```
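For context, a hedged sketch of how the simplified per-token recovery loop could look with this change. The function name, signature, and loop structure are assumptions for illustration; only the variable names mirror the snippet above:

```python
import torch


def sample_recovered_tokens_sketch(
    prob: torch.Tensor,              # [num_tokens, vocab_size] target probabilities
    q_value_new: torch.Tensor,       # [num_tokens, vocab_size] draft (proposal) probabilities
    output_token_ids: torch.Tensor,  # [num_tokens] int64 output buffer
) -> None:
    num_tokens = prob.shape[0]
    for token_idx in range(num_tokens):
        # Use the proposal row for this token directly; no torch.full(-inf)
        # buffer or row-wise slicing is needed.
        q_values = q_value_new[token_idx]
        recovered_id = torch.argmax(prob[token_idx] / q_values).item()
        output_token_ids[token_idx] = recovered_id
```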
👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:
If CI fails, you can run linting and testing checks locally according to Contributing and Testing.
What this PR does / why we need it?
Does this PR introduce any user-facing change?
How was this patch tested?