[Kernel] add triton kernels for sampling #4394

swy20190 · 2025-11-24T09:01:29Z

What this PR does / why we need it?

Replace the PyTorch implementation of sampling with Triton kernels.

Does this PR introduce any user-facing change?

No

How was this patch tested?

vLLM version: v0.11.0
vLLM main: vllm-project/vllm@2918c1b

github-actions · 2025-11-24T09:01:37Z

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

A PR should do only one thing, smaller PRs enable faster reviews.
Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by other future PRs.
Write the commit message by fulfilling the PR description to help reviewer and future developers understand.

If CI fails, you can run linting and testing checks locally according to Contributing and Testing.

gemini-code-assist

Code Review

This pull request replaces several PyTorch-based sampling functions with Triton kernels, aiming to improve performance. The new kernels for expand_kernel and sample_recovered_tokens_kernel are well-implemented. However, rejection_greedy_sample_kernel and rejection_random_sample_kernel contain sequential loops over draft tokens. This is a performance anti-pattern in Triton that prevents vectorization and may not yield the desired speed-up. I've added specific comments with suggestions to vectorize these kernels for better efficiency.
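To make the point about sequential loops concrete, here is a minimal pure-Python sketch of greedy rejection for a single request, written once as a token-by-token loop and once as a "first mismatch index" computation (the form that maps onto a min-reduction in a vectorized kernel). The function names and the placeholder value -1 are assumptions for illustration; this is not the code from this PR.

```python
# Assumed sentinel for output slots that receive no token.
PLACEHOLDER_TOKEN_ID = -1


def greedy_accept_sequential(draft, target_argmax, bonus):
    """Loop form: visit draft positions in order and stop writing after the first mismatch."""
    n = len(draft)
    out = [PLACEHOLDER_TOKEN_ID] * (n + 1)
    rejected = False
    for pos in range(n):
        if not rejected:
            # The emitted token is always the target's argmax at this position.
            out[pos] = target_argmax[pos]
            rejected = draft[pos] != target_argmax[pos]
    if not rejected:
        # Every draft token matched, so the bonus token is appended.
        out[n] = bonus
    return out


def greedy_accept_first_mismatch(draft, target_argmax, bonus):
    """Reduction form: find the first mismatch index, then write the prefix in one step."""
    n = len(draft)
    # Index of the first mismatch, or n if all positions match.
    first = min((i for i in range(n) if draft[i] != target_argmax[i]), default=n)
    out = [PLACEHOLDER_TOKEN_ID] * (n + 1)
    keep = min(first + 1, n)  # positions 0..first are written (clamped to n)
    out[:keep] = target_argmax[:keep]
    if first == n:
        out[n] = bonus
    return out
```

Both forms should agree on every input; only the second exposes the work as an elementwise comparison plus a reduction, without a serial dependency between positions.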

vllm_ascend/sample/rejection_sampler.py

github-actions · 2025-11-24T09:46:10Z

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Signed-off-by: MidnightSun <[email protected]>

whx-sjtu · 2025-11-25T03:16:42Z

vllm_ascend/sample/rejection_sampler.py

-                max_spec_len,
-                is_greedy,
-            )
+        rejection_greedy_sample_kernel[(batch_size,)](


After discussing with @wangxiyuan: please guard this call with a HAS_TRITON check. If the user hasn't installed triton, fall back to the original implementation to preserve functionality. An example is here:

vllm-ascend/vllm_ascend/attention/sfa_v1.py

Line 505 in a3225c4

if HAS_TRITON:
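A minimal sketch of the guarded-dispatch pattern being requested, under stated assumptions: HAS_TRITON is approximated here with an importlib probe (the project has its own flag), and repeat_per_request, _repeat_triton, and _repeat_reference are hypothetical names, not functions from this PR.

```python
import importlib.util

# Stand-in for the project's flag; assumed to mean "the triton package is importable".
HAS_TRITON = importlib.util.find_spec("triton") is not None


def _repeat_reference(values, counts):
    """Fallback path: plain Python, available in every environment."""
    out = []
    for value, count in zip(values, counts):
        out.extend([value] * count)
    return out


def _repeat_triton(values, counts):
    """Placeholder for the kernel launch; a real version would invoke a Triton kernel."""
    raise NotImplementedError("Triton path is not part of this sketch")


def repeat_per_request(values, counts, has_triton=HAS_TRITON):
    """Dispatch on the flag so the original behavior survives when triton is missing."""
    if has_triton:
        return _repeat_triton(values, counts)
    return _repeat_reference(values, counts)
```

Calling repeat_per_request([1, 2], [2, 1], has_triton=False) exercises the fallback explicitly, which is also a convenient way to test the non-Triton path on a machine that does have triton installed.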

whx-sjtu · 2025-11-25T03:18:38Z

vllm_ascend/sample/rejection_sampler.py

+else:
+    from vllm.v1.sample.rejection_sampler import apply_sampling_constraints
+
+import triton.language as tl


Import these through vllm.triton_utils instead (from vllm.triton_utils import tl, triton), for the same reason.

whx-sjtu · 2025-11-25T03:21:20Z

In conclusion, we currently need to make sure the new Triton optimization doesn't break the original functionality in environments without triton.

add triton kernels for sampling

64b7c8a

gemini-code-assist bot reviewed Nov 24, 2025


add triton import

3ffedd2

github-actions bot added the merge-conflicts label Nov 24, 2025

Merge branch 'main' into dev_suiweiyi_rej_kernels

c6b204f

Signed-off-by: MidnightSun <[email protected]>

github-actions bot removed the merge-conflicts label Nov 24, 2025

whx-sjtu approved these changes Nov 25, 2025


whx-sjtu suggested changes Nov 25, 2025
