[Refactor] Add global helper to deduplicate vectorized memory ops#35105

Open
LopezCastroRoberto wants to merge 3 commits into vllm-project:main from LopezCastroRoberto:refactor/256b_instr

Conversation


LopezCastroRoberto (Contributor) commented Feb 23, 2026

PR #35210 must be merged first.

Summary

  • Extract duplicated 256-bit PTX load/store primitives, type converters, and vector traits into a shared csrc/cuda_vec_utils.cuh header.
  • Add compile-time CUDA_VERSION >= 12090 guard to activation kernel (csrc/activation_kernels.cu) launch macros so the 256-bit path is only selected when the toolkit supports it.
  • Remove dead code (PackedTraits, wrapper functions).

New PRs that introduce 256-bit instructions (e.g., #32957, #34917) should use the cuda_vec_utils.cuh helper to prevent code duplication and improve long-term maintainability.


Tested on SM120 with CUDA 12.8 and SM103 with CUDA 13.0:

  • tests/kernels/quantization/test_nvfp4_quant.py -- all tests passed
  • tests/kernels/quantization/test_silu_mul_nvfp4_quant.py -- all tests passed
  • tests/kernels/core/test_activation.py -- all tests passed

No performance regressions were detected.

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
LopezCastroRoberto marked this pull request as draft February 23, 2026 14:50
mergify bot added the nvidia label Feb 23, 2026

mergify bot commented Feb 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @LopezCastroRoberto.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork


gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request is a great refactoring that centralizes vectorized memory operations into a new csrc/cuda_vec_utils.cuh header. This significantly reduces code duplication and improves maintainability across various CUDA kernels. The new utilities are well-designed and make the code more readable and generic. I have one suggestion to further improve the robustness of the new TypeConverter utility by ensuring it fails at compile-time for unspecialized types.

Comment on lines +54 to +57
template <typename T>
struct TypeConverter {
  using Type = half2;
};

Severity: high

The default implementation for TypeConverter is risky as it silently defaults to half2 for any type T that doesn't have an explicit specialization. This could lead to subtle and hard-to-debug errors if a new scalar type is used with a kernel that relies on TypeConverter and a specialization is forgotten. It would be safer to enforce specialization by triggering a compile-time error for unhandled types.

template <typename T>
struct TypeConverter {
  // This template must be specialized for each type.
  static_assert(sizeof(T) == 0, "TypeConverter is not specialized for this type.");
};

Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
LopezCastroRoberto marked this pull request as ready for review February 24, 2026 20:15
