Conversation

yingxudeng (Collaborator) commented Nov 6, 2025

No description provided.

yingxudeng (Collaborator, Author) commented Nov 6, 2025

Summary of Pending Items for this PR

This PR is a work in progress. The following items need to be completed before it's ready for final review:

  • Compile the new libtorch_npu.so and update the related Docker images.
  • Refactor the WordEmbedding class:
    • Rename the legacy WordEmbedding implementation to free up the class name.
    • Ensure the new WordEmbedding implementation can share unified logic with MLU and GPU backends.
  • Apply the same refactoring as the WordEmbedding item above to the LMHead component.
  • Investigate and fix the widespread compilation failures in the test files.
  • Validate that the changes do not break existing model inference pipelines.
  • Review and update the default values for flags like ENABLE_NATIVE_NPU and USE_NPU_TORCH (modify or remove as needed).
  • Analyze and optimize #if preprocessor directives for potential consolidation (see the sketch after this list).
  • Address other minor pending items and technical debt.
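
A minimal sketch of the kind of #if consolidation meant above; the backend.h header and the backend namespace alias are hypothetical names, not existing code in this repo:

// Before: every call site repeats the backend guard.
#if defined(USE_NPU)
return npu::active(params.input);
#elif defined(USE_MLU)
return mlu::active(params.input);
#endif

// After: the guard lives once in a shared header (hypothetical backend.h),
// and call sites dispatch through a namespace alias.
#if defined(USE_NPU)
namespace backend = npu;
#elif defined(USE_MLU)
namespace backend = mlu;
#endif

// Call site, now guard-free:
return backend::active(params.input);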

@liutongxuan liutongxuan changed the title feat: enable torch_npu graph mode for Qwen-3 dense with single and multi-card TP support. feat: enable torch_npu graph mode for Qwen-3 dense with single and multi-device TP support. Nov 6, 2025
@liutongxuan liutongxuan changed the title feat: enable torch_npu graph mode for Qwen-3 dense with single and multi-device TP support. feat: enable torch_npu graph mode for Qwen-3 dense with TP support. Nov 6, 2025
@yingxudeng yingxudeng force-pushed the feat/qwen3_npu_native_main branch from 4549991 to 1a5e2f0 on Nov 6, 2025 12:23

torch::Tensor active_tensor(ActivationParams& params) {
#if defined(USE_NPU)
return npu::active(params.input);

This should be params.output = npu::active(params.input), and we also need to choose the act_mode.
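
A minimal sketch of the suggested change; the act_mode field, the ActMode enum, and the per-mode npu entry points are hypothetical names, not the actual API:

torch::Tensor active_tensor(ActivationParams& params) {
#if defined(USE_NPU)
  // Assign to params.output rather than only returning a temporary,
  // and dispatch on the requested activation mode.
  switch (params.act_mode) {        // act_mode: hypothetical field
    case ActMode::kSilu:
      params.output = npu::active(params.input);  // assumed silu default
      break;
    case ActMode::kGelu:
      params.output = npu::gelu(params.input);    // hypothetical gelu entry point
      break;
  }
  return params.output;
#endif
}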


torch::Tensor fused_layernorm_tensor(FusedLayerNormParams& params) {
#if defined(USE_NPU)
return npu::fused_layernorm(params.input, params.weight, params.eps);

Same as active_tensor: assign to params.output here as well.
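
A parallel sketch, assuming FusedLayerNormParams also carries an output member (an assumption here):

torch::Tensor fused_layernorm_tensor(FusedLayerNormParams& params) {
#if defined(USE_NPU)
  // Store the result on the params struct as well as returning it.
  params.output = npu::fused_layernorm(params.input, params.weight, params.eps);
  return params.output;
#endif
}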

// for npu
torch::Tensor seq_lens;
int num_heads;
int num_kv_heads;

We can get num_heads and num_kv_heads from the shape of query and key.
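
For illustration, assuming query and key are laid out as [num_tokens, num_heads, head_dim] (the layout is an assumption about this codebase), the counts can be derived instead of stored:

// query: [num_tokens, num_heads,    head_dim]
// key:   [num_tokens, num_kv_heads, head_dim]
const int64_t num_heads = query.size(1);
const int64_t num_kv_heads = key.size(1);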

namespace xllm {
namespace layer {

#if defined(USE_NPU)

No need for the USE_NPU guard here.

hidden_size_ =
    context_dim * static_cast<int>(std::pow(spatial_merge_size, 2));
- ln_q_ = register_module("ln_q", layer::RmsNorm(context));
+ ln_q_ = register_module("ln_q", layer::NpuRmsNorm(context));

Why rename RmsNorm to NpuRmsNorm?
