Conversation

yingxudeng (Collaborator) commented Nov 6, 2025

No description provided.

yingxudeng (Collaborator, Author) commented Nov 6, 2025

Summary of Pending Items for this PR

This PR is a work in progress. The following items need to be completed before it's ready for final review:

  • Compile the new libtorch_npu.so and update the related Docker images.
  • Refactor the WordEmbedding class:
    • Rename the legacy WordEmbedding implementation to free up the class name.
    • Ensure the new WordEmbedding implementation can share unified logic with MLU and GPU backends.
  • Apply the same refactoring as the WordEmbedding item above to the LMHead component.
  • Investigate and fix the widespread compilation failures in the test files.
  • Validate that the changes do not break existing model inference pipelines.
  • Review and update the default values for flags like ENABLE_NATIVE_NPU and USE_NPU_TORCH (modify or remove as needed).
  • Analyze and optimize #if preprocessor directives for potential consolidation (see the sketch after this list).
  • Address other minor pending items and technical debt.
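
A minimal sketch of the kind of #if consolidation meant above; the backend.h header and the backend namespace alias are hypothetical names, not existing code in this repo:

// Before: every call site repeats the backend guard.
#if defined(USE_NPU)
return npu::active(params.input);
#elif defined(USE_MLU)
return mlu::active(params.input);
#endif

// After: the guard lives once in a shared header (hypothetical backend.h),
// and call sites dispatch through a namespace alias.
#if defined(USE_NPU)
namespace backend = npu;
#elif defined(USE_MLU)
namespace backend = mlu;
#endif

// Call site, now guard-free:
return backend::active(params.input);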

@liutongxuan liutongxuan changed the title feat: enable torch_npu graph mode for Qwen-3 dense with single and multi-card TP support. feat: enable torch_npu graph mode for Qwen-3 dense with single and multi-device TP support. Nov 6, 2025
@liutongxuan liutongxuan changed the title feat: enable torch_npu graph mode for Qwen-3 dense with single and multi-device TP support. feat: enable torch_npu graph mode for Qwen-3 dense with TP support. Nov 6, 2025
@yingxudeng yingxudeng force-pushed the feat/qwen3_npu_native_main branch from 4549991 to 1a5e2f0 on Nov 6, 2025 12:23

torch::Tensor active_tensor(ActivationParams& params) {
#if defined(USE_NPU)
return npu::active(params.input);

This should be params.output = npu::active(params.input), and we also need to choose the act_mode.
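
A minimal sketch of the suggested change; the act_mode field, the ActMode enum, and the per-mode npu entry points are hypothetical names, not the actual API:

torch::Tensor active_tensor(ActivationParams& params) {
#if defined(USE_NPU)
  // Assign to params.output rather than only returning a temporary,
  // and dispatch on the requested activation mode.
  switch (params.act_mode) {        // act_mode: hypothetical field
    case ActMode::kSilu:
      params.output = npu::active(params.input);  // assumed silu default
      break;
    case ActMode::kGelu:
      params.output = npu::gelu(params.input);    // hypothetical gelu entry point
      break;
  }
  return params.output;
#endif
}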


torch::Tensor fused_layernorm_tensor(FusedLayerNormParams& params) {
#if defined(USE_NPU)
return npu::fused_layernorm(params.input, params.weight, params.eps);

Same as active_tensor: assign to params.output here as well.
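
A parallel sketch, assuming FusedLayerNormParams also carries an output member (an assumption here):

torch::Tensor fused_layernorm_tensor(FusedLayerNormParams& params) {
#if defined(USE_NPU)
  // Store the result on the params struct as well as returning it.
  params.output = npu::fused_layernorm(params.input, params.weight, params.eps);
  return params.output;
#endif
}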

// for npu
torch::Tensor seq_lens;
int num_heads;
int num_kv_heads;

We can get num_heads and num_kv_heads from the shape of query and key.
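
For illustration, assuming query and key are laid out as [num_tokens, num_heads, head_dim] (the layout is an assumption about this codebase), the counts can be derived instead of stored:

// query: [num_tokens, num_heads,    head_dim]
// key:   [num_tokens, num_kv_heads, head_dim]
const int64_t num_heads = query.size(1);
const int64_t num_kv_heads = key.size(1);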

namespace xllm {
namespace layer {

#if defined(USE_NPU)

No need for the USE_NPU guard here.

hidden_size_ =
    context_dim * static_cast<int>(std::pow(spatial_merge_size, 2));
- ln_q_ = register_module("ln_q", layer::RmsNorm(context));
+ ln_q_ = register_module("ln_q", layer::NpuRmsNorm(context));

Why rename RmsNorm to NpuRmsNorm?
