feat: enable torch_npu graph mode for Qwen-3 dense with TP support. #325
Conversation
Summary of Pending Items for this PR

This PR is a work in progress; several items need to be completed before it is ready for final review.
Force-pushed from 4549991 to 1a5e2f0.
torch::Tensor active_tensor(ActivationParams& params) {
#if defined(USE_NPU)
  return npu::active(params.input);
This should be `params.output = npu::active(params.input)`, and we need to choose the act_mode.
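A minimal sketch of the suggested change, assuming `ActivationParams` gains an `output` member and an `act_mode` selector (the member names and the `ActMode` enum are illustrative assumptions, not the PR's actual API):

```cpp
#include <torch/torch.h>

// Illustrative sketch only: ActivationParams members and the ActMode enum
// are assumptions about where the PR could go, not its actual API.
enum class ActMode { kSilu, kGelu };

struct ActivationParams {
  torch::Tensor input;
  torch::Tensor output;               // result written here, per the review comment
  ActMode act_mode = ActMode::kSilu;  // selects which activation kernel to run
};

torch::Tensor active_tensor(ActivationParams& params) {
#if defined(USE_NPU)
  // Dispatch on the requested activation instead of hard-coding one kernel.
  switch (params.act_mode) {
    case ActMode::kSilu:
      params.output = npu::active(params.input);  // existing NPU kernel from the diff
      break;
    default:
      TORCH_CHECK(false, "act_mode not yet supported on NPU");
  }
  return params.output;
#endif
}
```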
torch::Tensor fused_layernorm_tensor(FusedLayerNormParams& params) {
#if defined(USE_NPU)
  return npu::fused_layernorm(params.input, params.weight, params.eps);
Same issue as active_tensor above.
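Applying the same pattern here, again assuming the params struct gains an `output` member (an assumption, mirroring the sketch above):

```cpp
// Same shape of fix as active_tensor; the `output` member is an assumption.
torch::Tensor fused_layernorm_tensor(FusedLayerNormParams& params) {
#if defined(USE_NPU)
  params.output =
      npu::fused_layernorm(params.input, params.weight, params.eps);
  return params.output;
#endif
}
```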
// for npu
torch::Tensor seq_lens;
int num_heads;
int num_kv_heads;
We can get num_heads and num_kv_heads from the shapes of query and key.
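For example, assuming the usual `[num_tokens, num_heads, head_dim]` layout for the attention inputs (the layout is an assumption; adjust the dimension index if the real tensors differ):

```cpp
// Derive head counts from the tensor shapes instead of storing them in the
// params struct. Assumes query/key are [num_tokens, num_heads, head_dim].
const int64_t num_heads    = query.size(1);
const int64_t num_kv_heads = key.size(1);
```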
namespace xllm {
namespace layer {

#if defined(USE_NPU)
No need for the USE_NPU guard here.
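That is, the declarations could stay unguarded, roughly like this (a sketch, assuming this file is only compiled for NPU builds, which is what the comment implies):

```cpp
namespace xllm {
namespace layer {

// NPU-specific declarations can live here without an #if defined(USE_NPU)
// guard, assuming this file is only built for NPU targets.

}  // namespace layer
}  // namespace xllm
```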
hidden_size_ =
    context_dim * static_cast<int>(std::pow(spatial_merge_size, 2));
- ln_q_ = register_module("ln_q", layer::RmsNorm(context));
+ ln_q_ = register_module("ln_q", layer::NpuRmsNorm(context));
Why rename RmsNorm to NpuRmsNorm?