[main] Remove torch.cat and replace it with list[0] (#2153)
### What this PR does / why we need it?
`torch_npu.npu_grouped_matmul` documentation:
https://www.hiascend.com/document/detail/zh/Pytorch/710/apiref/torchnpuCustomsapi/context/torch_npu-npu_grouped_matmul.md
According to the documentation, when `split_item` is 2 or 3,
`torch_npu.npu_grouped_matmul` returns a list containing a single element.
The `torch.cat` that follows the call is therefore unnecessary, and this PR
replaces it by indexing the returned list with `[0]`.
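
The idea can be sketched without an NPU. This is a minimal illustration, not the code from this PR: `fake_grouped_matmul` is a hypothetical stand-in that only mimics the "one-element list" return shape described in the documentation, and `unwrap_single` is a hypothetical helper that adds a length check the PR itself does not necessarily perform.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def unwrap_single(outputs: Sequence[T]) -> T:
    """Return the only element of a one-element sequence.

    Hypothetical helper: stands in for `torch.cat(outputs, dim=0)` when the
    operator is known to return exactly one result.
    """
    if len(outputs) != 1:
        raise ValueError(f"expected exactly one output, got {len(outputs)}")
    return outputs[0]


def fake_grouped_matmul(rows: List[List[float]]) -> List[List[List[float]]]:
    """Hypothetical stand-in: wraps its result in a one-element list."""
    return [rows]


# Indexing the single element avoids building a new concatenated result.
result = unwrap_single(fake_grouped_matmul([[1.0, 2.0], [3.0, 4.0]]))
```

Concatenating a single tensor generally produces a new tensor with the same values, so taking element `[0]` should yield the same data while skipping that extra step.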
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Covered by existing unit and e2e tests: `tests/ut/ops/test_fused_ops.py`,
`tests/e2e/singlecard/ops/test_fused_moe.py`
**Performance** (Qwen3 30B, 2k->20k), total token throughput in tok/s:
base: 667.76
remove cat: 680.82
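
For reference, the relative change between the two reported throughput figures can be worked out as follows. This is only arithmetic on the numbers above; the variable names are illustrative, and a single pair of measurements says nothing about run-to-run variance.

```python
# Throughput figures reported in this PR (tok/s).
base_tput = 667.76
remove_cat_tput = 680.82

# Relative improvement of the patched run over the baseline, in percent.
gain_pct = (remove_cat_tput - base_tput) / base_tput * 100
print(f"{gain_pct:.2f}%")  # prints "1.96%"
```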
- vLLM version: v0.10.0
- vLLM main: vllm-project/vllm@fa00c5d
Signed-off-by: huangxialu <[email protected]>