Fix moe_align1 kernel performance issue in prefill stage. #718

Merged
shihaobai merged 2 commits into main from wzj on Feb 10, 2025

@hiworldwzj
Collaborator

old version:
test_grouped_fused_moe_speed.py 200
token num: 200 cost time: 0.0011632442474365234 s
256
token num: 256 cost time: 0.0011243820198429688 s
8192
token num: 8192 cost time: 0.05202174186706543 s
new version:
test_grouped_fused_moe_speed.py 200
token num: 200 cost time: 0.0011744499206542969 s
256
token num: 256 cost time: 0.0010919570922851562 s
8192
token num: 8192 cost time: 0.003216266632080078 s

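At 8192 tokens the new version is more than 10x faster (about 16x by the timings above: 0.0520 s vs. 0.0032 s). The 200 and 256 token cases are essentially unchanged.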
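To make the comparison concrete, here is a small sketch that computes the per-size speedup from the timings reported above. The dictionary names and the `speedups` helper are hypothetical and only serve this illustration; they are not part of the PR or of the benchmark script.

```python
# Timings in seconds, copied from the benchmark output in this comment.
# `old_times`, `new_times`, and `speedups` are illustrative names, not PR code.
old_times = {
    200: 0.0011632442474365234,
    256: 0.0011243820198429688,
    8192: 0.05202174186706543,
}
new_times = {
    200: 0.0011744499206542969,
    256: 0.0010919570922851562,
    8192: 0.003216266632080078,
}


def speedups(old, new):
    """Return old/new time ratio per token count (values > 1 mean the new version is faster)."""
    return {n: old[n] / new[n] for n in old}


ratios = speedups(old_times, new_times)
for token_num, ratio in ratios.items():
    print(f"token num: {token_num:>5}  speedup: {ratio:.2f}x")
```

The ratios for 200 and 256 tokens come out close to 1, so the change mainly affects the large prefill case. These appear to be single-run measurements, so the small-size differences are likely within noise.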

@shihaobai merged commit 743ddc3 into main on Feb 10, 2025
1 check passed