Update Flashinfer to 0.2.14.post1 #23537
Signed-off-by: Siyuan Fu <[email protected]>
Signed-off-by: siyuanf <[email protected]>
Signed-off-by: Weiliang Liu <[email protected]>
Code Review
This pull request updates Flashinfer to version 0.2.14.post1, which addresses a performance issue in the allreduce fusion kernel and incorporates API changes. The changes also include enabling FlashInfer autotuning before CUDA graph capture for better performance. My review found a critical typo in a variable name (`max_captute_size` instead of `max_capture_size`) in `mxfp4.py` which would lead to a runtime error. I've provided suggestions to fix this.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Michael Goin <[email protected]>
Thanks for the work!
Could you also report `vllm bench` metric results so that we can see whether E2E throughput improves?
LGTM, thanks for putting everything together. Let's see the CI
Signed-off-by: Siyuan Fu <[email protected]> Signed-off-by: siyuanf <[email protected]> Signed-off-by: Weiliang Liu <[email protected]> Signed-off-by: Michael Goin <[email protected]> Co-authored-by: Siyuan Fu <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: tc-mb <[email protected]>
Signed-off-by: Siyuan Fu <[email protected]> Signed-off-by: siyuanf <[email protected]> Signed-off-by: Weiliang Liu <[email protected]> Signed-off-by: Michael Goin <[email protected]> Co-authored-by: Siyuan Fu <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Siyuan Fu <[email protected]> Signed-off-by: siyuanf <[email protected]> Signed-off-by: Weiliang Liu <[email protected]> Signed-off-by: Michael Goin <[email protected]> Co-authored-by: Siyuan Fu <[email protected]> Co-authored-by: Michael Goin <[email protected]> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Xiao Yu <[email protected]>
Purpose
Update flashinfer to
Test Plan
lm_eval on llama3
gpt-oss/eval test on gpt-oss
python3 -m gpt_oss.evals --sampler chat_completions --model gpt-oss-120b --reasoning-effort low,medium --n-threads 512 --eval gpqa
Test Result
llama3 FP8 tp2
llama3 FP4 tp1
gpt-oss tp1:
[{'eval_name': 'gpqa', 'model_name': 'gpt-oss-120b-low_temp1.0_20250825_021150', 'metric': 0.6414141414141414}, {'eval_name': 'gpqa', 'model_name': 'gpt-oss-120b-medium_temp1.0_20250825_021150', 'metric': 0.711489898989899}]
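The raw result list above can be hard to scan. As a minimal sketch (not part of the PR), the snippet below turns it into one GPQA score per reasoning effort. The helper names `reasoning_effort` and `summarize` are hypothetical, and the parsing assumes the `model_name` pattern `<model>-<effort>_temp<T>_<timestamp>`, which is inferred only from the two names shown here.

```python
# Results copied verbatim from the gpt-oss tp1 run above.
results = [
    {'eval_name': 'gpqa',
     'model_name': 'gpt-oss-120b-low_temp1.0_20250825_021150',
     'metric': 0.6414141414141414},
    {'eval_name': 'gpqa',
     'model_name': 'gpt-oss-120b-medium_temp1.0_20250825_021150',
     'metric': 0.711489898989899},
]


def reasoning_effort(model_name: str) -> str:
    """Extract the effort label from a model_name string.

    Assumption: names look like '<model>-<effort>_temp<T>_<timestamp>'.
    """
    prefix = model_name.split("_temp", 1)[0]  # e.g. 'gpt-oss-120b-low'
    return prefix.rsplit("-", 1)[1]           # e.g. 'low'


def summarize(results):
    """Map each reasoning effort to its metric as a percentage (1 decimal)."""
    return {
        reasoning_effort(r["model_name"]): round(r["metric"] * 100, 1)
        for r in results
    }


summary = summarize(results)
for effort, pct in summary.items():
    print(f"gpqa {effort}: {pct}%")
```

If the eval harness changes its naming scheme, only `reasoning_effort` would need to be adjusted.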