
Bruce-x-1997 commented Sep 18, 2025

What does this PR do?

Adds support for w4afp8 quantization in v3.1 (ue8m0).
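
For readers unfamiliar with the ue8m0 label: it usually denotes an 8-bit, exponent-only scale format (no sign bit, no mantissa bits), so every representable scale is a power of two. The sketch below illustrates that idea only and is not taken from this patch. It assumes an exponent bias of 127 (as in the OCP Microscaling E8M0 scale type) and assumes scales are rounded up to the next power of two; the helper names `ue8m0_encode` and `ue8m0_decode` are hypothetical.

```python
import math

E8M0_BIAS = 127  # assumed bias, matching the OCP Microscaling E8M0 scale type


def ue8m0_encode(scale: float) -> int:
    """Round a positive scale UP to a power of two and return its 8-bit exponent code."""
    if not math.isfinite(scale) or scale <= 0:
        raise ValueError("scale must be a positive finite number")
    # frexp gives scale = m * 2**e with 0.5 <= m < 1, so ceil(log2(scale)) is
    # e - 1 when scale is already an exact power of two (m == 0.5), else e.
    m, e = math.frexp(scale)
    exponent = e - 1 if m == 0.5 else e
    # Clamp to 0..254; code 255 is left unused here (E8M0 reserves it for NaN).
    return max(0, min(254, exponent + E8M0_BIAS))


def ue8m0_decode(code: int) -> float:
    """Turn an 8-bit exponent code back into its power-of-two scale."""
    return 2.0 ** (code - E8M0_BIAS)


if __name__ == "__main__":
    for s in (0.3, 1.0, 3.0):
        code = ue8m0_encode(s)
        print(s, code, ue8m0_decode(code))
```

Rounding up rather than to nearest is a design choice in this sketch: it keeps the decoded scale at least as large as the original, so values divided by it cannot exceed the range they were scaled for.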

Usage

Set gemm_impl to fp8 and use config_v3.1.json.

Testing

After applying this patch, our model (3.1 using w4afp8) reaches 50% on aime25 and 60% on aime24.

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes/No
  • Did you write any new necessary tests?: Yes/No
  • Did you add or update any necessary documentation?: Yes/No
  • Did you update Changelog?: Yes/No

Additional Information

Bruce-x-1997 requested a review from a team as a code owner September 18, 2025 12:25

copy-pr-bot bot commented Sep 18, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Bruce-x-1997 (Author)

@cjluo-nv, could you please review this? Thanks.
