
@yiakwy-xpu-ml-framework-team yiakwy-xpu-ml-framework-team commented Aug 10, 2025

Add bf16 SFT to mxfp4 conversion

Currently the model (gpt-oss-120b) can run either in the bf16 data type (group-wise fp8 is also possible) on H800/H100, or in the mxfp4 data type on Blackwell.

After SFT-ing the GPT-OSS model and injecting the new identity, we need to convert the model back to MXFP4 to reduce the model size when loading weights from HBM with 4-bit I/O (on H800, the model is converted back to bf16 at runtime).
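For reference, here is a minimal NumPy sketch of the MXFP4 scheme being targeted: 32-element blocks sharing a power-of-two (E8M0) scale, with each element rounded to the nearest E2M1 value. This is an illustration of the format, not the actual conversion script; the scale choice (`floor(log2(amax)) - 2`, so the block maximum lands near the E2M1 maximum of 6.0) and round-to-nearest-value policy are assumptions, and real code would also pack two 4-bit codes per byte and accept bf16 inputs.

```python
import numpy as np

# The 16 representable E2M1 values, indexed by their 4-bit code.
FP4_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0,
                       -0.0, -0.5, -1.0, -1.5, -2.0, -3.0, -4.0, -6.0])

def bf16_to_mxfp4(w, block=32):
    """Quantize a 1-D float array (length a multiple of `block`) to MXFP4:
    per-block power-of-two scale exponents plus 4-bit E2M1 codes."""
    w = w.reshape(-1, block)
    amax = np.abs(w).max(axis=1, keepdims=True)
    # E8M0-style scale: power of two chosen so the block max maps near 6.0,
    # the largest E2M1 magnitude (2**2 <= 6.0 < 2**3).
    exp = np.floor(np.log2(np.maximum(amax, 1e-38))) - 2
    scaled = w / (2.0 ** exp)
    # Round each scaled element to the nearest representable E2M1 value.
    idx = np.abs(scaled[..., None] - FP4_VALUES).argmin(axis=-1)
    return idx.astype(np.uint8), exp.astype(np.int8)

def mxfp4_to_bf16(idx, exp, shape):
    """Dequantize: look up the E2M1 value and reapply the block scale."""
    return (FP4_VALUES[idx] * (2.0 ** exp)).reshape(shape)
```

Values that are exactly representable after scaling survive a round trip unchanged, which is what makes the par-to-par weight comparison below meaningful.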

Verification of correctness

We checked the model end-to-end and compared the fp4 weight values pair by pair:

bf16_to_mxfp4
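As a sketch of what such a pair-wise check could look like (the dict-of-arrays interface and tolerance are illustrative, not the actual verification code), comparing every tensor by name, shape, and value catches exactly the kind of mismatch reported further down in this thread:

```python
import numpy as np

def compare_state_dicts(ref, out, atol=0.0):
    """Pair-wise comparison of two weight dicts {name: np.ndarray}.
    Returns a list of (name, reason) mismatches; empty means identical."""
    mismatches = []
    for name in sorted(set(ref) | set(out)):
        if name not in ref or name not in out:
            mismatches.append((name, "missing"))
        elif ref[name].shape != out[name].shape:
            mismatches.append((name, f"shape {ref[name].shape} vs {out[name].shape}"))
        elif not np.allclose(ref[name], out[name], atol=atol):
            mismatches.append((name, "values differ"))
    return mismatches
```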

@dkundel-openai
Collaborator

Hey @yiakwy-xpu-ml-framework-team, thanks for your contribution. We'd recommend putting this script alongside the tool you are using for SFT. I'll close this PR, but thank you for your contribution.

@yiakwy-xpu-ml-framework-team
Author

Hi @dkundel-openai, thanks for the feedback. I didn't use any other tools. Maybe I should add it to Hugging Face?

I just think it is natural for OpenAI itself to support converting bf16 back to mxfp4. Feel free to reopen this PR if anyone wants it back.

@liuqianchao

liuqianchao commented Aug 14, 2025

@yiakwy-xpu-ml-framework-team After using the script to get the mxfp4 weights, I got an error message when using the verification code. Do you have any idea?

RuntimeError: Error(s) in loading state_dict for Linear: size mismatch for weight: copying a param with shape torch.Size([201088, 2880]) from checkpoint, the shape in current model is torch.Size([50267, 1024]).

@yiakwy-xpu-ml-framework-team
Author

yiakwy-xpu-ml-framework-team commented Aug 18, 2025

@liuqianchao we don't see this issue. The script should work with gpt-oss bf16 120b.

Have you verified the original model and produced the inverse weights map file for the mxfp4 weight types? Could you verify the tensor names? Pay attention to the names: Hugging Face has converted the original gpt-oss checkpoint to new names, where the "blocks" and "scales" of the MoE FFN (up/down projections) are renamed.
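Since the tensor names are the usual failure point, a quick sanity check over the checkpoint's tensor list can catch missing or half-renamed MXFP4 pairs. The `_blocks`/`_scales` suffix convention below is an assumption based on the comment above; verify it against the actual names in your checkpoint:

```python
def split_mxfp4_pairs(names):
    """Group tensor names ending in '_blocks'/'_scales' into pairs keyed by
    the common prefix, and report prefixes missing one of the two parts.
    The suffix convention is assumed, not confirmed against the HF repo."""
    pairs = {}
    for n in names:
        for suffix in ("_blocks", "_scales"):
            if n.endswith(suffix):
                pairs.setdefault(n[: -len(suffix)], set()).add(suffix[1:])
    # A complete MXFP4 tensor has both its codes ("blocks") and its scales.
    incomplete = [k for k, v in pairs.items() if v != {"blocks", "scales"}]
    return pairs, incomplete
```

Running this over the keys of the converted checkpoint (for example, the names reported by `safetensors`) should return an empty `incomplete` list if every quantized projection has both halves.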
