feat: Add FP8 dtype support (E4M3FN and E5M2) #15
Conversation
Adds support for F8_E4M3 and F8_E5M2 dtypes in SafeTensors format,
enabling loading of fp8-quantized models from HuggingFace.
Changes:
- Add {:f, 8, :e4m3fn} → "F8_E4M3" mapping
- Add {:f, 8, :e5m2} → "F8_E5M2" mapping
- Add {:f, 8} → "F8_E5M2" for backward compatibility
- Update dtype_to_type reverse mappings for fp8 formats
Enables loading models like Qwen3-4B-Instruct-2507-FP8, which use the
F8_E4M3 format for weights with fine-grained quantization.
- Test write/read for E4M3FN and E5M2 tensors
- Test type preservation in round-trip
- Test lazy loading with fp8 types
- Test byte size calculation
- Test dtype strings in SafeTensors header
- Add NX_PATH environment variable support for local development
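As a sketch of what the mapping clauses above might look like, assuming `type_to_dtype/1` as the forward counterpart to the `dtype_to_type` reverse mapping named in the commit (the module and exact function placement are hypothetical, not the library's actual code):

```elixir
defmodule Safetensors.Fp8MappingSketch do
  # Hypothetical module: illustrates only the 3-tuple fp8 mappings.
  def type_to_dtype({:f, 8, :e4m3fn}), do: "F8_E4M3"
  def type_to_dtype({:f, 8, :e5m2}), do: "F8_E5M2"
  # Plain {:f, 8} maps to E5M2 for backward compatibility.
  def type_to_dtype({:f, 8}), do: "F8_E5M2"

  # Reverse mappings used when parsing dtype strings from the header.
  def dtype_to_type("F8_E4M3"), do: {:f, 8, :e4m3fn}
  def dtype_to_type("F8_E5M2"), do: {:f, 8, :e5m2}
end
```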
Please remove the convo.txt :) My suggestion is to break this in two. The first one is to add FP8 support, which means E5M2. No need for additional tuples and steps. Then a separate PR adds handling for unknown types. For now, the user should pass a separate function that receives the type and the value and builds the tensor. Which types does Qwen use?
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Qwen3 is using F8_E4M3.
OK, I will work on breaking this down. For Bumblebee and Nx I will open the PRs as drafts to open up the discussion.
@nyo16 to get everyone on the same page: elixir-nx/nx#1657 (comment). I think this PR will be straightforward once we add e4m3fn to Nx, no need for custom functions :)
Update fp8 support to match Nx PR #1657 which uses 2-tuple types
({:f8_e4m3fn, 8}) instead of 3-tuple ({:f, 8, :e4m3fn}).
- Remove 3-tuple type mappings and overrides
- Simplify tensor_byte_size, tensor_to_iodata, and build_tensor
- Update tests to use :f8_e4m3fn shorthand and {:f, 8} for E5M2
Temporarily use the Nx main branch, which has fp8 support, until the next release.
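Under the 2-tuple scheme, the earlier sketch would collapse to something like the following (same caveat: hypothetical names, not the library's actual code):

```elixir
defmodule Safetensors.Fp8MappingSketchV2 do
  # 2-tuple Nx types from PR #1657 replace the earlier 3-tuples.
  def type_to_dtype({:f8_e4m3fn, 8}), do: "F8_E4M3"
  def type_to_dtype({:f, 8}), do: "F8_E5M2"

  def dtype_to_type("F8_E4M3"), do: {:f8_e4m3fn, 8}
  def dtype_to_type("F8_E5M2"), do: {:f, 8}

  # With uniform {kind, bits} tuples, fp8 needs no special case:
  # byte size is element count times bits divided by eight.
  def tensor_byte_size(tensor) do
    {_kind, bits} = Nx.type(tensor)
    Nx.size(tensor) * div(bits, 8)
  end
end
```

For the temporary pin, mix.exs could point the dependency at Nx main, e.g. `{:nx, github: "elixir-nx/nx", branch: "main"}`, until the next release ships.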
@josevalim I updated this repo to reflect the changes that we made in Nx.
josevalim left a comment:
Can you please rewrite the tests to be in the same style as the existing ones? Basically, add a new read and a new write test for each type and that's it. No need for pointless assertions or comments; just copy one of the existing formats and replace the type (and values). Thank you!
Rewrite fp8 tests as simple write/read tests for each type, matching the style of existing tests.
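Such a test might look roughly like the following fragment inside the existing test module; `Safetensors.dump/1` and `Safetensors.load!/1` are assumed entry points and may not match the library's actual API:

```elixir
# Sketch of the simple per-type write/read style requested above.
test "writes and reads f8_e4m3fn tensors" do
  tensor = Nx.tensor([1.0, 2.0, 3.0], type: :f8_e4m3fn)
  binary = Safetensors.dump(%{"x" => tensor})
  assert %{"x" => loaded} = Safetensors.load!(binary)
  assert Nx.type(loaded) == {:f8_e4m3fn, 8}
end
```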
@grzuy, any objection?
Just "Approved workflows to run" |
@josevalim Nope.
Only pending green checks, it seems.
@josevalim do you want me to update the CI/CD to have a minimum version of 1.16?
💚 💙 💜 💛 ❤️
Summary
Add FP8 (8-bit floating point) dtype support for E4M3FN and E5M2 formats. This enables reading and writing FP8 quantized model
weights from HuggingFace models like Qwen3-FP8.
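For illustration, loading such a checkpoint might look like this; `Safetensors.read!/1` and the weight key are assumptions for the sketch, not taken from the PR:

```elixir
# Hypothetical usage: read fp8 weights from a SafeTensors file.
tensors = Safetensors.read!("model.safetensors")
Nx.type(tensors["model.embed_tokens.weight"])
#=> {:f8_e4m3fn, 8}
```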
Changes
- Add dtype strings (F8_E4M3FN, F8_E5M2) and type mappings {:f, 8, :e4m3fn} and {:f, 8, :e5m2} in:
  - dtype_from_string/1 - Parse "F8_E4M3" from safetensor headers
  - tensor_byte_size/1 - Calculate byte size for FP8 tensors
  - tensor_to_iodata/1 - Serialize FP8 tensors
  - build_tensor/2 - Deserialize FP8 tensors

Test plan
Notes
This is the first PR in a series to enable native FP8 model inference: