feat: Add FP8 dtype support (E4M3FN and E5M2)#15

Merged
josevalim merged 11 commits into elixir-nx:main from nyo16:fp8-support
Jan 28, 2026
nyo16 commented Jan 8, 2026

Summary

Add FP8 (8-bit floating point) dtype support for the E4M3FN and E5M2 formats. This enables reading and writing FP8-quantized model weights from HuggingFace models such as Qwen3-FP8.
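
For readers unfamiliar with the two formats, here is a minimal Python sketch of how a single FP8 byte maps to a value. It is not code from this PR: the function names are hypothetical, and it assumes the commonly documented layouts (E4M3FN: 4 exponent bits, 3 mantissa bits, bias 7, no infinities, a single NaN bit pattern per sign; E5M2: 5 exponent bits, 2 mantissa bits, bias 15, IEEE-style infinities and NaNs).

```python
import math


def decode_fp8(byte: int, exp_bits: int, man_bits: int, bias: int, finite_only: bool) -> float:
    """Decode one FP8 byte (illustrative helper, not part of any library)."""
    sign = -1.0 if byte >> 7 else 1.0
    exp = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man = byte & ((1 << man_bits) - 1)
    max_exp = (1 << exp_bits) - 1
    max_man = (1 << man_bits) - 1

    if finite_only:
        # "FN" variant: only the all-ones pattern is NaN, there is no infinity.
        if exp == max_exp and man == max_man:
            return math.nan
    elif exp == max_exp:
        # IEEE-style: all-ones exponent encodes infinity or NaN.
        return sign * math.inf if man == 0 else math.nan

    if exp == 0:
        # Subnormal numbers (and zero): no implicit leading 1.
        return sign * man * 2.0 ** (1 - bias - man_bits)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)


def decode_e4m3fn(byte: int) -> float:
    return decode_fp8(byte, exp_bits=4, man_bits=3, bias=7, finite_only=True)


def decode_e5m2(byte: int) -> float:
    return decode_fp8(byte, exp_bits=5, man_bits=2, bias=15, finite_only=False)
```

Under these assumptions, E4M3FN trades range for precision (largest finite value 448), while E5M2 keeps a wider range (largest finite value 57344) with one less mantissa bit.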

Changes

  • Add FP8 dtype constants (F8_E4M3FN, F8_E5M2) and type mapping
  • Handle 3-tuple FP8 types {:f, 8, :e4m3fn} and {:f, 8, :e5m2} in:
    • dtype_from_string/1 - Parse "F8_E4M3" from safetensor headers
    • tensor_byte_size/1 - Calculate byte size for FP8 tensors
    • tensor_to_iodata/1 - Serialize FP8 tensors
    • build_tensor/2 - Deserialize FP8 tensors
  • Support reading FP8 model files (e.g., Qwen/Qwen3-0.6B-FP8)
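
The byte-size calculation mentioned in the list above is simple for FP8, since every element occupies exactly one byte. The following Python sketch illustrates the idea only; it is not the Elixir code in this PR. The `DTYPE_BITS` table and both function names are assumptions made for the example, while the dtype strings `F8_E4M3` and `F8_E5M2` and the `dtype`/`shape`/`data_offsets` header fields are those used by the safetensors format.

```python
from math import prod

# Bits per element for a few safetensors dtype strings (illustrative subset).
DTYPE_BITS = {"F8_E4M3": 8, "F8_E5M2": 8, "F16": 16, "BF16": 16, "F32": 32}


def byte_size(dtype: str, shape: list) -> int:
    """Number of bytes a tensor of this dtype and shape occupies."""
    return prod(shape) * DTYPE_BITS[dtype] // 8


def entry_is_consistent(entry: dict) -> bool:
    """Check that a header entry's data_offsets span matches dtype and shape."""
    begin, end = entry["data_offsets"]
    return end - begin == byte_size(entry["dtype"], entry["shape"])


# A hypothetical header entry for a 2x3 FP8 weight: 6 elements, 6 bytes.
example = {"dtype": "F8_E4M3", "shape": [2, 3], "data_offsets": [0, 6]}
print(byte_size(example["dtype"], example["shape"]), entry_is_consistent(example))
```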

Test plan

  • Unit tests for FP8 type encoding/decoding
  • Integration test reading real FP8 model files
  • Verified with Qwen3-0.6B-FP8 model inference

Notes

This is the first PR in a series to enable native FP8 model inference:

  1. safetensors (this PR) - FP8 file I/O
  2. nx/exla - FP8 type system support
  3. bumblebee - FP8 model loading and inference

nyo16 added 5 commits January 6, 2026 00:13
Adds support for F8_E4M3 and F8_E5M2 dtypes in SafeTensors format,
enabling loading of fp8-quantized models from HuggingFace.

Changes:
- Add {:f, 8, :e4m3fn} → "F8_E4M3" mapping
- Add {:f, 8, :e5m2} → "F8_E5M2" mapping
- Add {:f, 8} → "F8_E5M2" for backward compatibility
- Update dtype_to_type reverse mappings for fp8 formats

Enables loading models like Qwen3-4B-Instruct-2507-FP8 which uses
F8_E4M3 format for weights with fine-grained quantization.

- Test write/read for E4M3FN and E5M2 tensors
- Test type preservation in round-trip
- Test lazy loading with fp8 types
- Test byte size calculation
- Test dtype strings in SafeTensors header
- Add NX_PATH environment variable support for local development
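
As a rough illustration of the write/read round trip these tests exercise, here is a byte-level Python sketch of the safetensors container: an 8-byte little-endian header length, a JSON header, then the raw tensor bytes. The `dump` and `load` helpers are hypothetical and work on raw bytes rather than tensors; real writers may also pad the header for alignment, which this sketch omits.

```python
import json
import struct


def dump(tensors: dict) -> bytes:
    """Serialize name -> (dtype string, shape list, raw bytes) into one blob."""
    header, chunks, offset = {}, [], 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [offset, offset + len(raw)],
        }
        chunks.append(raw)
        offset += len(raw)
    head = json.dumps(header, separators=(",", ":")).encode("utf-8")
    # 8-byte little-endian header length, then the header, then the data.
    return struct.pack("<Q", len(head)) + head + b"".join(chunks)


def load(blob: bytes) -> dict:
    """Inverse of dump: recover name -> (dtype string, shape list, raw bytes)."""
    (head_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + head_len])
    data = blob[8 + head_len:]
    out = {}
    for name, entry in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        begin, end = entry["data_offsets"]
        out[name] = (entry["dtype"], entry["shape"], data[begin:end])
    return out


tensors = {
    "a": ("F8_E4M3", [2], bytes([0x38, 0xB8])),
    "b": ("F8_E5M2", [1], bytes([0x3C])),
}
assert load(dump(tensors)) == tensors
```

The dtype string in the header is the only thing that distinguishes the two FP8 formats on disk; the payload is one byte per element either way.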
@josevalim

Please remove the convo.txt :)

My suggestion is to break this into two. The first PR adds FP8 support, which means E5M2. No need for additional tuples and steps.

Then a separate PR adds handling for unknown types. For now, the user should pass a separate function that receives the type and the value and builds the tensors.

Which types does Qwen use?

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
nyo16 commented Jan 8, 2026

Qwen3 is using: F8_E4M3

nyo16 commented Jan 8, 2026

OK, I will work on breaking this down. For Bumblebee and Nx, I will open the PRs as drafts to open up the discussion further.

josevalim commented Jan 8, 2026

@nyo16 to get everyone on the same page: elixir-nx/nx#1657 (comment)

I think this PR will be straightforward once we add e4m3fn to Nx, no need for custom functions :)

nyo16 added 3 commits January 25, 2026 14:46
Update fp8 support to match Nx PR #1657 which uses 2-tuple types
({:f8_e4m3fn, 8}) instead of 3-tuple ({:f, 8, :e4m3fn}).

- Remove 3-tuple type mappings and overrides
- Simplify tensor_byte_size, tensor_to_iodata, and build_tensor
- Update tests to use :f8_e4m3fn shorthand and {:f, 8} for E5M2

Temporarily use the Nx main branch, which has fp8 support, until the next release.
nyo16 commented Jan 25, 2026

@josevalim I updated this repo to reflect the changes that we made in Nx.

josevalim left a comment

Can you please rewrite the tests to be in the same style as the existing ones? Basically add a new read and a new write test for each type and that's it. No need for pointless assertions or comments, just copy one of the formats and replace the type (and values). Thank you!

nyo16 added 2 commits January 25, 2026 17:16
Rewrite fp8 tests as simple write/read tests for each type,
matching the style of existing tests.
@josevalim

@grzuy, any objection?

grzuy commented Jan 27, 2026

Just "Approved workflows to run"

grzuy commented Jan 27, 2026

> @grzuy, any objection?

@josevalim Nope.
Thank you both 🙂

grzuy commented Jan 27, 2026

Only pending green checks it seems.

nyo16 commented Jan 27, 2026

@josevalim do you want me to update the CI/CD to have a minimum version of 1.16?

@josevalim josevalim merged commit b3f280b into elixir-nx:main Jan 28, 2026
1 of 2 checks passed
@josevalim

💚 💙 💜 💛 ❤️

3 participants