feat: Add FP8 dtype support (E4M3FN and E5M2) #15
Conversation
Adds support for F8_E4M3 and F8_E5M2 dtypes in SafeTensors format,
enabling loading of fp8-quantized models from HuggingFace.
Changes:
- Add {:f, 8, :e4m3fn} → "F8_E4M3" mapping
- Add {:f, 8, :e5m2} → "F8_E5M2" mapping
- Add {:f, 8} → "F8_E5M2" for backward compatibility
- Update dtype_to_type reverse mappings for fp8 formats
Enables loading models like Qwen3-4B-Instruct-2507-FP8, which use the
F8_E4M3 format for weights with fine-grained quantization.
- Test write/read for E4M3FN and E5M2 tensors
- Test type preservation in round-trip
- Test lazy loading with fp8 types
- Test byte size calculation
- Test dtype strings in SafeTensors header
- Add NX_PATH environment variable support for local development
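As a sketch of what the mapping clauses above might look like, assuming `type_to_dtype/1` as the forward counterpart to the `dtype_to_type` reverse mapping named in the commit (the module and exact function placement are hypothetical, not the library's actual code):

```elixir
defmodule Safetensors.Fp8MappingSketch do
  # Hypothetical module: illustrates only the 3-tuple fp8 mappings.
  def type_to_dtype({:f, 8, :e4m3fn}), do: "F8_E4M3"
  def type_to_dtype({:f, 8, :e5m2}), do: "F8_E5M2"
  # Plain {:f, 8} maps to E5M2 for backward compatibility.
  def type_to_dtype({:f, 8}), do: "F8_E5M2"

  # Reverse mappings used when parsing dtype strings from the header.
  def dtype_to_type("F8_E4M3"), do: {:f, 8, :e4m3fn}
  def dtype_to_type("F8_E5M2"), do: {:f, 8, :e5m2}
end
```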
Please remove the convo.txt :) My suggestion is to break this in two. The first one is to add FP8 support, which means E5M2. No need for additional tuples and steps. Then a separate PR adds handling for unknown types. For now, the user should pass a separate function that receives the type and the value and builds the tensor. Which types does Qwen use?
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Qwen3 is using F8_E4M3.
OK, I will work on breaking this down. For Bumblebee and Nx I will open the PRs as drafts to open up the discussion.
@nyo16 to get everyone on the same page: elixir-nx/nx#1657 (comment). I think this PR will be straightforward once we add e4m3fn to Nx, no need for custom functions :)
Update fp8 support to match Nx PR #1657 which uses 2-tuple types
({:f8_e4m3fn, 8}) instead of 3-tuple ({:f, 8, :e4m3fn}).
- Remove 3-tuple type mappings and overrides
- Simplify tensor_byte_size, tensor_to_iodata, and build_tensor
- Update tests to use :f8_e4m3fn shorthand and {:f, 8} for E5M2
Temporarily use the Nx main branch, which has fp8 support, until the next release.
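Under the 2-tuple scheme, the earlier sketch would collapse to something like the following (same caveat: hypothetical names, not the library's actual code):

```elixir
defmodule Safetensors.Fp8MappingSketchV2 do
  # 2-tuple Nx types from PR #1657 replace the earlier 3-tuples.
  def type_to_dtype({:f8_e4m3fn, 8}), do: "F8_E4M3"
  def type_to_dtype({:f, 8}), do: "F8_E5M2"

  def dtype_to_type("F8_E4M3"), do: {:f8_e4m3fn, 8}
  def dtype_to_type("F8_E5M2"), do: {:f, 8}

  # With uniform {kind, bits} tuples, fp8 needs no special case:
  # byte size is element count times bits divided by eight.
  def tensor_byte_size(tensor) do
    {_kind, bits} = Nx.type(tensor)
    Nx.size(tensor) * div(bits, 8)
  end
end
```

For the temporary pin, mix.exs could point the dependency at Nx main, e.g. `{:nx, github: "elixir-nx/nx", branch: "main"}`, until the next release ships.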
@josevalim I updated this repo to reflect the changes that we made in Nx.
josevalim left a comment:
Can you please rewrite the tests to be in the same style as the existing ones? Basically, add a new read and a new write test for each type and that's it. No need for pointless assertions or comments; just copy one of the existing formats and replace the type (and values). Thank you!
Rewrite fp8 tests as simple write/read tests for each type, matching the style of existing tests.
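Such a test might look roughly like the following fragment inside the existing test module; `Safetensors.dump/1` and `Safetensors.load!/1` are assumed entry points and may not match the library's actual API:

```elixir
# Sketch of the simple per-type write/read style requested above.
test "writes and reads f8_e4m3fn tensors" do
  tensor = Nx.tensor([1.0, 2.0, 3.0], type: :f8_e4m3fn)
  binary = Safetensors.dump(%{"x" => tensor})
  assert %{"x" => loaded} = Safetensors.load!(binary)
  assert Nx.type(loaded) == {:f8_e4m3fn, 8}
end
```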
@grzuy, any objection?
Just "Approved workflows to run" |
@josevalim Nope.
Only pending green checks, it seems.
@josevalim do you want me to update the CI/CD to have a minimum version of 1.16?
💚 💙 💜 💛 ❤️
Summary
Add FP8 (8-bit floating point) dtype support for E4M3FN and E5M2 formats. This enables reading and writing FP8 quantized model
weights from HuggingFace models like Qwen3-FP8.
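For illustration, loading such a checkpoint might look like this; `Safetensors.read!/1` and the weight key are assumptions for the sketch, not taken from the PR:

```elixir
# Hypothetical usage: read fp8 weights from a SafeTensors file.
tensors = Safetensors.read!("model.safetensors")
Nx.type(tensors["model.embed_tokens.weight"])
#=> {:f8_e4m3fn, 8}
```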
Changes
- Add dtype strings (F8_E4M3FN, F8_E5M2) and type mappings {:f, 8, :e4m3fn} and {:f, 8, :e5m2} in:
  - dtype_from_string/1 - Parse "F8_E4M3" from safetensor headers
  - tensor_byte_size/1 - Calculate byte size for FP8 tensors
  - tensor_to_iodata/1 - Serialize FP8 tensors
  - build_tensor/2 - Deserialize FP8 tensors

Test plan
Notes
This is the first PR in a series to enable native FP8 model inference: