Update quantization overview and contributor guide doc #2723
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2723
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 538e24f with merge base 6cfa477.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Content looks great! I think we just need to make sure all the tables and code blocks render properly, e.g.
- https://docs-preview.pytorch.org/pytorch/ao/2723/quantization_overview.html
- https://docs-preview.pytorch.org/pytorch/ao/2723/contributor_guide.html
Also, one thing I think is missing is a section or paragraph on the status of AffineQuantizedTensor. It still powers most of our existing quantization configs, but I think we want to move away from using it for new configs, is that right? Maybe we should clarify this distinction, otherwise users may be confused about which tensor subclass to use.
:toctree: generated/
:nosignatures:

TorchAOBaseTensor
not related to this PR but should this be in torchao.core instead? That's where we have AOBaseConfig today
yeah probably, we can move this I think
Layout/TensorImpl
~~~~~~~~~~~~~~~~~

KernelPreference
Need to document this in one of the api_refs as well
Int4Tensor             scaled int4    plain (pack 2 adjacent int4 to a single int8 value)     int4 weight only quantization
Int4PreshuffledTensor  scaled int4    preshuffled (special format to optimize for loading)    float8 act + int4 weight dynamic quantization
                                                                                              int4 weight only quantization
====================== ============== ====================================================== ===============================================
this table isn't rendering
OK metamate lied to me
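(Side note for readers: a minimal sketch of how the int4 weight tensors in the table above are typically produced through the config API, assuming `Int4WeightOnlyConfig` as exported from `torchao.quantization`; the exact arguments are illustrative.)

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, Int4WeightOnlyConfig

# Toy model; int4 weight-only quantization generally expects bf16 weights on GPU.
model = nn.Sequential(nn.Linear(1024, 1024)).to(torch.bfloat16).to("cuda")

# After this call, each nn.Linear weight is replaced by an int4 weight tensor
# subclass (e.g. a plain or preshuffled packing, as in the table above).
quantize_(model, Int4WeightOnlyConfig(group_size=128))

x = torch.randn(8, 1024, dtype=torch.bfloat16, device="cuda")
y = model(x)
```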
.. note::
   We also don't use "dynamic activation" in the name: since these names describe the weight tensor object, including information about the activation in the tensor subclass name would be confusing. We do, however, implement both weight only and dynamic activation quantization in the same linear function implementation, without relying on additional abstractions; this keeps the relevant quantization operations close.
What does this mean for to_linear_activation_quantized? New features should not use that anymore, right?
yeah, we don't use this anymore, to reduce the number of abstractions, it only requires a few lines of additional code in each tensor subclass
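(For anyone following along, a purely illustrative sketch of the idea, with hypothetical names like `act_quant_kwargs` and `_dynamic_act_quant`; the point is that the activation quantization step lives inside the weight subclass's linear implementation rather than in a separate `to_linear_activation_quantized` wrapper.)

```python
import torch
import torch.nn.functional as F

def _dynamic_act_quant(x: torch.Tensor) -> torch.Tensor:
    # Hypothetical per-tensor dynamic float8 "fake quant" of the activation,
    # just to show where the step happens.
    scale = x.abs().amax() / torch.finfo(torch.float8_e4m3fn).max
    return (x / scale).to(torch.float8_e4m3fn).to(x.dtype) * scale

def quantized_linear(x, weight_tensor, bias=None):
    # weight_tensor stands in for a torchao weight tensor subclass. If it carries
    # activation quantization settings, quantize the activation here (dynamic
    # activation + weight quantization); otherwise this is weight-only quantization.
    if getattr(weight_tensor, "act_quant_kwargs", None) is not None:
        x = _dynamic_act_quant(x)
    w = weight_tensor.dequantize()  # real kernels would dispatch to a fused op instead
    return F.linear(x, w, bias)
```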
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To connect everything together, here is a more detailed walkthrough of float8 dynamic activation and float8 weight quantization in torchao (DEFAULT kernel preference, on H100, when the fbgemm_gpu_genai library is installed):

Quantization Flow: quantize_(model, Float8DynamicActivationFloat8WeightConfig())
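As a concrete sketch of that flow, assuming an H100-class GPU and the config/API names quoted above:

```python
import torch
import torch.nn as nn
from torchao.quantization import quantize_, Float8DynamicActivationFloat8WeightConfig

# Toy model in bf16 on GPU; float8 kernels need recent hardware (e.g. H100).
model = nn.Sequential(nn.Linear(2048, 2048)).to(torch.bfloat16).to("cuda")

# Swaps each nn.Linear weight for a float8 weight tensor subclass; activations
# are quantized to float8 dynamically at runtime inside the linear implementation.
quantize_(model, Float8DynamicActivationFloat8WeightConfig())

x = torch.randn(16, 2048, dtype=torch.bfloat16, device="cuda")
y = model(x)
```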
not super related but I notice we have these two configs:
Float8DynamicActivationFloat8WeightConfig
Float8ActivationInt4WeightConfig
Should we drop "Dynamic" from the first one to be more consistent?
oh, we should add Dynamic to the second one I feel
there is static activation right now actually:
ao/torchao/quantization/quant_api.py (line 1865 in 510e1b4):
class Float8StaticActivationFloat8WeightConfig(AOBaseConfig):
yeah just saw this, either way looks good to me as long as we are consistent
* ``torch.uint1`` to ``torch.uint7`` available in pytorch 2.3 and later
* ``torch.int1`` to ``torch.int7`` available in pytorch 2.6 and later
* ``torch.float4_e2m1fn_x2``, ``torch.float8_e4m3fn``, ``torch.float8_e4m3fnuz``, ``torch.float8_e5m2``, ``torch.float8_e5m2fnuz``, ``torch.float8_e8m0fnu``
should we also mention the MX dtypes here and mark them as prototype? (and in the ascii art)
MX is using torch.float8_e8m0fnu for the scale and then torch.float4_e2m1fn_x2 and torch.float8_e4m3fn (or some other fp8 dtypes) for the data, I think. cc @drisspg to confirm
ah I see, maybe just mention the high-level dtypes in the ascii art then (e.g. mxfp4, mxfp6, mxfp8, nvfp4)?
these will be defined in tensor subclasses directly, we are listing the pytorch dtypes here, I can add a note about this though
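A tiny illustration of how the building-block dtypes listed above relate to the MX formats discussed in this thread (purely illustrative; the actual MX/NVFP4 support lives in prototype tensor subclasses, and this assumes PyTorch 2.7+ for `torch.float8_e8m0fnu`):

```python
import torch

# An MX-style block pairs low-precision data elements with a shared
# power-of-two (e8m0) scale per block of 32 elements.
data = torch.randn(32, dtype=torch.bfloat16).to(torch.float8_e4m3fn)  # quantized data elements
scale = torch.empty(1, dtype=torch.float8_e8m0fnu)                    # per-block scale storage
print(data.dtype, scale.dtype)
```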
Summary:
We have recently updated our design for structuring tensor subclasses in torchao to remove unnecessary abstractions, reduce indirection, and provide a structure that aligns better with people's intuitive understanding of the different quantization use cases. Examples using the new design: #2463, #2687

Test Plan:
check generated doc