Update quantization overview and contributor guide doc #2723

Open · wants to merge 1 commit into main

Conversation

jerryzh168 (Contributor)

Summary:
We have recently updated our design for structuring tensor subclasses in torchao to remove unnecessary abstractions, reduce indirection, and produce a structure that aligns better with people's intuitive understanding of different quantization use cases. Examples using the new design: #2463, #2687

Test Plan:
check generated doc
Reviewers:

Subscribers:

Tasks:

Tags:

@jerryzh168 requested a review from andrewor14 on August 8, 2025 21:27

pytorch-bot bot commented Aug 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2723

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 538e24f with merge base 6cfa477:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla bot added the CLA Signed label (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on Aug 8, 2025
@jerryzh168 added the topic: documentation label (use this tag if a PR adds or improves documentation) on Aug 8, 2025
@jerryzh168 force-pushed the update-quant-overview branch 2 times, most recently from 8d12652 to d007a58, on August 8, 2025 22:11
@andrewor14 (Contributor) left a comment

Content looks great! I think we just need to make sure all the tables and code blocks render properly.

Also, one thing I think is missing is a section or paragraph on the status of AffineQuantizedTensor. This still powers most of our existing quantization configs, but I think we want to move away from using it for new configs, is that right? Maybe we should clarify this distinction, otherwise users may be confused about which tensor subclass to use.

:toctree: generated/
:nosignatures:

TorchAOBaseTensor
Contributor

not related to this PR but should this be in torchao.core instead? That's where we have AOBaseConfig today

Contributor Author

yeah probably, we can move this I think
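
For readers landing here from the docs: a minimal sketch of building a tensor subclass on top of TorchAOBaseTensor under the new design. The hook names (tensor_data_names, tensor_attribute_names) and the import path reflect my reading of the current code; treat the whole class as illustrative rather than API documentation.

import torch
from torchao.utils import TorchAOBaseTensor

class MyQuantizedTensor(TorchAOBaseTensor):
    # assumption: the base class uses these lists to auto-generate
    # flatten/unflatten and similar tensor subclass boilerplate
    tensor_data_names = ["qdata", "scale"]
    tensor_attribute_names = ["block_size"]

    def __new__(cls, qdata, scale, block_size):
        return torch.Tensor._make_wrapper_subclass(cls, qdata.shape, dtype=scale.dtype)

    def __init__(self, qdata, scale, block_size):
        self.qdata = qdata
        self.scale = scale
        self.block_size = block_size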


Layout/TensorImpl
~~~~~~~~~~~~~~~~~
KernelPreference
Contributor

Need to document this in one of the api_refs as well

====================== ============== ====================================================== ===============================================
Int4Tensor             scaled int4    plain (pack 2 adjacent int4 to a single int8 value)    int4 weight only quantization
Int4PreshuffledTensor  scaled int4    preshuffled (special format to optimize for loading)   float8 act + int4 weight dynamic quantization;
                                                                                             int4 weight only quantization
====================== ============== ====================================================== ===============================================
Contributor

this table isn't rendering

Contributor Author

OK metamate lied to me
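
As an aside for readers, here is a runnable sketch of the "plain" packing the table above describes, i.e. two adjacent int4 values stored in a single int8. The helper name is hypothetical, not a torchao API:

import torch

def pack_int4_pairs(x: torch.Tensor) -> torch.Tensor:
    # x holds int4 values in [-8, 7] as int8; the last dim must be even
    assert x.shape[-1] % 2 == 0
    u = x.to(torch.uint8) & 0xF                 # keep the two's-complement nibble
    packed = u[..., ::2] | (u[..., 1::2] << 4)  # low nibble | high nibble
    return packed.view(torch.int8)              # reinterpret each byte as int8

vals = torch.randint(-8, 8, (4, 8), dtype=torch.int8)
assert pack_int4_pairs(vals).shape == (4, 4)    # half as many bytes as inputs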


.. note::
   We also don't include "dynamic activation" in the name: since we are talking about the weight tensor object, including information about the activation in the tensor subclass name would be confusing. We do, however, implement both weight only and dynamic activation quantization in the same linear function implementation, without relying on additional abstractions; this keeps the relevant quantization operations close together.
Contributor

What does this mean for to_linear_activation_quantized? New features should not use that anymore, right?

Contributor Author

yeah, we don't use this anymore; dropping it reduces the number of abstractions, and it only requires a few lines of additional code in each tensor subclass
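
To illustrate the point (all names here are made up for the sketch, not torchao APIs): one linear implementation can serve both modes by branching on state carried by the weight subclass, which is roughly the "few lines of additional code" mentioned above.

import torch
import torch.nn.functional as F

class FakeQuantizedWeight:
    def __init__(self, qdata, scale, act_quant=False):
        self.qdata = qdata          # quantized weight payload
        self.scale = scale          # per-row dequantization scale
        self.act_quant = act_quant  # False => weight only quantization

def quantized_linear(x, w, bias=None):
    if w.act_quant:
        # dynamic activation quantization happens at call time
        # (int8 fake-quant shown for clarity; real kernels use fused paths)
        s = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 127.0
        x = torch.round(x / s).clamp(-128, 127) * s
    # both modes share the same matmul
    return F.linear(x, w.qdata.to(x.dtype) * w.scale, bias)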

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To connect everything together, here is a more detailed walk-through of float8 dynamic activation and float8 weight quantization in torchao (DEFAULT kernel preference, on H100, when the fbgemm_gpu_genai library is installed):

Quantization Flow: quantize_(model, Float8DynamicActivationFloat8WeightConfig())
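
For reference, a minimal runnable sketch of this flow (assuming a CUDA build of torchao on an H100-class GPU, with fbgemm_gpu_genai installed so the DEFAULT kernel preference picks the fbgemm kernels):

import torch
from torchao.quantization import quantize_, Float8DynamicActivationFloat8WeightConfig

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024, bias=False, dtype=torch.bfloat16, device="cuda")
)
# swaps each Linear's weight for a float8 tensor subclass in place; activations
# are then quantized dynamically inside the subclass's linear implementation
quantize_(model, Float8DynamicActivationFloat8WeightConfig())
out = model(torch.randn(2, 1024, dtype=torch.bfloat16, device="cuda"))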
Contributor

not super related but I notice we have these two configs:

Float8DynamicActivationFloat8WeightConfig
Float8ActivationInt4WeightConfig

Should we drop "Dynamic" from the first one to be more consistent?

Contributor Author

oh, we should add Dynamic to the second one I feel

Contributor Author
@jerryzh168 Aug 11, 2025

there is static activation right now actually:

class Float8StaticActivationFloat8WeightConfig(AOBaseConfig):
although this may not be used much right now

Contributor

yeah just saw this, either way looks good to me as long as we are consistent


* ``torch.uint1`` to ``torch.uint7`` available in PyTorch 2.3 and later
* ``torch.int1`` to ``torch.int7`` available in PyTorch 2.6 and later
* ``torch.float4_e2m1fn_x2``, ``torch.float8_e4m3fn``, ``torch.float8_e4m3fnuz``, ``torch.float8_e5m2``, ``torch.float8_e5m2fnuz``, ``torch.float8_e8m0fnu``
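
Since these dtypes are version-gated, a small illustrative guard (not from the doc) avoids hard failures on older PyTorch builds:

import torch

INT4 = getattr(torch, "int4", None)    # None before PyTorch 2.6
UINT4 = getattr(torch, "uint4", None)  # None before PyTorch 2.3
if INT4 is None:
    raise RuntimeError("torch.int4 requires PyTorch 2.6 or later")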
Contributor

should we also mention the MX dtypes here and mark them as prototype? (and in the ascii art)

Contributor Author

MX uses torch.float8_e8m0fnu for the scale, and then torch.float4_e2m1fn_x2 or torch.float8_e4m3fn (or some other fp8 dtypes) for the data, I think. cc @drisspg to confirm

Contributor

ah I see, maybe just mention the high-level dtypes in the ascii art then (e.g. mxfp4, mxfp6, mxfp8, nvfp4)?

Contributor Author
@jerryzh168 Aug 11, 2025

those will be defined in tensor subclasses directly; we are listing the PyTorch dtypes here. I can add a note about this though
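
To make the pairing concrete, a hedged sketch of the dtypes in one mxfp4 block (assumes a PyTorch recent enough to expose both dtypes; the block layout is illustrative only):

import torch

# a 32-element mxfp4 block: fp4 data packed two values per byte,
# plus one shared power-of-two scale for the whole block
data = torch.empty(16, dtype=torch.float4_e2m1fn_x2)  # 32 fp4 values
scale = torch.empty(1, dtype=torch.float8_e8m0fnu)    # block scale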

@jerryzh168 force-pushed the update-quant-overview branch 3 times, most recently from 2291c28 to d1d4af8, on August 11, 2025 23:48
@jerryzh168 force-pushed the update-quant-overview branch from d1d4af8 to 538e24f, on August 12, 2025 05:16