Update TorchAO README inference section before PTC #3206
base: main
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3206
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures, 1 Cancelled Job, 2 Unrelated Failures
As of commit f7762ad with merge base ced6231:
NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
b0fc829 to 081118f
```
from torchao.quantization import Int4WeightOnlyConfig, quantize_
quantize_(model, Int4WeightOnlyConfig(group_size=32, version=1))
```
Compared to a `torch.compiled` bf16 baseline, your quantized model should be significantly smaller and faster on a single A100 GPU:
Removing these since toy-model memory/latency numbers are not meaningful, and to make our README shorter.
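For context on the `group_size=32` argument in the snippet under review, here is a minimal pure-Python sketch of group-wise 4-bit weight quantization and the rough storage cost per weight. It is an illustration only, not TorchAO's implementation: the helper names (`quantize_group`, `dequantize_group`, `quantize_row`, `bits_per_weight`), the unsigned 0..15 range, and the assumption of one 16-bit scale plus one 16-bit offset per group are all hypothetical choices made for this example.

```python
import math

GROUP_SIZE = 32        # mirrors group_size=32 in the snippet above
QMIN, QMAX = 0, 15     # unsigned 4-bit range (an assumption of this sketch)


def quantize_group(values):
    """Map one group of floats to 4-bit codes plus a (scale, offset) pair."""
    lo, hi = min(values), max(values)
    # Fall back to a scale of 1.0 when every value in the group is identical.
    scale = (hi - lo) / (QMAX - QMIN) or 1.0
    codes = [min(QMAX, max(QMIN, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Reconstruct approximate floats from the 4-bit codes."""
    return [lo + scale * c for c in codes]


def quantize_row(row, group_size=GROUP_SIZE):
    """Quantize a weight row group by group; each group gets its own scale."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    return [quantize_group(row[i:i + group_size])
            for i in range(0, len(row), group_size)]


def bits_per_weight(group_size=GROUP_SIZE, weight_bits=4, meta_bits=32):
    """Average storage per weight, assuming meta_bits of metadata per group
    (hypothetically a 16-bit scale plus a 16-bit offset)."""
    return weight_bits + meta_bits / group_size


# Example: a 64-element row splits into two groups of 32.
example_row = [math.sin(i) for i in range(64)]
example_groups = quantize_row(example_row)
```

Under these assumptions, smaller groups track the weights more closely but add more metadata per weight, so `group_size` trades accuracy against model size; the real memory and latency figures depend on the kernel and packing format TorchAO actually uses.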
TorchAO is integrated into some of the leading open-source libraries including:

* HuggingFace transformers with a [builtin inference backend](https://huggingface.co/docs/transformers/main/quantization/torchao) and [low bit optimizers](https://github.com/huggingface/transformers/pull/31865)
reordered a bit to put more commonly used ones earlier
081118f to d45a249
Summary: att
Test Plan: visual inspection
d45a249 to f7762ad