
Commit 3eb1458

Update on "Documentation Updates"
Summary: Updating README with better examples, updating class and API documentation, and removing the unnecessary int_mm_fused_mul option from dynamic quant

Test Plan: python test/test.py

Reviewers:
Subscribers:
Tasks:
Tags:

[ghstack-poisoned]
1 parent 1bd3613 commit 3eb1458

File tree

1 file changed (+4, -6 lines)


README.md

Lines changed: 4 additions & 6 deletions
@@ -27,10 +27,8 @@ torchao 0.0.1 <install dir>
 
 Relevant APIs can be found in torchao.quantization.quant_api
 
-Note: Depending on the technique being applied to the model, you may see a perf degredation.
-This is because quantization adds additional overhead to the model that is hopefully made up for
-with faster matmuls. If your matmuls are small enough (or have odd shapes), the overhead can be larger than the gain
-from the quantized matmul.
+Note: While these techniques are designed to improve model performance, in some cases the opposite can occur.
+This is because quantization adds additional overhead to the model that is hopefully made up for by faster matmuls (for dynamic quantization) or loading weights faster (for weight-only quantization). If your matmuls are small enough or your non-quantized perf isn't bottlenecked by weight load time, these techniques may reduce performance.
 
 ### A8W8 Dynamic Quantization
 
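
The new wording above is easy to sanity-check. Below is a minimal sketch (not part of this diff) that times the small linear model from the old README example before and after dynamic quantization; apply_dynamic_quant is assumed to be the torchao.quantization.quant_api entry point here, and a matmul this small is exactly the case where the quantized path may come out slower.

import time
import torch
from torchao.quantization.quant_api import apply_dynamic_quant  # assumed API name

# the small model from the old README example; matmuls this small are
# where quantization overhead can outweigh the gain
model = torch.nn.Sequential(torch.nn.Linear(32, 64)).cuda().to(torch.bfloat16)
x = torch.randn(16, 32, device="cuda", dtype=torch.bfloat16)

def bench(m, iters=1000):
    # warm up, then average wall-clock time per forward pass
    with torch.no_grad():
        for _ in range(10):
            m(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            m(x)
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

baseline = bench(model)
apply_dynamic_quant(model)  # quantize the Linear modules in place
quantized = bench(model)
print(f"baseline {baseline * 1e6:.1f}us vs dynamic quant {quantized * 1e6:.1f}us")
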
@@ -88,7 +86,7 @@ other techniques.
 ### A8W8 Dynamic Quantization with Smoothquant
 
 We've also implemented a version of [smoothquant](https://arxiv.org/abs/2211.10438) with the same GEMM format as above.
-Due to requiring calibraiton the API is slightly more complicated
+Due to requiring calibration, the API is slightly more complicated
 
 Example
 
@@ -97,7 +95,7 @@ import torch
 from torchao.smoothquant import swap_linear_with_smooth_fq_linear, smooth_fq_linear_to_inference
 
 # some user model
-model = torch.nn.Sequential(torch.nn.Linear(32, 64)).cuda().to(torch.bfloat16)
+model = get_model()
 
 # convert linear modules to smoothquant
 # linear module in calibration mode
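
The snippet in this hunk stops at the calibration comments, so here is a minimal sketch of how the two imported functions fit together, assuming the usual calibrate-then-freeze flow; get_model() and calibration_data are hypothetical placeholders.

import torch
from torchao.smoothquant import swap_linear_with_smooth_fq_linear, smooth_fq_linear_to_inference

# some user model, as in the README example
model = get_model()  # hypothetical helper

# convert linear modules to smoothquant; they start in calibration mode
swap_linear_with_smooth_fq_linear(model)

# run representative inputs so the modules can observe activation scales
with torch.no_grad():
    for batch in calibration_data:  # hypothetical iterable of example inputs
        model(batch)

# freeze the observed scales and switch the modules to inference mode
smooth_fq_linear_to_inference(model)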
