Summary: Updating README with better examples, updating class and API
documentation, and removing the unnecessary int_mm_fused_mul option from
dynamic quant
Test Plan: python test/test.py
Reviewers:
Subscribers:
Tasks:
Tags:
[ghstack-poisoned]
README.md: 4 additions & 6 deletions
@@ -27,10 +27,8 @@ torchao 0.0.1 <install dir>
 
 Relevant APIs can be found in torchao.quantization.quant_api
 
-Note: Depending on the technique being applied to the model, you may see a perf degredation.
-This is because quantization adds additional overhead to the model that is hopefully made up for
-with faster matmuls. If your matmuls are small enough (or have odd shapes), the overhead can be larger than the gain
-from the quantized matmul.
+Note: While these techniques are designed to improve model performance, in some cases the opposite can occur.
+This is because quantization adds additional overhead to the model that is hopefully made up for by faster matmuls (for dynamic quantization) or loading weights faster (for weight-only quantization). If your matmuls are small enough or your non-quantized perf isn't bottlenecked by weight load time, these techniques may reduce performance.
 
 ### A8W8 Dynamic Quantization
 
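The overhead the new note describes can be made concrete with a small sketch. This is plain illustrative Python, not torchao's implementation, and the function names are hypothetical: dynamic quantization must compute a scale from the activation values at runtime before every quantized matmul, and that extra bookkeeping is what can outweigh the gain when the matmuls are small.

```python
def dynamic_quantize_int8(values):
    """Symmetric per-tensor int8 quantization with a runtime-computed
    scale -- this scale computation is the per-call overhead that
    dynamic quantization adds on top of the matmul itself."""
    absmax = max(abs(v) for v in values)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to floats using the stored scale."""
    return [x * scale for x in q]
```

Each quantize/dequantize round trip loses at most one quantization step of precision, which is why the technique works well when the matmul savings dominate.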
@@ -88,7 +86,7 @@ other techniques.
 ### A8W8 Dynamic Quantization with Smoothquant
 
 We've also implemented a version of [smoothquant](https://arxiv.org/abs/2211.10438) with the same GEMM format as above.
-Due to requiring calibraiton the API is slightly more complicated
+Due to requiring calibration, the API is slightly more complicated
 
 Example
@@ -97,7 +95,7 @@ import torch
 from torchao.smoothquant import swap_linear_with_smooth_fq_linear, smooth_fq_linear_to_inference
 
 # some user model
-model = torch.nn.Sequential(torch.nn.Linear(32, 64)).cuda().to(torch.bfloat16)
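For intuition on why smoothquant needs calibration, here is a pure-Python sketch of the core transform from the paper linked in the diff; it uses none of torchao's actual APIs and the function names are illustrative. Calibration collects per-channel activation maxima; a smoothing factor derived from them then migrates quantization difficulty from activations to weights while leaving the matmul result mathematically unchanged.

```python
def smooth(x_rows, w_rows, alpha=0.5):
    """Smoothquant-style smoothing: for each input channel j, pick
    s_j = max|X[:, j]|**alpha / max|W[j, :]|**(1 - alpha), then divide
    activations and multiply weights by s_j.  (X / s) @ (s * W) equals
    X @ W, but X / s has a flatter per-channel range, so it quantizes
    to int8 with less error.  Assumes no channel is all zeros."""
    k = len(w_rows)
    # Per-channel activation maxima -- in practice gathered during calibration.
    x_absmax = [max(abs(row[j]) for row in x_rows) for j in range(k)]
    w_absmax = [max(abs(v) for v in w_rows[j]) for j in range(k)]
    s = [(x_absmax[j] ** alpha) / (w_absmax[j] ** (1 - alpha)) for j in range(k)]
    x_smoothed = [[row[j] / s[j] for j in range(k)] for row in x_rows]
    w_smoothed = [[v * s[j] for v in w_rows[j]] for j in range(k)]
    return x_smoothed, w_smoothed

def matmul(a, b):
    """Plain dense matmul for checking the transform is exact."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```

This is why the API in the diff has two steps: `swap_linear_with_smooth_fq_linear` inserts modules that record activation statistics while you run representative data through the model, and `smooth_fq_linear_to_inference` folds the resulting smoothing factors in before inference.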