PyTorch 2 Export Quantization with Intel GPU Backend through Inductor
======================================================================

**Author**: `Yan Zhiwei <https://github.com/ZhiweiYan-96>`_, `Wang Eikan <https://github.com/EikanWang>`_, `Liu River <https://github.com/riverliuintel>`_, `Cui Yifeng <https://github.com/CuiYifeng>`_

Prerequisites
---------------

- `PyTorch 2 Export Post Training Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html>`_
- `TorchInductor and torch.compile concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`_

Introduction
--------------

The high-level architecture of this flow could look like this:

::

                                Inductor
                                   |
    ----------------------------------------------------------
    |   oneDNN Kernels       ATen Ops        Triton Kernels  |
    ----------------------------------------------------------

Post Training Quantization
----------------------------

Static quantization is currently the only supported method. Quantization-aware training (QAT) and dynamic quantization will be available in later versions.

We recommend installing the dependency packages through the Intel GPU channel as follows:

::

    pip install torchvision pytorch-triton-xpu --index-url https://download.pytorch.org/whl/nightly/xpu

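To confirm that the XPU backend is functional before quantizing, you can run a quick sanity check (``torch.xpu.is_available()`` is part of PyTorch's built-in XPU support):

::

    python -c "import torch; print(torch.xpu.is_available())"
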
1. Capture FX Graph
^^^^^^^^^^^^^^^^^^^^^

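As a minimal sketch of this step, the eager-mode model can be captured into an FX graph with ``torch.export``; the ResNet-18 model and the input shape below are illustrative assumptions, not fixed by this flow:

::

    import torch
    import torchvision.models as models

    # Example eager-mode model and input on the XPU device (assumed for illustration)
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval().to("xpu")
    example_inputs = (torch.randn(1, 3, 224, 224, device="xpu"),)

    # Capture the model into an FX graph in ATen IR, ready for PT2E quantization
    exported_model = torch.export.export_for_training(model, example_inputs).module()
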
2. Apply Quantization
^^^^^^^^^^^^^^^^^^^^^^^

After we capture the FX module to be quantized, we will import the backend-specific Quantizer for Intel GPU and configure how to quantize the model.

::

    import torch.ao.quantization.quantizer.xpu_inductor_quantizer as xpuiq
    from torch.ao.quantization.quantizer.xpu_inductor_quantizer import XPUInductorQuantizer

    quantizer = XPUInductorQuantizer()
    quantizer.set_global(xpuiq.get_default_xpu_inductor_quantization_config())

The default quantization configuration in ``XPUInductorQuantizer`` uses signed 8-bit for both activations and weights. The activation is per-tensor quantized, while the weight is per-channel quantized.

Besides the default configuration (asymmetrically quantized activation), we also support signed 8-bit symmetrically quantized activation, which has the potential to provide better performance:

::

    import torch
    from torch.ao.quantization.observer import HistogramObserver, PerChannelMinMaxObserver
    from torch.ao.quantization.quantizer.quantizer import QuantizationSpec
    from torch.ao.quantization.quantizer.xnnpack_quantizer_utils import QuantizationConfig

    def get_xpu_inductor_symm_quantization_config():
        # Symmetric signed 8-bit per-tensor quantization for activations
        act_quantization_spec = QuantizationSpec(
            dtype=torch.int8,
            quant_min=-128,
            quant_max=127,
            qscheme=torch.per_tensor_symmetric,
            is_dynamic=False,
            observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
        )
        # Weights keep the signed 8-bit per-channel symmetric scheme
        weight_quantization_spec = QuantizationSpec(
            dtype=torch.int8,
            quant_min=-128,
            quant_max=127,
            qscheme=torch.per_channel_symmetric,
            ch_axis=0,  # axis 0 is the output channel of conv weight (oc, ic, kh, kw)
            is_dynamic=False,
            observer_or_fake_quant_ctr=PerChannelMinMaxObserver.with_args(eps=2**-12),
        )
        quantization_config = QuantizationConfig(
            act_quantization_spec,     # input activation
            act_quantization_spec,     # output activation
            weight_quantization_spec,
            None,                      # bias is not quantized
            False,                     # is_qat
        )
        return quantization_config

Then, we can set the quantization configuration to the quantizer:

::

    quantizer = XPUInductorQuantizer()
    quantizer.set_global(get_xpu_inductor_symm_quantization_config())

After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.

::

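    # A minimal continuation following the standard PT2E flow: ``exported_model``
    # is the FX module captured in step 1 above.
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e

    prepared_model = prepare_pt2e(exported_model, quantizer)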