
Commit a8e8d8a

Commit message: syntax
1 parent 6bfa4d7 commit a8e8d8a

File tree

1 file changed: +9 additions, -16 deletions

prototype_source/pt2e_quant_xpu_inductor.rst

Lines changed: 9 additions & 16 deletions
@@ -1,16 +1,14 @@
 PyTorch 2 Export Quantization with Intel GPU Backend through Inductor
 ==================================================================
 
-**Author**: `Yan Zhiwei <https://github.com/ZhiweiYan-96>`, `Wang Eikan <https://github.com/EikanWang>`, `Liu River <https://github.com/riverliuintel>`, `Cui Yifeng <https://github.com/CuiYifeng>`
-
+**Author**: `Yan Zhiwei <https://github.com/ZhiweiYan-96>`_, `Wang Eikan <https://github.com/EikanWang>`_, `Liu River <https://github.com/riverliuintel>`, `Cui Yifeng <https://github.com/CuiYifeng>`_
 
 Prerequisites
 ---------------
 
 - `PyTorch 2 Export Post Training Quantization <https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html>`_
 - `TorchInductor and torch.compile concepts in PyTorch <https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.html>`_
 
-
 Introduction
 --------------
 
@@ -64,23 +62,20 @@ The high-level architecture of this flow could look like this:
                          Inductor
                             |
 —--------------------------------------------------------
-| oneDNN Kernels          ATen Ops          Triton Kernels |
+| oneDNN Kernels          ATen Ops          Triton Kernels |
 —--------------------------------------------------------
 
-
-
 Post Training Quantization
 ----------------------------
 
 Static quantization is the only method we support currently. QAT and dynamic quantization will be available in later versions.
 
-Please install dependencies package through Intel GPU channels as follows
+The dependency packages are recommended to be installed through the Intel GPU channel as follows
 
 ::
 
     pip install torchvision pytorch-triton-xpu --index-url https://download.pytorch.org/whl/nightly/xpu
 
-
 1. Capture FX Graph
 ^^^^^^^^^^^^^^^^^^^^^
 
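The diff truncates the body of the "Capture FX Graph" step above. As a rough, non-authoritative sketch of what that step typically looks like in this flow (the resnet18 model, input shape, and use of ``torch.export.export_for_training`` are illustrative assumptions, not necessarily the tutorial's exact code):

::

    import torch
    import torchvision.models as models
    from torch.export import export_for_training

    # Assumption: resnet18 stands in for whatever model the tutorial uses.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval().to("xpu")
    example_inputs = (torch.randn(1, 3, 224, 224, device="xpu"),)

    with torch.no_grad():
        # Export the eager model into an FX graph that PT2E quantization can consume.
        exported_model = export_for_training(model, example_inputs).module()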

@@ -128,14 +123,12 @@ quantize the model.
     quantizer = XPUInductorQuantizer()
     quantizer.set_global(xpuiq.get_default_xpu_inductor_quantization_config())
 
-.. note::
-
-   The default quantization configuration in ``XPUInductorQuantizer`` uses signed 8-bits for both activations and weights. The tensor is per-tensor quantized, while weight is signed 8-bit per-channel quantized.
+The default quantization configuration in ``XPUInductorQuantizer`` uses signed 8-bit for both activations and weights. The activation is per-tensor quantized, while the weight is signed 8-bit per-channel quantized.
 
-Besides the default quant configuration, we also support signed 8-bits symmetric quantized activation, which has the potential
-to provide better performance.
+Besides the default quant configuration (asymmetric quantized activation), we also support signed 8-bit symmetric quantized activation, which has the potential to provide better performance.
 
 ::
+
     from torch.ao.quantization.observer import HistogramObserver, PerChannelMinMaxObserver
     from torch.ao.quantization.quantizer.quantizer import QuantizationSpec
     from torch.ao.quantization.quantizer.xnnpack_quantizer_utils import QuantizationConfig
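The function body that produces ``quantization_config`` is omitted between this hunk and the next. A hedged sketch of how a symmetric configuration could be assembled from the imports above (observer choices, ``eps``, and quantization ranges are assumptions for illustration, not the tutorial's verbatim code):

::

    import torch
    from torch.ao.quantization.observer import HistogramObserver, PerChannelMinMaxObserver
    from torch.ao.quantization.quantizer.quantizer import QuantizationSpec
    from torch.ao.quantization.quantizer.xnnpack_quantizer_utils import QuantizationConfig

    def get_xpu_inductor_symm_quantization_config():
        # Symmetric, signed 8-bit, per-tensor activation spec (eps value is an assumption).
        act_quantization_spec = QuantizationSpec(
            dtype=torch.int8,
            quant_min=-128,
            quant_max=127,
            qscheme=torch.per_tensor_symmetric,
            is_dynamic=False,
            observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
        )
        # Symmetric, signed 8-bit, per-channel weight spec along the output-channel axis.
        weight_quantization_spec = QuantizationSpec(
            dtype=torch.int8,
            quant_min=-128,
            quant_max=127,
            qscheme=torch.per_channel_symmetric,
            ch_axis=0,
            is_dynamic=False,
            observer_or_fake_quant_ctr=PerChannelMinMaxObserver.with_args(eps=2**-12),
        )
        # Bias is left unquantized in this sketch.
        quantization_config = QuantizationConfig(
            act_quantization_spec,     # input activation
            act_quantization_spec,     # output activation
            weight_quantization_spec,  # weight
            None,                      # bias
            False,                     # is_qat
        )
        return quantization_config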
@@ -182,14 +175,14 @@ quantize the model.
         )
         return quantization_config
 
-Then, the user can set the quantization configuration to the quantizer.
+Then, we can set the quantization configuration to the quantizer.
 
 ::
     quantizer = XPUInductorQuantizer()
     quantizer.set_global(get_xpu_inductor_symm_quantization_config())
 
-After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
-``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
+After we import the backend-specific Quantizer, we will prepare the model for post-training quantization.
+``prepare_pt2e`` folds BatchNorm operators into preceding Conv2d operators, and inserts observers in appropriate places in the model.
 
 ::
 
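For context beyond the end of this diff, the post-training quantization flow that ``prepare_pt2e`` begins usually continues with calibration, conversion, and Inductor compilation. A sketch under the assumption that ``exported_model``, ``quantizer``, ``example_inputs``, and a ``calibration_data`` iterable exist as in the earlier steps (not the tutorial's verbatim code):

::

    import torch
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

    # Fold BatchNorm into preceding Conv2d and insert observers per the quantizer.
    prepared_model = prepare_pt2e(exported_model, quantizer)

    # Calibrate with representative data so the observers record value ranges.
    with torch.no_grad():
        for images in calibration_data:  # assumption: an iterable of XPU tensors
            prepared_model(images)

    # Convert the observed model into a quantized model.
    converted_model = convert_pt2e(prepared_model)

    # Lower through TorchInductor; kernels map to oneDNN, ATen, or Triton as in the diagram above.
    with torch.no_grad():
        optimized_model = torch.compile(converted_model)
        optimized_model(*example_inputs)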