
Commit 63a63cc

refine
1 parent a8e8d8a commit 63a63cc

File tree

1 file changed: +8 -8 lines


prototype_source/pt2e_quant_xpu_inductor.rst

Lines changed: 8 additions & 8 deletions
@@ -1,7 +1,7 @@
 PyTorch 2 Export Quantization with Intel GPU Backend through Inductor
 ==================================================================

-**Author**: `Yan Zhiwei <https://github.com/ZhiweiYan-96>`_, `Wang Eikan <https://github.com/EikanWang>`_, `Liu River <https://github.com/riverliuintel>`, `Cui Yifeng <https://github.com/CuiYifeng>`_
+**Author**: `Yan Zhiwei <https://github.com/ZhiweiYan-96>`_, `Wang Eikan <https://github.com/EikanWang>`_, `Liu River <https://github.com/riverliuintel>`_, `Cui Yifeng <https://github.com/CuiYifeng>`_

 Prerequisites
 ---------------
@@ -13,7 +13,7 @@ Introduction
 --------------

 This tutorial introduces XPUInductorQuantizer aiming for serving the quantized model inference on Intel GPUs. The tutorial will cover how it
-utilze PyTorch 2 Export Quantization flow and lower the quantized model into the inductor.
+utilizes the PyTorch 2 Export Quantization flow and lowers the quantized model into Inductor.

 The pytorch 2 export quantization flow uses the torch.export to capture the model into a graph and perform quantization transformations on top of the ATen graph.
 This approach is expected to have significantly higher model coverage, better programmability, and a simplified UX.
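For orientation, the capture step this hunk refers to is a single ``torch.export`` call. A minimal sketch, assuming a PyTorch 2.5+ build with XPU support where ``torch.export.export_for_training`` is available; the toy module ``M`` and the input shape are illustrative assumptions, not from the tutorial:

::

    import torch

    # Toy module standing in for the model to be quantized (illustrative only).
    class M(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = torch.nn.Conv2d(3, 16, kernel_size=3)

        def forward(self, x):
            return torch.nn.functional.relu(self.conv(x))

    model = M().eval().to("xpu")
    example_inputs = (torch.randn(1, 3, 224, 224, device="xpu"),)

    # Capture the eager model into an ATen-level graph; the PT2E quantization
    # passes transform this graph rather than hooking into nn.Module.
    exported_model = torch.export.export_for_training(model, example_inputs).module()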
@@ -26,9 +26,9 @@ The quantization flow mainly includes three steps:
 performing the prepared model's calibration or quantization-aware training, and converting the prepared model into the quantized model.
 - Step 3: Lower the quantized model into inductor with the API ``torch.compile``.

-During Step3, the inductor would decide which kernels are dispatched into. There are two kinds of kernels the Intel GPU would obtain benefits, oneDNN kernels and triton kernels. `Intel oneAPI Deep Neural Network Library (oneDNN) <https://github.com/uxlfoundation/oneDNN>` contains
-highly-optimized quantized Cong/GEMM kernels for bot CPU and GPU. Furthermore, oneDNN supports extra operator fusion on these operators, like quantized linear with eltwise activation function(ReLU) and binary operation(add, inplace sum).
-Besides oneDNN kernels, triton would be responsible to generate kernels on our GPUs, like operators `quantize` and `dequantize`. The triton kernels are optimized by `Intel XPU Backend for Triton <https://github.com/intel/intel-xpu-backend-for-triton>`
+During Step 3, Inductor decides which kernels the quantized operations are dispatched to. There are two kinds of kernels the Intel GPU benefits from: oneDNN kernels and Triton kernels. `Intel oneAPI Deep Neural Network Library (oneDNN) <https://github.com/uxlfoundation/oneDNN>`_ contains
+highly-optimized quantized Conv/GEMM kernels for both CPU and GPU. Furthermore, oneDNN supports extra operator fusion on these operators, such as quantized linear with an eltwise activation function (ReLU) or a binary operation (add, in-place sum).
+Besides oneDNN kernels, Triton is responsible for generating the remaining kernels on Intel GPUs, such as the `quantize` and `dequantize` operators. The Triton kernels are optimized by `Intel XPU Backend for Triton <https://github.com/intel/intel-xpu-backend-for-triton>`_.


 The high-level architecture of this flow could look like this:
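Putting the three steps above together, the whole flow is only a few calls. A minimal sketch, assuming an XPU-enabled PyTorch build and reusing ``exported_model`` and ``example_inputs`` from the capture sketch earlier; the single-batch calibration is illustrative, a real run iterates over a representative dataset:

::

    import torch
    from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
    import torch.ao.quantization.quantizer.xpu_inductor_quantizer as xpuiq
    from torch.ao.quantization.quantizer.xpu_inductor_quantizer import XPUInductorQuantizer

    # Step 1: configure the quantizer and insert observers into the graph.
    quantizer = XPUInductorQuantizer()
    quantizer.set_global(xpuiq.get_default_xpu_inductor_quantization_config())
    prepared_model = prepare_pt2e(exported_model, quantizer)

    # Step 2: calibrate on representative data, then fold the observed
    # ranges into quantize/dequantize nodes.
    with torch.no_grad():
        prepared_model(*example_inputs)
    converted_model = convert_pt2e(prepared_model)

    # Step 3: lower into Inductor. oneDNN serves the quantized Conv/GEMM
    # patterns; Triton generates the remaining (de)quantize kernels.
    with torch.no_grad():
        optimized_model = torch.compile(converted_model)
        optimized_model(*example_inputs)

The first call on ``optimized_model`` triggers compilation; at that point Inductor picks oneDNN kernels for the quantized Conv/GEMM patterns and lets Triton generate the rest.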
@@ -68,9 +68,9 @@ The high-level architecture of this flow could look like this:
 Post Training Quantization
 ----------------------------

-Static quantization is the only method we support currently. QAT and dynami quantization will be avaliable in later versions.
+Static quantization is the only method we support currently. QAT and dynamic quantization will be available in later versions.

-The dependencies packages are recommend to be installed through Intel GPU channel as follows
+We recommend installing the dependency packages through the Intel GPU channel as follows:

 ::

@@ -123,7 +123,7 @@ quantize the model.
     quantizer = XPUInductorQuantizer()
     quantizer.set_global(xpuiq.get_default_xpu_inductor_quantization_config())

-The default quantization configuration in ``XPUInductorQuantizer`` uses signed 8-bits for both activations and weights. The tensor is per-tensor quantized, while weight is signed 8-bit per-channel quantized.
+The default quantization configuration in ``XPUInductorQuantizer`` uses signed 8-bit integers for both activations and weights. The activation is per-tensor quantized, while the weight is signed 8-bit per-channel quantized.

 Besides the default quant configuration (asymmetric quantized activation), we also support signed 8-bits symmetric quantized activation, which has the potential to provide better performance.
