Please take a look at the [pytorch 2.0 export post training static quantization tutorial](https://pytorch.org/tutorials/prototype/pt2e_quant_ptq_static.html) to learn about all the steps of quantization. The main APIs used to quantize the model are:
* `prepare_pt2e`: inserts observers into the model. It takes a backend-specific `Quantizer` as an argument, which annotates the nodes with the information needed to quantize the model properly for that backend.
* (not an API) calibration: run the model through some sample data so the observers can record statistics.
* `convert_pt2e`: converts the observed model to a quantized model.
### Result
The result after these steps will be a reference quantized model, with quantize/dequantize operators being further decomposed. Example:
#### Q/DQ Representation (default)
We'll have a (dq -> float32_op -> q) representation for all quantized operators.

A second representation (WIP, expected to be ready at the end of August) will use a special representation for selected ops (e.g. quantized linear); other ops are represented as (dq -> float32_op -> q), and q/dq are decomposed into more primitive operators.
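To make the (dq -> float32_op -> q) pattern concrete, here is a hand-written sketch of what the decomposed pattern computes for a linear op. The scales and zero points are made-up illustrative values, not output of the actual flow:

```python
import torch

# Hypothetical quantization parameters, for illustration only.
x_scale, x_zp = 0.05, 0
w_scale, w_zp = 0.02, 0

def q(t, scale, zp):
    # Quantize: float32 -> int8, written as primitive ops
    # (div, round, add, clamp, cast), mirroring the decomposition.
    return torch.clamp(torch.round(t / scale) + zp, -128, 127).to(torch.int8)

def dq(t, scale, zp):
    # Dequantize: int8 -> float32.
    return (t.to(torch.float32) - zp) * scale

def qdq_linear(x_int8, w_int8, bias):
    # The (dq -> float32_op -> q) pattern around a linear op:
    x = dq(x_int8, x_scale, x_zp)
    w = dq(w_int8, w_scale, w_zp)
    out = torch.nn.functional.linear(x, w, bias)  # the float32_op
    return q(out, x_scale, x_zp)
```

In the real reference model these q/dq steps appear as nodes in the exported graph rather than Python functions, but the arithmetic is the same.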