docs/source/backends-coreml.md

### 8-bit Quantization using the PT2E Flow
Quantization with the CoreML backend requires exporting the model for iOS17 or later.
To perform 8-bit quantization with the PT2E flow, follow these steps:
1) Define [coremltools.optimize.torch.quantization.LinearQuantizerConfig](https://apple.github.io/coremltools/source/coremltools.optimize.torch.quantization.html#coremltools.optimize.torch.quantization.LinearQuantizerConfig) and use it to create an instance of a `CoreMLQuantizer`.
2) Use `torch.export.export_for_training` to export a graph module that will be prepared for quantization.
3) Call `prepare_pt2e` to prepare the model for quantization.
4) Run the prepared model with representative samples to calibrate the quantized tensor activation ranges.
5) Call `convert_pt2e` to quantize the model.
6) Export and lower the model using the standard flow.
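
The steps above can be sketched as follows. This is a sketch, not a definitive implementation: it assumes `CoreMLQuantizer` is importable from `executorch.backends.apple.coreml.quantizer`, uses the PT2E utilities from `torch.ao.quantization.quantize_pt2e`, and substitutes a toy model and random calibration data for illustration. Check the API against your installed `executorch` and `coremltools` versions.

```python
# Sketch of the PT2E quantization flow for the CoreML backend.
# Assumed import paths; verify against your executorch/coremltools versions.
import torch
from coremltools.optimize.torch.quantization.quantization_config import (
    LinearQuantizerConfig,
    QuantizationScheme,
)
from executorch.backends.apple.coreml.quantizer import CoreMLQuantizer
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

# Toy model and inputs for illustration only.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 3, 32, 32),)

# 1) Define a LinearQuantizerConfig and build the CoreMLQuantizer from it.
config = LinearQuantizerConfig.from_dict(
    {
        "global_config": {
            "quantization_scheme": QuantizationScheme.symmetric,
            "activation_dtype": torch.quint8,
            "weight_dtype": torch.qint8,
            "weight_per_channel": True,
        }
    }
)
quantizer = CoreMLQuantizer(config)

# 2) Export a graph module for training.
exported = torch.export.export_for_training(model, example_inputs).module()

# 3) Insert observers to prepare the model for quantization.
prepared = prepare_pt2e(exported, quantizer)

# 4) Calibrate on representative samples (random data stands in here).
for _ in range(4):
    prepared(torch.randn(1, 3, 32, 32))

# 5) Convert the calibrated model to a quantized model.
quantized = convert_pt2e(prepared)

# 6) Export and lower `quantized` using the standard ExecuTorch flow.
```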
The above does static quantization (activations and weights are quantized).
You can see a full description of available quantization configs in the [coremltools documentation](https://apple.github.io/coremltools/source/coremltools.optimize.torch.quantization.html#coremltools.optimize.torch.quantization.LinearQuantizerConfig). For example, the config below will perform weight-only quantization:
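
As one possible sketch of such a config (assuming the same `LinearQuantizerConfig.from_dict` structure as above): setting `activation_dtype` to `torch.float32` leaves activations in float, so only weights are quantized.

```python
# Hypothetical weight-only quantization config for the CoreMLQuantizer.
# Keeping activations in float32 means no activation ranges need calibrating.
import torch
from coremltools.optimize.torch.quantization.quantization_config import (
    LinearQuantizerConfig,
    QuantizationScheme,
)

weight_only_config = LinearQuantizerConfig.from_dict(
    {
        "global_config": {
            "quantization_scheme": QuantizationScheme.symmetric,
            "activation_dtype": torch.float32,  # activations stay in fp32
            "weight_dtype": torch.qint8,        # weights quantized to 8-bit
            "weight_per_channel": True,
        }
    }
)
```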
Quantizing activations requires calibrating the model on representative data. Note also that PT2E currently requires passing at least one calibration sample before calling `convert_pt2e`, even for data-free weight-only quantization.
See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html) for more information.