Commit 78a8a8d ("up"), 1 parent df0e7a0

1 file changed: docs/source/backends-coreml.md (6 additions, 5 deletions)
````diff
@@ -86,12 +86,13 @@ To quantize a PyTorch model for the CoreML backend, use the `CoreMLQuantizer`.
 
 ### 8-bit Quantization using the PT2E Flow
 
+Quantization with the CoreML backend requires exporting the model for iOS17 or later.
 To perform 8-bit quantization with the PT2E flow, perform the following steps:
 
 1) Define [coremltools.optimize.torch.quantization.LinearQuantizerConfig](https://apple.github.io/coremltools/source/coremltools.optimize.torch.quantization.html#coremltools.optimize.torch.quantization.LinearQuantizerConfig) and use it to create an instance of `CoreMLQuantizer`.
 2) Use `torch.export.export_for_training` to export a graph module that will be prepared for quantization.
 3) Call `prepare_pt2e` to prepare the model for quantization.
-4) For static quantization, run the prepared model with representative samples to calibrate the quantized tensor activation ranges.
+4) Run the prepared model with representative samples to calibrate the quantized tensor activation ranges.
 5) Call `convert_pt2e` to quantize the model.
 6) Export and lower the model using the standard flow.
 
````

````diff
@@ -152,7 +153,9 @@ et_program = to_edge_transform_and_lower(
 ).to_executorch()
 ```
 
-The above does static quantization (activations and weights are quantized). Quantizing activations requires calibrating the model on representative data. You can also do weight-only quantization, which does not require calibration data, by specifying the activation_dtype to be torch.float32:
+The above does static quantization (activations and weights are quantized).
+
+You can see a full description of available quantization configs in the [coremltools documentation](https://apple.github.io/coremltools/source/coremltools.optimize.torch.quantization.html#coremltools.optimize.torch.quantization.LinearQuantizerConfig). For example, the config below will perform weight-only quantization:
 
 ```
 weight_only_8bit_config = ct.optimize.torch.quantization.LinearQuantizerConfig(
@@ -164,11 +167,9 @@ weight_only_8bit_config = ct.optimize.torch.quantization.LinearQuantizerConfig(
 )
 )
 quantizer = CoreMLQuantizer(weight_only_8bit_config)
-prepared_model = prepare_pt2e(training_gm, quantizer)
-quantized_model = convert_pt2e(prepared_model)
 ```
 
-Note that static quantization requires exporting the model for iOS17 or later.
+Quantizing activations requires calibrating the model on representative data. Also note that PT2E currently requires passing at least one calibration sample before calling `convert_pt2e`, even for data-free weight-only quantization.
 
 See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html) for more information.
 
````
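As background for the weight-only note in the diff above: quantizing weights alone needs no activation statistics, because scales can be computed directly from the weight values themselves. The sketch below illustrates symmetric 8-bit per-channel weight quantization in plain `torch`; it is a generic illustration of the idea, not the CoreML backend's actual kernel.

```python
import torch

def quantize_weight_per_channel(w: torch.Tensor):
    """Symmetric 8-bit per-output-channel weight quantization (illustrative
    sketch, not CoreML's implementation)."""
    # One scale per output channel, chosen so the max |w| maps to 127.
    max_abs = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-12)
    scale = max_abs / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(8, 16)          # hypothetical weight matrix
q, scale = quantize_weight_per_channel(w)
w_hat = dequantize(q, scale)

# Rounding error is bounded by half a quantization step per channel, and no
# calibration data was needed -- only the weight tensor itself.
assert torch.all((w - w_hat).abs() <= scale / 2 + 1e-6)
```

This is why weight-only configs skip activation observers entirely, while static quantization must additionally see representative inputs to estimate activation ranges.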

0 commit comments