
Commit e717234

Update XNNPACK and backend template docs
1 parent d2f9ff6

File tree

2 files changed: +82 -45 lines

docs/source/backend-template.md

Lines changed: 28 additions & 1 deletion
@@ -1,15 +1,42 @@
# Backend Template

+Provide a brief overview/description of the backend. At a high level, what does it do? Consider linking to top-level vendor documentation for the target hardware family and/or framework (Core ML, XNNPACK, etc.).
+
## Features

+List the high-level features of the backend, such as general operator and hardware support.
+
## Target Requirements

+What hardware and software is required to run the backend on a specific device? For example, does it require specific iOS or Android OS versions? If it's an NPU, what hardware models are supported?
+
## Development Requirements

-## Lowering a Model to *Backend Name*
+What software and hardware is needed to create a .PTE file targeting this backend? Are there any additional dependencies that need to be installed that are not included with the ExecuTorch pip package? How does the user install them?
+
+## Using *Backend Name*
+
+This section describes the steps users need to take in order to generate a .PTE file targeting this backend. Include a full code sample for exporting and lowering a model to this backend. Make sure relevant imports for the backend partitioner are included.

### Partitioner API

+What options, if any, does the partitioner take? Are there any other export-time configurations that can be applied? Document each option.
+
### Quantization

+What quantization schemes does this backend support? Consider including the following, as appropriate.
+- What operators are supported?
+- Number of bits?
+- Static vs dynamic activations?
+- Weight-only vs activations + weights?
+- Symmetric vs asymmetric weights?
+- Per-tensor, per-channel, group/blockwise?
+
+Include a code snippet demonstrating how to perform quantization for this backend. Document, or link to, a description of the parameters that the user can specify.
+
## Runtime Integration
+
+This section is intended to tell the user all of the steps they'll need to take to be able to run a .PTE file on-device that is targeting the given backend.
+- What CMake targets should they link to?
+- How is this backend compiled from source?
+- Is the backend bundled by default in iOS and/or Android pre-built libraries?
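
To illustrate the kind of full code sample the new "Using *Backend Name*" section asks for, a minimal export-and-lower skeleton might look like the sketch below. `FooPartitioner` and its import path are hypothetical placeholders for whatever partitioner the backend provides; only `torch.export.export` and `to_edge_transform_and_lower` are standard PyTorch/ExecuTorch APIs.

```python
import torch

from executorch.exir import to_edge_transform_and_lower
# Hypothetical partitioner import -- substitute the real backend's partitioner.
from executorch.backends.foo.foo_partitioner import FooPartitioner

class SmallModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))

model = SmallModel().eval()
sample_inputs = (torch.randn(1, 16),)

# Export, partition to the backend, and serialize to a .PTE file.
et_program = to_edge_transform_and_lower(
    torch.export.export(model, sample_inputs),
    partitioner=[FooPartitioner()],  # hypothetical backend partitioner
).to_executorch()

with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```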

docs/source/backends-xnnpack.md

Lines changed: 54 additions & 44 deletions
@@ -1,11 +1,13 @@
# XNNPACK Backend

-The XNNPACK delegate is the ExecuTorch solution for CPU execution on mobile CPUs. XNNPACK is a library that provides optimized kernels for machine learning operators on Arm and x86 CPUs.
+The XNNPACK delegate is the ExecuTorch solution for CPU execution on mobile CPUs. [XNNPACK](https://github.com/google/XNNPACK/tree/master) is a library that provides optimized kernels for machine learning operators on Arm and x86 CPUs.

## Features

- Wide operator support on Arm and x86 CPUs, available on any modern mobile phone.
- Support for a wide variety of quantization schemes and quantized operators.
+- Supports fp32 and fp16 activations.
+- Supports 8-bit quantization.

## Target Requirements

@@ -16,9 +18,12 @@ The XNNPACK delegate is the ExecuTorch solution for CPU execution on mobile CPUs

## Development Requirements

-The XNNPACK delegate does not introduce any development system requirements beyond those required by the core ExecuTorch runtime.
+The XNNPACK delegate does not introduce any development system requirements beyond those required by
+the core ExecuTorch runtime.

-## Lowering a Model to XNNPACK
+----
+
+## Using the XNNPACK Backend

To target the XNNPACK backend during the export and lowering process, pass an instance of the `XnnpackPartitioner` to `to_edge_transform_and_lower`. The example below demonstrates this process using the MobileNet V2 model from torchvision.
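
The export example that the paragraph above refers to is unchanged by this commit and therefore not shown in the hunk; a minimal sketch of that flow, with import paths assumed from the standard ExecuTorch XNNPACK examples, is:

```python
import torch
import torchvision.models as models

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# MobileNet V2 from torchvision, in eval mode, with a representative input.
model = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

# Export, delegate the supported subgraphs to XNNPACK, and serialize to .pte.
et_program = to_edge_transform_and_lower(
    torch.export.export(model, sample_inputs),
    partitioner=[XnnpackPartitioner()],
).to_executorch()

with open("mv2_xnnpack.pte", "wb") as f:
    f.write(et_program.buffer)
```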

@@ -49,59 +54,64 @@ The XNNPACK partitioner API allows for configuration of the model delegation to
- `per_op_mode`: If true, emit individual delegate calls for every operator. This is an advanced option intended to reduce memory overhead in some contexts at the cost of a small amount of runtime overhead. Defaults to false.
- `verbose`: If true, print additional information during lowering.

-### Quantization
+### Testing the Model

-The XNNPACK delegate can also be used as a backend to execute symmetrically quantized models. To quantize a PyTorch model for the XNNPACK backend, use the `XNNPACKQuantizer`. `Quantizers` are backend specific, which means the `XNNPACKQuantizer` is configured to quantize models to leverage the quantized operators offered by the XNNPACK Library.
+After generating the XNNPACK-delegated .pte, the model can be tested from Python using the ExecuTorch runtime python bindings. This can be used to sanity check the model and evaluate numerical accuracy. See [Testing the Model](using-executorch-export.md#testing-the-model) for more information.

-### Configuring the XNNPACKQuantizer
+----

-```python
-from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import (
-    XNNPACKQuantizer,
-    get_symmetric_quantization_config,
-)
-quantizer = XNNPACKQuantizer()
-quantizer.set_global(get_symmetric_quantization_config())
-```
-Here, the `XNNPACKQuantizer` is configured for symmetric quantization, indicating that the quantized zero point is set to zero with `qmin = -127` and `qmax = 127`. `get_symmetric_quantization_config()` can be configured with the following arguments:
-* `is_per_channel`
-  * Weights are quantized across channels
-* `is_qat`
-  * Quantize aware training
-* `is_dynamic`
-  * Dynamic quantization
+## Quantization

-```python
-quantizer.set_global(quantization_config)
-    .set_object_type(torch.nn.Conv2d, quantization_config) # can configure by module type
-    .set_object_type(torch.nn.functional.linear, quantization_config) # or torch functional op typea
-    .set_module_name("foo.bar", quantization_config) # or by module fully qualified name
-```
+The XNNPACK delegate can also be used as a backend to execute symmetrically quantized models. To quantize a PyTorch model for the XNNPACK backend, use the `XNNPACKQuantizer`. `Quantizers` are backend specific, which means the `XNNPACKQuantizer` is configured to quantize models to leverage the quantized operators offered by the XNNPACK Library.
+
+### Supported Quantization Schemes
+The XNNPACK delegate supports the following quantization schemes:
+- 8-bit symmetric weights with 8-bit asymmetric activations (via the PT2E quantization flow).
+- Supports both static and dynamic activations.
+- Supports per-channel and per-tensor schemes.
+- Supports linear, convolution, add, mul, cat, and adaptive avg pool 2d operators.
+
+Weight-only quantization is not currently supported on XNNPACK.
+
+### 8-bit Quantization using the PT2E Flow
+
+To perform 8-bit quantization with the PT2E flow, perform the following steps prior to exporting the model:
+
+1) Create an instance of the `XNNPACKQuantizer` class and set the quantization parameters.
+2) Use `torch.export.export_for_training` to prepare for quantization.
+3) Call `prepare_pt2e` to prepare the model for quantization.
+4) For static quantization, run the prepared model with representative samples to calibrate the quantized tensor activation ranges.
+5) Call `convert_pt2e` to quantize the model.
+6) Export and lower the model using the standard flow.
+
+The output of `convert_pt2e` is a PyTorch model which can be exported and lowered using the normal flow. As it is a regular PyTorch model, it can also be used to evaluate the accuracy of the quantized model using standard PyTorch techniques.

-#### Quantizing a model with the XNNPACKQuantizer
-After configuring the quantizer, the model can be quantized by via the `prepare_pt2e` and `convert_pt2e` APIs.
```python
-from torch.ao.quantization.quantize_pt2e import (
-    prepare_pt2e,
-    convert_pt2e,
-)
-from torch.export import export_for_training
+from executorch.backends.xnnpack.quantizer.xnnpack_quantizer import XNNPACKQuantizer
+from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e
+from torch.ao.quantization.quantizer.xnnpack_quantizer import get_symmetric_quantization_config

-exported_model = export_for_training(model_to_quantize, example_inputs).module()
-prepared_model = prepare_pt2e(exported_model, quantizer)
+qparams = get_symmetric_quantization_config(is_per_channel=True)  # (1)
+quantizer = XNNPACKQuantizer()
+quantizer.set_global(qparams)

-for cal_sample in cal_samples: # Replace with representative model inputs
-    prepared_model(cal_sample) # Calibrate
+training_ep = torch.export.export_for_training(model, sample_inputs).module()  # (2)
+prepared_model = prepare_pt2e(training_ep, quantizer)  # (3)

-quantized_model = convert_pt2e(prepared_model)
-```
-For static, post-training quantization (PTQ), the post-prepare\_pt2e model should be run with a representative set of samples, which are used to determine the quantization parameters.
+for cal_sample in [torch.randn(1, 3, 224, 224)]:  # Replace with representative model inputs
+    prepared_model(cal_sample)  # (4) Calibrate

-After `convert_pt2e`, the model can be exported and lowered using the normal ExecuTorch XNNPACK flow. For more information on PyTorch 2 quantization [here](https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html).
+quantized_model = convert_pt2e(prepared_model)  # (5)

-### Testing the Model
+et_program = to_edge_transform_and_lower(  # (6)
+    torch.export.export(quantized_model, sample_inputs),
+    partitioner=[XnnpackPartitioner()],
+).to_executorch()
+```

-After generating the XNNPACK-delegated .pte, the model can be tested from Python using the ExecuTorch runtime python bindings. This can be used to sanity check the model and evaluate numerical accuracy. See [Testing the Model](using-executorch-export.md#testing-the-model) for more information.
+See [PyTorch 2 Export Post Training Quantization](https://pytorch.org/tutorials/prototype/pt2e_quant_ptq.html) for more information.
+
+----

## Runtime Integration

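The new "Testing the Model" subsection above points at the ExecuTorch runtime Python bindings; a minimal sketch of such a sanity check, assuming the `portable_lib` pybindings module path (which is not part of this commit), is:

```python
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

# Load the XNNPACK-delegated program produced earlier and run it once.
program = _load_for_executorch("mv2_xnnpack.pte")
sample_inputs = (torch.randn(1, 3, 224, 224),)
et_output = program.forward(sample_inputs)[0]

# Compare against the eager PyTorch model as a numerical sanity check
# (`model` here is the eager torchvision model used during export).
# eager_output = model(*sample_inputs)
# torch.testing.assert_close(et_output, eager_output, atol=1e-3, rtol=1e-3)
```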