# NXP eIQ Neutron Quantization

The eIQ Neutron NPU requires delegated operators to be quantized. To quantize a PyTorch model for the Neutron backend, use the `NeutronQuantizer` from `backends/nxp/quantizer/neutron_quantizer.py`.
The `NeutronQuantizer` is configured to quantize the model with the quantization schemes supported by the eIQ Neutron NPU.

### Supported Quantization Schemes

The Neutron delegate supports the following quantization schemes:

- Static quantization with 8-bit symmetric weights and 8-bit asymmetric activations (via the PT2E quantization flow), per-tensor granularity.
  - The following operators are currently supported:
    - `aten.abs.default`
    - `aten.adaptive_avg_pool2d.default`
    - `aten.addmm.default`
    - `aten.add.Tensor`
    - `aten.avg_pool2d.default`
    - `aten.cat.default`
    - `aten.conv1d.default`
    - `aten.conv2d.default`
    - `aten.dropout.default`
    - `aten.flatten.using_ints`
    - `aten.hardtanh.default`
    - `aten.hardtanh_.default`
    - `aten.linear.default`
    - `aten.max_pool2d.default`
    - `aten.mean.dim`
    - `aten.pad.default`
    - `aten.permute.default`
    - `aten.relu.default` and `aten.relu_.default`
    - `aten.reshape.default`
    - `aten.view.default`
    - `aten.softmax.int`
    - `aten.tanh.default` and `aten.tanh_.default`
    - `aten.sigmoid.default`

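As an illustration, a model composed only of operators from this list can be fully delegated after quantization. The module below is a hypothetical example written for this document, not part of the ExecuTorch API; the comments note which ATen operator each layer lowers to.

```python
import torch
import torch.nn as nn

# Hypothetical example: a small CNN built only from supported operators.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # aten.conv2d.default
        self.relu = nn.ReLU()                                  # aten.relu.default
        self.pool = nn.MaxPool2d(2)                            # aten.max_pool2d.default
        self.fc = nn.Linear(8 * 16 * 16, 10)                   # aten.linear.default

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))
        x = torch.flatten(x, 1)                                # aten.flatten.using_ints
        return self.fc(x)

out = TinyNet().eval()(torch.randn(1, 3, 32, 32))
```

A model containing unsupported operators can still be lowered; only the supported subgraphs are delegated to the NPU.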
### Static 8-bit Quantization using the PT2E Flow

To perform 8-bit quantization with the PT2E flow, follow these steps before exporting the model to edge:

1) Create an instance of the `NeutronQuantizer` class.
2) Use `torch.export.export` to export the model to ATen Dialect.
3) Call `prepare_pt2e` with the `NeutronQuantizer` instance to annotate the model with observers for quantization.
4) As static quantization is required, run the prepared model with representative samples to calibrate the quantized tensor activation ranges.
5) Call `convert_pt2e` to quantize the model.
6) Export and lower the model using the standard flow.

The output of `convert_pt2e` is a PyTorch model which can be exported and lowered using the normal flow. As it is a regular PyTorch model, it can also be used to evaluate the accuracy of the quantized model using standard PyTorch techniques.

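For instance, top-1 accuracy can be measured with ordinary evaluation code. This is a generic sketch: the plain float module below stands in for the model returned by `convert_pt2e`, and the synthetic dataset for real labeled data.

```python
import torch

def top1_accuracy(model, dataset):
    """dataset: iterable of (input_tensor, label_tensor) pairs."""
    correct = total = 0
    with torch.no_grad():
        for x, y in dataset:
            pred = model(x).argmax(dim=-1)
            correct += int((pred == y).sum())
            total += y.numel()
    return correct / total

# Stand-in for the output of convert_pt2e; any callable model works here.
quantized_model = torch.nn.Linear(4, 2).eval()
data = [(torch.randn(8, 4), torch.randint(0, 2, (8,))) for _ in range(4)]
acc = top1_accuracy(quantized_model, data)
```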
```python
import torch
import torchvision.models as models
from torchvision.models.mobilenetv2 import MobileNet_V2_Weights
from executorch.backends.nxp.quantizer.neutron_quantizer import NeutronQuantizer
from executorch.backends.nxp.neutron_partitioner import NeutronPartitioner
from executorch.backends.nxp.nxp_backend import generate_neutron_compile_spec
from executorch.exir import to_edge_transform_and_lower
from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e

model = models.mobilenetv2.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).eval()
sample_inputs = (torch.randn(1, 3, 224, 224),)

quantizer = NeutronQuantizer()  # (1)

training_ep = torch.export.export(model, sample_inputs).module()  # (2)
prepared_model = prepare_pt2e(training_ep, quantizer)  # (3)

for cal_sample in [torch.randn(1, 3, 224, 224)]:  # Replace with representative model inputs
    prepared_model(cal_sample)  # (4) Calibrate

quantized_model = convert_pt2e(prepared_model)  # (5)

compile_spec = generate_neutron_compile_spec(
    "imxrt700",
    operators_not_to_delegate=None,
    neutron_converter_flavor="SDK_25_06",
)

et_program = to_edge_transform_and_lower(  # (6)
    torch.export.export(quantized_model, sample_inputs),
    partitioner=[NeutronPartitioner(compile_spec=compile_spec)],
).to_executorch()
```
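For deployment, the lowered program is typically serialized to a `.pte` file via its `buffer` property. The sketch below uses placeholder bytes in place of `et_program.buffer` so it runs standalone; the file name is illustrative.

```python
import os
import tempfile

# Placeholder for et_program.buffer, which holds the serialized program bytes.
pte_bytes = b"placeholder-program-bytes"

with tempfile.TemporaryDirectory() as tmp:
    pte_path = os.path.join(tmp, "mobilenet_v2_neutron.pte")
    with open(pte_path, "wb") as f:
        f.write(pte_bytes)  # in practice: f.write(et_program.buffer)
    size_written = os.path.getsize(pte_path)
```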

See [PyTorch 2 Export Post Training Quantization](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html) for more information.