# Arm® VGF Backend

The Arm VGF backend is the ExecuTorch solution for lowering PyTorch models to VGF-compatible hardware.
It leverages the TOSA operator set and the [ML SDK for Vulkan®](https://github.com/arm/ai-ml-sdk-for-vulkan?tab=readme-ov-file) to produce a `.pte` file.
The VGF backend also supports execution from a `.pte` file and provides functionality to extract the corresponding VGF file for integration into various applications.

## Features

- Wide operator support for delegating large parts of models to the VGF target.
- A quantizer that optimizes quantization for the VGF target.

## Target Requirements
The target system must include the ML SDK for Vulkan and a Vulkan driver supporting Vulkan API version 1.3 or later.

## Development Requirements

```{tip}
All requirements can be downloaded using `examples/arm/setup.sh --enable-mlsdk-deps --disable-ethos-u-deps` and added to the path using
`source examples/arm/ethos-u-scratch/setup_path.sh`
```

For the ahead-of-time (AOT) flow, which compiles a model to the `.pte` format using the VGF backend, the requirements are:
- [TOSA Serialization Library](https://www.mlplatform.org/tosa/software.html) for serializing the Exir IR graph into TOSA IR.
- [ML SDK Model Converter](https://github.com/arm/ai-ml-sdk-model-converter) for converting TOSA flatbuffers to VGF files.

For building and running your application with the generic executor_runner, you also need:
- The [Vulkan API](https://www.vulkan.org) set up locally for GPU execution support.
- The [ML Emulation Layer for Vulkan](https://github.com/arm/ai-ml-emulation-layer-for-vulkan) for testing against the Vulkan API.

## Using the Arm VGF Backend
The [VGF Minimal Example](https://github.com/pytorch/executorch/blob/main/examples/arm/vgf_minimal_example.ipynb) demonstrates how to lower a module using the VGF backend.

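As a condensed, non-authoritative sketch of that flow (the toy module and the exact import paths are assumptions and may differ between ExecuTorch versions), lowering a small module could look like this:

```python
import torch

from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec
from executorch.backends.arm.vgf.partitioner import VgfPartitioner
from executorch.exir import to_edge_transform_and_lower


class Add(torch.nn.Module):
    def forward(self, x, y):
        return x + y


module = Add().eval()
example_inputs = (torch.randn(1, 8), torch.randn(1, 8))

# Configure the VGF lowering; the default TOSA specification is used when none is given.
compile_spec = VgfCompileSpec()

# Export the module and delegate supported subgraphs to the VGF backend.
exported_program = torch.export.export(module, example_inputs)
edge_program = to_edge_transform_and_lower(
    exported_program,
    partitioner=[VgfPartitioner(compile_spec)],
)

# Serialize to a .pte file.
executorch_program = edge_program.to_executorch()
with open("add_vgf.pte", "wb") as file:
    file.write(executorch_program.buffer)
```
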
The main configuration point for the lowering is the `VgfCompileSpec` consumed by the partitioner and quantizer.
The full user-facing API is documented below.

```python
class VgfCompileSpec(tosa_spec: executorch.backends.arm.tosa.specification.TosaSpecification | str | None = None, compiler_flags: list[str] | None = None)
```
Compile spec for VGF compatible targets.

Attributes:
- **tosa_spec**: A TosaSpecification, or a string specifying a TosaSpecification.
- **compiler_flags**: Extra compiler flags for converter_backend.

```python
def VgfCompileSpec.dump_debug_info(self, debug_mode: executorch.backends.arm.common.arm_compile_spec.ArmCompileSpec.DebugMode | None):
```
Dumps debugging information into the intermediates path.

```python
def VgfCompileSpec.dump_intermediate_artifacts_to(self, output_path: str | None):
```
Sets a path for dumping intermediate results, such as TOSA and PTE files, produced during lowering.

```python
def VgfCompileSpec.get_intermediate_path(self) -> str | None:
```
Returns the path for dumping intermediate results, such as TOSA and PTE files, produced during lowering.

```python
def VgfCompileSpec.get_output_format() -> str:
```
Returns a constant string that is the output format of the class.

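As a small illustrative sketch of these helpers (the output directory name is a placeholder, and the import path is an assumption):

```python
from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec

compile_spec = VgfCompileSpec()

# Keep intermediate artifacts (e.g. TOSA and VGF files) in a local directory for inspection.
compile_spec.dump_intermediate_artifacts_to("./vgf_artifacts")

print(compile_spec.get_intermediate_path())   # ./vgf_artifacts
print(VgfCompileSpec.get_output_format())     # constant output-format identifier
```
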
### Partitioner API
```python
class VgfPartitioner(compile_spec: executorch.backends.arm.vgf.compile_spec.VgfCompileSpec, additional_checks: Optional[Sequence[torch.fx.passes.operator_support.OperatorSupportBase]] = None) -> None
```
Partitions subgraphs supported by the Arm Vgf backend.

Attributes:
- **compile_spec**: The VgfCompileSpec configuring the Vgf backend.
- **additional_checks**: Optional sequence of additional operator support checks.

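A hedged sketch of passing an extra operator-support check (the check itself and the import paths are illustrative assumptions):

```python
import torch
from torch.fx.passes.operator_support import OperatorSupportBase

from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec
from executorch.backends.arm.vgf.partitioner import VgfPartitioner


class DontDelegateSoftmax(OperatorSupportBase):
    """Illustrative check that keeps softmax nodes out of the VGF partition."""

    def is_node_supported(self, submodules, node: torch.fx.Node) -> bool:
        return node.target != torch.ops.aten._softmax.default


partitioner = VgfPartitioner(
    VgfCompileSpec(),
    additional_checks=[DontDelegateSoftmax()],
)
```
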
```python
def VgfPartitioner.ops_to_not_decompose(self, ep: torch.export.exported_program.ExportedProgram) -> Tuple[List[torch._ops.OpOverload], Optional[Callable[[torch.fx.node.Node], bool]]]:
```
Returns a list of operator names that should not be decomposed. When these ops are
registered and `to_backend` is invoked through `to_edge_transform_and_lower`, the program
that the backend receives is guaranteed not to have any of these ops decomposed.

Returns:
- **List[torch._ops.OpOverload]**: a list of operator names that should not be decomposed.
- **Optional[Callable[[torch.fx.Node], bool]]**: an optional callable, invoked for each node in the graph, that users can
  provide as a filter to select nodes that should still be decomposed even though the op they
  correspond to is in the list returned by `ops_to_not_decompose`.

```python
def VgfPartitioner.partition(self, exported_program: torch.export.exported_program.ExportedProgram) -> executorch.exir.backend.partitioner.PartitionResult:
```
Returns the input exported program with newly created sub-Modules encapsulating
specific portions of the input "tagged" for delegation.

The specific implementation is free to decide how existing computation in the
input exported program should be delegated to one or more backends.

The contract is stringent in that:
* Each node that is intended to be delegated must be tagged.
* No change to the original input exported program (ExportedProgram) representation may take
place other than adding sub-Modules that encapsulate existing portions of the
input exported program and the associated metadata for tagging.

Args:
- **exported_program**: An ExportedProgram in Edge dialect to be partitioned for backend delegation.

Returns:
- **PartitionResult**: includes the tagged graph and the delegation spec indicating which backend_id and compile_spec are used for each node, along with the tag created by the backend developers.


### Quantizer
The VGF quantizer supports both [Post-Training Quantization (PTQ)](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html)
and [Quantization-Aware Training (QAT)](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_qat.html) through the PT2E flow.

Currently, the symmetric `int8` config defined by `executorch.backends.arm.quantizer.arm_quantizer.get_symmetric_quantization_config` is
the main config available for use with the VGF quantizer.

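A minimal PT2E sketch using this config (import paths are assumptions; recent releases move the PT2E helpers from `torch.ao.quantization.quantize_pt2e` into `torchao`, so adjust the imports to your installed versions):

```python
import torch
from torch.ao.quantization.quantize_pt2e import convert_pt2e, prepare_pt2e

from executorch.backends.arm.quantizer.arm_quantizer import get_symmetric_quantization_config
from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec
from executorch.backends.arm.vgf.quantizer import VgfQuantizer

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 16),)

compile_spec = VgfCompileSpec()
quantizer = VgfQuantizer(compile_spec)
quantizer.set_global(get_symmetric_quantization_config())

# Export, annotate, calibrate, and convert using the PT2E flow.
exported = torch.export.export(model, example_inputs).module()
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # run representative inputs to calibrate the observers
quantized = convert_pt2e(prepared)

# `quantized` can now be exported and lowered with VgfPartitioner as shown earlier.
```
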
```python
class VgfQuantizer(compile_spec: 'VgfCompileSpec') -> 'None'
```
Quantizer supported by the Arm Vgf backend.

Attributes:
- **compile_spec**: VgfCompileSpec, specifies the compilation configuration.

```python
def VgfQuantizer.set_global(self, quantization_config: 'QuantizationConfig') -> 'TOSAQuantizer':
```
Sets the quantization_config for submodules that are not already annotated by name or type filters.

Args:
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

```python
def VgfQuantizer.set_io(self, quantization_config):
```
Sets the quantization_config for input and output nodes.

Args:
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

```python
def VgfQuantizer.set_module_name(self, module_name: 'str', quantization_config: 'Optional[QuantizationConfig]') -> 'TOSAQuantizer':
```
Sets the quantization_config for the submodule with the given `module_name`. For example, calling
`quantizer.set_module_name("blocks.sub")` quantizes all supported operators/operator
patterns in the submodule with that name using the given `quantization_config`.

Args:
- **module_name**: Name of the module to which the quantization_config is applied.
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

Returns:
- **TOSAQuantizer**: The quantizer instance with the updated module name configuration.

```python
def VgfQuantizer.set_module_type(self, module_type: 'Callable', quantization_config: 'QuantizationConfig') -> 'TOSAQuantizer':
```
Sets the quantization_config for submodules of the given `module_type`. For example, calling
`quantizer.set_module_type(Sub)` or `quantizer.set_module_type(nn.Linear)` quantizes all supported operators/operator
patterns in submodules of that type using the given `quantization_config`.

Args:
- **module_type**: Type of module to which the quantization_config is applied.
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

Returns:
- **TOSAQuantizer**: The quantizer instance with the updated module type configuration.

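For example, combining a global config with per-module overrides might look like this (the module name, the `is_per_channel` keyword argument, and the import paths are illustrative assumptions):

```python
import torch

from executorch.backends.arm.quantizer.arm_quantizer import get_symmetric_quantization_config
from executorch.backends.arm.vgf.compile_spec import VgfCompileSpec
from executorch.backends.arm.vgf.quantizer import VgfQuantizer

global_config = get_symmetric_quantization_config()

quantizer = VgfQuantizer(VgfCompileSpec())
quantizer.set_global(global_config)

# Override the scheme for every nn.Linear and for one named submodule,
# and annotate the model inputs/outputs with the global scheme.
quantizer.set_module_type(torch.nn.Linear, get_symmetric_quantization_config(is_per_channel=True))
quantizer.set_module_name("blocks.sub", global_config)
quantizer.set_io(global_config)
```
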
```python
def VgfQuantizer.transform_for_annotation(self, model: 'GraphModule') -> 'GraphModule':
```
An initial pass for transforming the graph to prepare it for annotation.
Currently transforms scalar values to tensor attributes.

Args:
- **model**: Module that is transformed.

Returns:
- The transformed model.

### Supported Quantization Schemes
The quantization schemes supported by the VGF backend are:
- 8-bit symmetric weights with 8-bit asymmetric activations (via the PT2E quantization flow).
  - Supports both static and dynamic activations.
  - Supports per-channel and per-tensor schemes.

Weight-only quantization is not currently supported on VGF.
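
As an illustration of how these schemes map onto the symmetric config helper (the keyword arguments below are assumptions and may differ between versions):

```python
from executorch.backends.arm.quantizer.arm_quantizer import get_symmetric_quantization_config

# Static activations with per-tensor weight scales.
static_per_tensor = get_symmetric_quantization_config(is_per_channel=False)

# Static activations with per-channel weight scales.
static_per_channel = get_symmetric_quantization_config(is_per_channel=True)

# Dynamic activations: activation quantization parameters are computed at runtime.
dynamic_per_channel = get_symmetric_quantization_config(is_per_channel=True, is_dynamic=True)
```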

## Runtime Integration

The VGF backend can use the default ExecuTorch runner. The steps required for building and running it are explained in the [VGF Backend Tutorial](https://docs.pytorch.org/executorch/stable/tutorial-arm-ethos-u.html).
The example application is recommended for testing the basic functionality of your lowered models, and as a starting point for developing runtime integrations for your own targets.

### VGF Adapter for Model Explorer

The [VGF Adapter for Model Explorer](https://github.com/arm/vgf-adapter-model-explorer) enables visualization of
VGF files and can be useful for debugging.