
Commit 50ac8ea

Arm Backend: Create backends-arm-vgf.md (#14261)
Create the documentation for the VGF backend. This includes the change introduced in #14191.

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218

Signed-off-by: Agrima Khare <[email protected]>
1 parent e6d8fb9 commit 50ac8ea

3 files changed: +207 -1 lines changed

docs/source/backends-arm-vgf.md

Lines changed: 204 additions & 0 deletions
@@ -0,0 +1,204 @@
# Arm&reg; VGF Backend

The Arm VGF backend is the ExecuTorch solution for lowering PyTorch models to VGF-compatible hardware.
It leverages the TOSA operator set and the [ML SDK for Vulkan&reg;](https://github.com/arm/ai-ml-sdk-for-vulkan?tab=readme-ov-file) to produce a `.pte` file.
The VGF backend also supports execution from a `.pte` file and provides functionality to extract the corresponding VGF file for integration into various applications.

## Features

- Wide operator support for delegating large parts of models to the VGF target.
- A quantizer that optimizes quantization for the VGF target.

## Target Requirements

The target system must include the ML SDK for Vulkan and a Vulkan driver supporting Vulkan API >= 1.3.

## Development Requirements

```{tip}
All requirements can be downloaded using `examples/arm/setup.sh --enable-mlsdk-deps --disable-ethos-u-deps` and added to the path using
`source examples/arm/ethos-u-scratch/setup_path.sh`
```

For the AOT flow, which compiles a model to the `.pte` format using the VGF backend, the requirements are:
- [TOSA Serialization Library](https://www.mlplatform.org/tosa/software.html) for serializing the Exir IR graph into TOSA IR.
- [ML SDK Model Converter](https://github.com/arm/ai-ml-sdk-model-converter) for converting TOSA flatbuffers to VGF files.

For building and running your application with the generic executor_runner, you also need:
- [Vulkan API](https://www.vulkan.org) set up locally for GPU execution support.
- [ML Emulation Layer for Vulkan](https://github.com/arm/ai-ml-emulation-layer-for-vulkan) for testing on the Vulkan API.

## Using the Arm VGF Backend

The [VGF Minimal Example](https://github.com/pytorch/executorch/blob/main/examples/arm/vgf_minimal_example.ipynb) demonstrates how to lower a module using the VGF backend.

The main configuration point for the lowering is the `VgfCompileSpec` consumed by the partitioner and quantizer.
The full user-facing API is documented below.

```python
class VgfCompileSpec(tosa_spec: executorch.backends.arm.tosa.specification.TosaSpecification | str | None = None, compiler_flags: list[str] | None = None)
```
Compile spec for VGF-compatible targets.

Attributes:
- **tosa_spec**: A TosaSpecification, or a string specifying a TosaSpecification.
- **compiler_flags**: Extra compiler flags for the converter_backend.

```python
def VgfCompileSpec.dump_debug_info(self, debug_mode: executorch.backends.arm.common.arm_compile_spec.ArmCompileSpec.DebugMode | None):
```
Dumps debugging information into the intermediates path.

```python
def VgfCompileSpec.dump_intermediate_artifacts_to(self, output_path: str | None):
```
Sets a path for dumping intermediate results during lowering, such as TOSA and PTE files.

```python
def VgfCompileSpec.get_intermediate_path(self) -> str | None:
```
Returns the path for dumping intermediate results during lowering, such as TOSA and PTE files.

```python
def VgfCompileSpec.get_output_format() -> str:
```
Returns a constant string that is the output format of the class.

### Partitioner API
```python
class VgfPartitioner(compile_spec: executorch.backends.arm.vgf.compile_spec.VgfCompileSpec, additional_checks: Optional[Sequence[torch.fx.passes.operator_support.OperatorSupportBase]] = None) -> None
```
Partitions subgraphs supported by the Arm VGF backend.

Attributes:
- **compile_spec**: A `VgfCompileSpec` for the VGF backend.
- **additional_checks**: Optional sequence of additional operator support checks.

```python
def VgfPartitioner.ops_to_not_decompose(self, ep: torch.export.exported_program.ExportedProgram) -> Tuple[List[torch._ops.OpOverload], Optional[Callable[[torch.fx.node.Node], bool]]]:
```
Returns a list of operator names that should not be decomposed. When these ops are registered and `to_backend` is invoked through `to_edge_transform_and_lower`, it is guaranteed that the program the backend receives will not have any of these ops decomposed.

Returns:
- **List[torch._ops.OpOverload]**: A list of operator names that should not be decomposed.
- **Optional[Callable[[torch.fx.Node], bool]]**: An optional callable, invoked for each node in the graph, that users can provide as a filter to mark nodes that should still be decomposed even though the op they correspond to is in the returned list.

```python
def VgfPartitioner.partition(self, exported_program: torch.export.exported_program.ExportedProgram) -> executorch.exir.backend.partitioner.PartitionResult:
```
Returns the input exported program with newly created sub-Modules encapsulating specific portions of the input "tagged" for delegation.

The specific implementation is free to decide how existing computation in the input exported program should be delegated to one or more specific backends.

The contract is stringent in that:
* Each node that is intended to be delegated must be tagged.
* No change in the original input exported program (ExportedProgram) representation can take place other than adding sub-Modules for encapsulating existing portions of the input exported program and the associated metadata for tagging.

Args:
- **exported_program**: An ExportedProgram in Edge dialect to be partitioned for backend delegation.

Returns:
- **PartitionResult**: Includes the tagged graph and the delegation spec, indicating which backend_id and compile_spec is used for each node, and the tag created by the backend developers.
### Quantizer
The VGF quantizer supports [Post-Training Quantization (PTQ)](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html)
and [Quantization-Aware Training (QAT)](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_qat.html) via the PT2E flow.

Currently, the symmetric `int8` config defined by `executorch.backends.arm.quantizer.arm_quantizer.get_symmetric_quantization_config` is
the main config available for use with the VGF quantizer.

```python
class VgfQuantizer(compile_spec: 'VgfCompileSpec') -> 'None'
```
Quantizer supported by the Arm VGF backend.

Attributes:
- **compile_spec**: A `VgfCompileSpec` specifying the compilation configuration.

```python
def VgfQuantizer.set_global(self, quantization_config: 'QuantizationConfig') -> 'TOSAQuantizer':
```
Sets the `quantization_config` for submodules that are not already annotated by name or type filters.

Args:
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

```python
def VgfQuantizer.set_io(self, quantization_config):
```
Sets the `quantization_config` for input and output nodes.

Args:
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

```python
def VgfQuantizer.set_module_name(self, module_name: 'str', quantization_config: 'Optional[QuantizationConfig]') -> 'TOSAQuantizer':
```
Sets the `quantization_config` for the submodule with name `module_name`. For example, `quantizer.set_module_name("blocks.sub")` quantizes all supported operators/operator patterns in the submodule with that name using the given `quantization_config`.

Args:
- **module_name**: Name of the module to which the quantization_config is set.
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

Returns:
- **TOSAQuantizer**: The quantizer instance with the updated module name configuration.

```python
def VgfQuantizer.set_module_type(self, module_type: 'Callable', quantization_config: 'QuantizationConfig') -> 'TOSAQuantizer':
```
Sets the `quantization_config` for submodules of type `module_type`. For example, `quantizer.set_module_type(Sub)` or `quantizer.set_module_type(nn.Linear)` quantizes all supported operators/operator patterns in submodules of that type using the given `quantization_config`.

Args:
- **module_type**: Type of module to which the quantization_config is set.
- **quantization_config**: Specifies the quantization scheme for the weights and activations.

Returns:
- **TOSAQuantizer**: The quantizer instance with the updated module type configuration.

```python
def VgfQuantizer.transform_for_annotation(self, model: 'GraphModule') -> 'GraphModule':
```
An initial pass for transforming the graph to prepare it for annotation.
Currently transforms scalar values to tensor attributes.

Args:
- **model**: Module that is transformed.

Returns:
The transformed model.

### Supported Quantization Schemes
The quantization schemes supported by the VGF backend are:
- 8-bit symmetric weights with 8-bit asymmetric activations (via the PT2E quantization flow).
  - Supports both static and dynamic activations.
  - Supports per-channel and per-tensor schemes.

Weight-only quantization is not currently supported on VGF.

## Runtime Integration

The VGF backend can use the default ExecuTorch runner. The steps required for building and running it are explained in the [VGF Backend Tutorial](https://docs.pytorch.org/executorch/stable/tutorial-arm-ethos-u.html).
The example application is recommended for testing basic functionality of your lowered models, as well as a starting point for developing runtime integrations for your own targets.

### VGF Adapter for Model Explorer

The [VGF Adapter for Model Explorer](https://github.com/arm/vgf-adapter-model-explorer) enables visualization of VGF files and can be useful for debugging.

docs/source/backends-overview.md

Lines changed: 2 additions & 1 deletion

```diff
@@ -16,5 +16,6 @@ Commonly used hardware backends are listed below. For mobile, consider using XNN
 - [Vulkan (Android GPU)](backends-vulkan.md)
 - [Qualcomm NPU](backends-qualcomm.md)
 - [MediaTek NPU](backends-mediatek.md)
-- [Arm Ethos-U NPU](backends-arm-ethos-u.md)
+- [ARM Ethos-U NPU](backends-arm-ethos-u.md)
+- [ARM VGF](backends-arm-vgf.md)
 - [Cadence DSP](backends-cadence.md)
```

docs/source/index.md

Lines changed: 1 addition & 0 deletions

```diff
@@ -51,6 +51,7 @@ ExecuTorch provides support for:
 - [MPS](backends-mps)
 - [Vulkan](backends-vulkan)
 - [ARM Ethos-U](backends-arm-ethos-u)
+- [ARM VGF](backends-arm-vgf)
 - [Qualcomm](backends-qualcomm)
 - [MediaTek](backends-mediatek)
 - [Cadence](backends-cadence)
```
