Arm backend: Document Ethos-U memory modes and add Ethos-U porting guide #14144
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14144
Note: Links to docs will display an error until the docs builds have been completed.
❌ 18 New Failures, 1 Cancelled Job, 24 Unrelated Failures
As of commit f97617c with merge base a89b858:
NEW FAILURES - The following jobs have failed:
CANCELLED JOB - The following job was cancelled. Please retry:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
digantdesai
left a comment
Didn't read too carefully, but love the diagrams :)
check the rendered docs on the PR before merging for formatting related issues mainly.
@GeorgeARM there seems to be a link problem, see https://github.com/pytorch/executorch/actions/runs/17612710981/job/50038191324?pr=14144
Running command: docker exec -t 9a40a86ad68e5a1276c257b270a356fd5e3207fd65f96ad6cec20bfeabc4c560 /exec
docs/source/backends-arm-ethos-u.md:
examples/arm/ethos-u-porting-guide.md:
You seem to be able to reproduce and run on your machine if you run
Thanks @digantdesai @zingo. The documentation renders correctly at https://docs-preview.pytorch.org/pytorch/executorch/14144/backends-arm-ethos-u.html. At the end of the md document, I refer to an absolute path that doesn't exist yet. When the PR gets merged, the https://github.com/pytorch/executorch/blob/main/examples/arm/ethos-u-porting-guide.md page will be created, and I assume the hyperlink will then work correctly. Is that right? If so, I believe that is also the reason for the failing CI test.
Memory modes: The Shared_Sram, Sram_Only and Dedicated_Sram memory modes are specified in the compile spec and are tightly coupled with how the Ethos-U scratch buffer and NN should be placed in the embedded application. Different memory modes profoundly impact the performance and memory footprint of the application, and it is important to use the NPU in the most suitable memory mode for optimal performance.

Porting guide: A document explaining the key steps to port a new hardware target with an Ethos-U NPU to the Ethos-U backend in ExecuTorch.

Change-Id: I925b8fb5dfb536f5af663cebe000fbb755955fcf
4a85695 to f97617c
- One interface for **higher-latency, lower-bandwidth memory**
Typically external (off-chip) memory such as **Flash** or **DRAM**.

On all Ethos-U NPUs(Ethos-U55, Ethos-U65, Ethos-U85), the low-latency interface is usually the SRAM of the SoC.
nit: missing space after "NPUs"
On all Ethos-U NPUs(Ethos-U55, Ethos-U65, Ethos-U85), the low-latency interface is usually the SRAM of the SoC.
The external memory type depends on the SoC:
- On a low-power microcontorller, the external memory is usually Flash.
microcontroller
- The dedicated SRAM acts as a software managed cache, improving performance by pre-fetching frequently accessed tensors to the on-chip memory.
- Available on Ethos-U65 and Ethos-U85.
- Limitations:
- The SRAM space must be dedicated exculisely to the Ethos-U(the host processor should not access it).
nit: space after Ethos-U
Therefore, from an application standpoint, you need to ensure you have at least 2467.27 KiB of SRAM on the SoC to run this model. The Ethos-U compiler provides a scheduling algorithm allowing to
lower the peak SRAM usage within reasonable limits, you need to add the `--optimise Size` or `--arena-cache-size` CLI options for to the compile spec. You can read more about the options of the
Ethos-U compiler in the documentation [here](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md#optimise). If the peak SRAM usage remains too high in
Shared Sram memory mode, you would need to us the Dedicated Sram mode in order to store the Neural Network and the Ethos-U scratch buffer in the external memory.
all-caps SRAM?
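The sizing rule in the quoted paragraph can be sketched as a small check. This is only an illustration: `suggest_memory_mode` and `DEDICATED_SRAM_NPUS` are hypothetical names, not part of ExecuTorch or the Ethos-U compiler, and the logic assumes only what the doc text states (Shared_Sram needs the peak SRAM usage to fit on the SoC, and Dedicated_Sram is available on Ethos-U65 and Ethos-U85).

```python
# Hypothetical helper for illustration; not an ExecuTorch or Vela API.
# NPUs for which the doc under review says Dedicated_Sram is available.
DEDICATED_SRAM_NPUS = {"Ethos-U65", "Ethos-U85"}


def suggest_memory_mode(peak_sram_kib, soc_sram_kib, npu):
    """Return a memory mode name, or None if neither option applies.

    peak_sram_kib: peak SRAM usage reported by the compiler, in KiB.
    soc_sram_kib:  SRAM available on the SoC, in KiB.
    npu:           NPU name, e.g. "Ethos-U55".
    """
    # Shared_Sram only works if the peak usage fits in the SoC SRAM.
    if peak_sram_kib <= soc_sram_kib:
        return "Shared_Sram"
    # Otherwise fall back to Dedicated_Sram where the NPU supports it.
    if npu in DEDICATED_SRAM_NPUS:
        return "Dedicated_Sram"
    # Nothing fits: reduce peak usage first (e.g. compiler size options).
    return None


if __name__ == "__main__":
    # 2467.27 KiB is the peak figure quoted in the doc excerpt.
    print(suggest_memory_mode(2467.27, 4096, "Ethos-U55"))
    print(suggest_memory_mode(2467.27, 2048, "Ethos-U85"))
```

A `None` result corresponds to the case where the doc suggests lowering peak usage with `--optimise Size` or `--arena-cache-size` before retrying.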
The main advantage of the Dedicated_Sram memory mode is that you can run large models and still benefit from the low-latency/high-bandwidth of the SRAM, used as a cache.
It is important to highlight that when you specify a memory mode in the compile spec, in the runtime, the user is expected to place the scratch buffer and NN in the correct memory location.
In other words, when you specify for ex. Shared Sram memory mode, the runtime application logic should place the ethos-U scratch buffer in the on-chip memory and the NN in the external memory for optimal performance.
SRAM, Ethos-U
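The coupling between memory mode and placement described in the quoted lines can be written down as a lookup table. This is a sketch with hypothetical names (`Placement`, `PLACEMENT`, `expected_placement`). The Shared_Sram and Dedicated_Sram rows follow the doc text under review; the Sram_Only row is an assumption inferred from the mode's name, since the excerpts here do not spell it out.

```python
from typing import NamedTuple


class Placement(NamedTuple):
    """Where the runtime application is expected to place each object."""
    scratch_buffer: str
    network: str


# Hypothetical table for illustration; not an ExecuTorch data structure.
PLACEMENT = {
    # From the doc: scratch buffer on-chip, NN in external memory.
    "Shared_Sram": Placement("on-chip SRAM", "external memory"),
    # From the doc: NN and scratch buffer both in external memory,
    # with the dedicated SRAM used as a cache by the NPU.
    "Dedicated_Sram": Placement("external memory", "external memory"),
    # ASSUMPTION (not stated in the excerpts): everything on-chip.
    "Sram_Only": Placement("on-chip SRAM", "on-chip SRAM"),
}


def expected_placement(mode):
    """Look up the expected placement for a memory mode name."""
    try:
        return PLACEMENT[mode]
    except KeyError:
        raise ValueError(f"unknown memory mode: {mode!r}") from None


if __name__ == "__main__":
    for name in PLACEMENT:
        print(name, expected_placement(name))
```

A table like this could be checked against the linker script configuration during porting, so that the compile spec and the runtime placement do not drift apart.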
Core-platform includes a [tutorial](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-core-platform/-/blob/main/PORTING.md?ref_type=heads) on porting a new target. If you port your target to
core-platform, you can then easily reuse it in the ExecuTorch runtime.

Also, as explained in the comments in `backends/arm/scripts/corstone_utils.cmake`, note that REGIONCFG register of the Ethos-U controls the memory(on-chip or external memory) used by the NPU to
missing space between memory and (
## Corstone linker scripts

The linker scripts point to the linker where to place various objects in memory when the application is loaded onto the target.
In the `arm_executor_runner.cpp` application, we reuse the linker scripts from the core-platform project. Note that the Global Offset Table(.got symbols) needs to be 16-byte aligned. The linker scripts are highly specific to the memory map of system.
missing space between Table and (
maybe s/the memory map of system/the memory map of the system/ ?
For example, if you generate a pte file with compile specification for Shared Sram, the scratch buffer should be placed in the SRAM and the NN in the external memory in the runtime application code. You can see we are following
this approach in the `examples/arm/executor_runner/arm_executor_runner.cpp` example application. In the linker scripts for the application(`examples/arm/executor_runner/Corstone-320.ld` and
`examples/arm/executor_runner/Corstone-300.ld`) we check the value of `ETHOSU_ARENA` to determine whether the ethos-u scratch buffer is placed in the on-chip memory or in the external memory. In this
capitalize Ethos-U?
and the `.bss.tensor_arena` section is placed in the correct location in the memory map thanks to
the `ETHOSU_ARENA` parameter.

There is a tight coupling between the memory mode for the Ethos-U and the placement of the ethos-u scratch buffer,
capitalize ethos-u?
from executorch.exir.passes.quantize_io_pass import QuantizeInputs
from executorch.exir.passes.quantize_io_pass import QuantizeOutputs
just from executorch.exir.passes.quantize_io_pass import QuantizeInputs, QuantizeOutputs?
cc @digantdesai @freddan80 @per @zingo @oscarandersson8218