
@gggekov
Collaborator

@gggekov gggekov commented Sep 10, 2025

Memory modes: The Shared_Sram, Sram_Only and Dedicated_Sram memory modes are specified in the compile spec and are tightly coupled with how the Ethos-U scratch buffer and the NN should be placed in the embedded application. Different memory modes profoundly impact the performance and memory footprint of the application, so it is important to run the NPU in the memory mode best suited to the target.

Porting guide: A document explaining the key steps to port a new hardware target with an Ethos-U NPU to the Ethos-U backend in ExecuTorch

cc @digantdesai @freddan80 @per @zingo @oscarandersson8218

@pytorch-bot

pytorch-bot bot commented Sep 10, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14144

Note: Links to docs will display an error until the docs builds have been completed.

❌ 18 New Failures, 1 Cancelled Job, 24 Unrelated Failures

As of commit f97617c with merge base a89b858:

NEW FAILURES - The following jobs have failed:

  • pull / test-moshi-linux / linux-job (gh)
    RuntimeError: Command docker exec -t 516e54b991046d629ec72bf0b1cd9dee9227ff23dc7cc5e556f03446c5444516 /exec failed with exit code 1
  • pull / unittest / macos / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • pull / unittest-editable / macos / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-coreml (dl3) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-coreml (efficient_sam) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-coreml (emformer_join) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-coreml (ic4) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-coreml (resnet50) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-coreml (vit) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-coreml (w2l) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-cpu (emformer_join, xnnpack-quantization-delegation) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-cpu (llama, portable) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-cpu (llama3_2_vision_encoder, portable) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-cpu (vit, xnnpack-quantization-delegation) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-models-macos-mps / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-selective-build-macos (cmake) / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / test-static-llama-ane / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?
  • trunk / unittest-release / macos / macos-job (gh)
    Can't find 'action.yml', 'action.yaml' or 'Dockerfile' under '/Users/ec2-user/runner/_work/executorch/executorch/test-infra/.github/actions/check-disk-space'. Did you forget to run actions/checkout before running your local action?

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Sep 10, 2025
@gggekov gggekov added partner: arm For backend delegation, kernels, demo, etc. from the 3rd-party partner, Arm ciflow/trunk release notes: none Do not include this in the release notes labels Sep 10, 2025
@zingo zingo added this to the 1.0.0 milestone Sep 10, 2025
@zingo zingo added release notes: arm Changes to the ARM backend delegate and removed release notes: none Do not include this in the release notes labels Sep 10, 2025
Contributor

@digantdesai digantdesai left a comment

Didn't read too carefully, but love the diagrams :)
Check the rendered docs on the PR before merging, mainly for formatting-related issues.

@zingo
Collaborator

zingo commented Sep 10, 2025

@GeorgeARM there seems to be a link problem, see https://github.com/pytorch/executorch/actions/runs/17612710981/job/50038191324?pr=14144
This job needs to pass for us to be able to merge.

Running command: docker exec -t 9a40a86ad68e5a1276c257b270a356fd5e3207fd65f96ad6cec20bfeabc4c560 /exec
++ '[' pull_request = pull_request ']'
++ echo 4d6209b 4a85695

docs/source/backends-arm-ethos-u.md:
FAIL examples/arm/ethos-u-porting-guide.md

examples/arm/ethos-u-porting-guide.md:
OK ../../docs/source/backends-arm-ethos-u.md

  • echo

  • echo 'Xref lint failed.'
    Xref lint failed.

  • echo 'If this is a transient outage, you can bypass it by adding the skip-xref-lint label to your PR.'
    If this is a transient outage, you can bypass it by adding the skip-xref-lint label to your PR.

  • echo 'Or add @lint-ignore somewhere on the same line as the reference you want to skip checking.'
    Or add @lint-ignore somewhere on the same line as the reference you want to skip checking.

@zingo
Collaborator

zingo commented Sep 10, 2025

You should be able to reproduce this on your machine by running
./scripts/lint_xrefs.sh
on your patch (it checks all files).

@gggekov
Collaborator Author

gggekov commented Sep 10, 2025

Thanks @digantdesai @zingo. The documentation renders correctly in https://docs-preview.pytorch.org/pytorch/executorch/14144/backends-arm-ethos-u.html
with one exception.

At the end of the .md document, I refer to an absolute URL that doesn't exist yet:
...the [Ethos-U porting guide](https://github.com/pytorch/executorch/blob/main/examples/arm/ethos-u-porting-guide.md).

When the PR is merged, the https://github.com/pytorch/executorch/blob/main/examples/arm/ethos-u-porting-guide.md page will be created, and I assume the hyperlink will then work. Is this definitely true? If so, I believe that is also the reason for the failing CI test.

Change-Id: I925b8fb5dfb536f5af663cebe000fbb755955fcf
@gggekov gggekov force-pushed the documentation_memory_modes_porting_guide branch from 4a85695 to f97617c Compare September 10, 2025 17:30
@zingo zingo merged commit 6ed10e5 into pytorch:main Sep 11, 2025
376 of 455 checks passed
- One interface for **higher-latency, lower-bandwidth memory**
Typically external (off-chip) memory such as **Flash** or **DRAM**.

On all Ethos-U NPUs(Ethos-U55, Ethos-U65, Ethos-U85), the low-latency interface is usually the SRAM of the SoC.
Contributor

nit: missing space after "NPUs"


The external memory type depends on the SoC:
- On a low-power microcontorller, the external memory is usually Flash.
Contributor

microcontroller

- The dedicated SRAM acts as a software managed cache, improving performance by pre-fetching frequently accessed tensors to the on-chip memory.
- Available on Ethos-U65 and Ethos-U85.
- Limitations:
- The SRAM space must be dedicated exclusively to the Ethos-U(the host processor should not access it).
Contributor

nit: space after Ethos-U
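A minimal sketch of a build-time sanity check, using a hypothetical helper named `check_memory_mode` (not an ExecuTorch API). It encodes only the constraint quoted above, that Dedicated_Sram is available on Ethos-U65 and Ethos-U85, and rejects mode names other than the three discussed in this PR.

```python
# Hypothetical helper, not part of ExecuTorch. It encodes only the constraint
# stated in the guide: Dedicated_Sram is available on Ethos-U65 and Ethos-U85.
MEMORY_MODES = frozenset({"Shared_Sram", "Sram_Only", "Dedicated_Sram"})
DEDICATED_SRAM_NPUS = frozenset({"Ethos-U65", "Ethos-U85"})


def check_memory_mode(npu: str, memory_mode: str) -> None:
    """Raise ValueError if memory_mode is unknown or not available on npu."""
    if memory_mode not in MEMORY_MODES:
        raise ValueError(f"unknown memory mode: {memory_mode!r}")
    if memory_mode == "Dedicated_Sram" and npu not in DEDICATED_SRAM_NPUS:
        raise ValueError(f"{memory_mode} is not available on {npu}")


if __name__ == "__main__":
    check_memory_mode("Ethos-U85", "Dedicated_Sram")  # accepted silently
    try:
        check_memory_mode("Ethos-U55", "Dedicated_Sram")
    except ValueError as err:
        print(err)  # Dedicated_Sram is not available on Ethos-U55
```

Failing early like this is cheaper than discovering a mismatched memory mode only after flashing the target.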

Therefore, from an application standpoint, you need to ensure you have at least 2467.27 KiB of SRAM on the SoC to run this model. The Ethos-U compiler provides a scheduling algorithm that can lower the peak SRAM usage within reasonable limits; to use it, add the `--optimise Size` or `--arena-cache-size` CLI options to the compile spec. You can read more about the options of the
Ethos-U compiler in the documentation [here](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-vela/-/blob/main/OPTIONS.md#optimise). If the peak SRAM usage remains too high in
Shared Sram memory mode, you would need to use the Dedicated Sram mode in order to store the Neural Network and the Ethos-U scratch buffer in the external memory.
Contributor

all-caps SRAM?
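The sizing check quoted above is simple arithmetic: the compiler's reported peak SRAM usage (in KiB) has to fit in the SoC's SRAM. Below is a minimal sketch with hypothetical helpers (`fits_in_sram`, `size_optimisation_flags`, neither an ExecuTorch nor a Vela API). The flag names `--optimise Size` and `--arena-cache-size` come from the quoted text and the linked Vela OPTIONS.md. How the flags are passed into the ExecuTorch compile spec is not shown here.

```python
from typing import List, Optional

KIB = 1024  # bytes per KiB


def fits_in_sram(peak_sram_kib: float, sram_bytes: int) -> bool:
    """True if the peak SRAM usage reported by the compiler fits in sram_bytes."""
    return peak_sram_kib * KIB <= sram_bytes


def size_optimisation_flags(optimise_size: bool = True,
                            arena_cache_size: Optional[int] = None) -> List[str]:
    """Build the Vela CLI options the guide mentions for lowering peak SRAM usage.

    arena_cache_size is given in bytes.
    """
    flags = []
    if optimise_size:
        flags += ["--optimise", "Size"]
    if arena_cache_size is not None:
        flags += ["--arena-cache-size", str(arena_cache_size)]
    return flags


if __name__ == "__main__":
    peak_kib = 2467.27  # peak SRAM usage quoted in the guide
    print(fits_in_sram(peak_kib, 2 * 1024 * 1024))  # False: ~2.41 MiB > 2 MiB
    print(fits_in_sram(peak_kib, 4 * 1024 * 1024))  # True
    print(size_optimisation_flags(arena_cache_size=2 * 1024 * 1024))
```

If the check still fails after recompiling with these options, the guide's fallback is Dedicated_Sram on an NPU that supports it.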

The main advantage of the Dedicated_Sram memory mode is that you can run large models and still benefit from the low-latency/high-bandwidth of the SRAM, used as a cache.
It is important to highlight that when you specify a memory mode in the compile spec, in the runtime, the user is expected to place the scratch buffer and NN in the correct memory location.
In other words, when you specify for ex. Shared Sram memory mode, the runtime application logic should place the ethos-U scratch buffer in the on-chip memory and the NN in the external memory for optimal performance.
Contributor

SRAM, Ethos-U
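The coupling described above (the memory mode chosen at compile time dictates where the runtime must place the scratch buffer and the NN) can be written down as a lookup table. Below is a minimal sketch with hypothetical names (`Placement`, `placement_mismatches`). The Shared_Sram and Dedicated_Sram rows follow the quoted guide text. The Sram_Only row (everything in on-chip SRAM) is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Placement:
    scratch_buffer: str  # memory holding the Ethos-U scratch buffer
    model: str           # memory holding the NN


# Shared_Sram and Dedicated_Sram follow the guide text; Sram_Only is assumed.
EXPECTED_PLACEMENT = {
    "Shared_Sram": Placement(scratch_buffer="SRAM", model="external"),
    "Dedicated_Sram": Placement(scratch_buffer="external", model="external"),
    "Sram_Only": Placement(scratch_buffer="SRAM", model="SRAM"),
}


def placement_mismatches(memory_mode: str, actual: Placement) -> List[str]:
    """List the ways the runtime placement differs from the compile-time mode."""
    expected = EXPECTED_PLACEMENT[memory_mode]
    problems = []
    if actual.scratch_buffer != expected.scratch_buffer:
        problems.append(f"scratch buffer: expected {expected.scratch_buffer}, "
                        f"got {actual.scratch_buffer}")
    if actual.model != expected.model:
        problems.append(f"model: expected {expected.model}, got {actual.model}")
    return problems


if __name__ == "__main__":
    # Correct Shared_Sram placement: no mismatches.
    print(placement_mismatches("Shared_Sram", Placement("SRAM", "external")))
    # Scratch buffer left in external memory: one mismatch reported.
    print(placement_mismatches("Shared_Sram", Placement("external", "external")))
```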

Core-platform includes a [tutorial](https://gitlab.arm.com/artificial-intelligence/ethos-u/ethos-u-core-platform/-/blob/main/PORTING.md?ref_type=heads) on porting a new target. If you port your target to
core-platform, you can then easily reuse it in the ExecuTorch runtime.

Also, as explained in the comments in `backends/arm/scripts/corstone_utils.cmake`, note that REGIONCFG register of the Ethos-U controls the memory(on-chip or external memory) used by the NPU to
Contributor

missing space between memory and (

## Corstone linker scripts

The linker scripts tell the linker where to place various objects in memory when the application is loaded onto the target.
In the `arm_executor_runner.cpp` application, we reuse the linker scripts from the core-platform project. Note that the Global Offset Table(.got symbols) needs to be 16-byte aligned. The linker scripts are highly specific to the memory map of system.
Contributor

missing space between Table and (
maybe s/the memory map of system/the memory map of the system/ ?
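The 16-byte alignment requirement for the Global Offset Table quoted above amounts to rounding the section's start address up to the next multiple of 16. In a GNU ld script this is usually expressed with `. = ALIGN(16);`. The sketch below only illustrates the arithmetic, with hypothetical helper names.

```python
GOT_ALIGNMENT = 16  # bytes, per the quoted requirement for .got


def align_up(address: int, alignment: int = GOT_ALIGNMENT) -> int:
    """Round address up to the next multiple of a power-of-two alignment."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (address + alignment - 1) & ~(alignment - 1)


def is_aligned(address: int, alignment: int = GOT_ALIGNMENT) -> bool:
    """True if address is a multiple of alignment."""
    return address % alignment == 0


if __name__ == "__main__":
    print(hex(align_up(0x1004)))  # 0x1010
    print(hex(align_up(0x1010)))  # 0x1010, already aligned
```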


For example, if you generate a pte file with compile specification for Shared Sram, the scratch buffer should be placed in the SRAM and the NN in the external memory in the runtime application code. You can see we are following
this approach in the `examples/arm/executor_runner/arm_executor_runner.cpp` example application. In the linker scripts for the application(`examples/arm/executor_runner/Corstone-320.ld` and
`examples/arm/executor_runner/Corstone-300.ld`) we check the value of `ETHOSU_ARENA` to determine whether the ethos-u scratch buffer is placed in the on-chip memory or in the external memory. In this
Contributor

capitalize Ethos-U?

and the `.bss.tensor_arena` section is placed in the correct location in the memory map thanks to
the `ETHOSU_ARENA` parameter.

There is a tight coupling between the memory mode for the Ethos-U and the placement of the ethos-u scratch buffer,
Contributor

capitalize ethos-u?
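To make the `ETHOSU_ARENA` mechanism concrete, the sketch below generates the kind of GNU ld output-section statement that places `.bss.tensor_arena` in one memory region or the other. It is not copied from `Corstone-300.ld` or `Corstone-320.ld`. Those scripts make the choice with a check on `ETHOSU_ARENA`, which this sketch replaces with a boolean. The region names `SRAM` and `DDR` are hypothetical. The mode-to-placement mapping follows the guide for Shared_Sram and Dedicated_Sram and assumes the arena is in SRAM for Sram_Only.

```python
def arena_in_sram(memory_mode: str) -> bool:
    """Whether the Ethos-U scratch buffer (tensor arena) belongs in on-chip SRAM."""
    # Shared_Sram and Dedicated_Sram follow the guide; Sram_Only is assumed.
    return {"Shared_Sram": True, "Sram_Only": True, "Dedicated_Sram": False}[memory_mode]


def tensor_arena_section(in_sram: bool,
                         sram_region: str = "SRAM",
                         external_region: str = "DDR") -> str:
    """Emit a GNU ld output section placing .bss.tensor_arena in one region."""
    region = sram_region if in_sram else external_region
    return (
        ".bss.tensor_arena (NOLOAD) :\n"
        "{\n"
        "    *(.bss.tensor_arena)\n"
        f"}} > {region}\n"
    )


if __name__ == "__main__":
    print(tensor_arena_section(arena_in_sram("Shared_Sram")), end="")
    print(tensor_arena_section(arena_in_sram("Dedicated_Sram")), end="")
```

Keeping the section name fixed and varying only the target region mirrors the approach described above, where the application code stays the same and the linker script decides where the arena lands.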

Comment on lines +111 to +112
from executorch.exir.passes.quantize_io_pass import QuantizeInputs
from executorch.exir.passes.quantize_io_pass import QuantizeOutputs
Contributor

just from executorch.exir.passes.quantize_io_pass import QuantizeInputs, QuantizeOutputs?

StrycekSimon pushed a commit to nxp-upstream/executorch that referenced this pull request Sep 23, 2025
…ide (pytorch#14144)
