
Commit 36883f7

Reordered files. Continued editing.
1 parent 2986615 commit 36883f7

7 files changed, +61 -32 lines changed


content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/2-overview.md

Lines changed: 15 additions & 4 deletions
@@ -5,14 +5,23 @@ weight: 2
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
-## Visualize ML on embedded devices
+## Simulate and evaluate TinyML performance on Arm virtual hardware

 In this section, you’ll learn how TinyML, ExecuTorch, and Arm Fixed Virtual Platforms work together to simulate embedded AI workloads before hardware is available.

 Choosing the right hardware for your machine learning (ML) model starts with having the right tools. In many cases, you need to test and iterate before your target hardware is even available, especially when working with cutting-edge accelerators like the Ethos-U NPU.

 Arm [Fixed Virtual Platforms](https://developer.arm.com/Tools%20and%20Software/Fixed%20Virtual%20Platforms) (FVPs) let you visualize and test model performance before any physical hardware is available.

+By simulating hardware behavior at the system level, FVPs allow you to:
+
+- Benchmark inference speed and measure operator-level performance
+- Identify which operations are delegated to the NPU and which execute on the CPU
+- Validate end-to-end integration between components like ExecuTorch and Arm NN
+- Iterate faster by debugging and optimizing your workload without relying on hardware
+
+This makes FVPs a crucial tool for embedded ML workflows where precision, portability, and early validation matter.
+
 ## What is TinyML?

 TinyML is machine learning optimized to run on low-power, resource-constrained devices such as Arm Cortex-M microcontrollers and NPUs like the Ethos-U. These models must fit within tight memory and compute budgets, making them ideal for embedded systems.
@@ -31,7 +40,7 @@ ExecuTorch provides:
 - Delegation of selected operators to accelerators like Ethos-U
 - Tight integration with Arm compute libraries

-## Why should I use Arm Fixed Virtual Platforms?
+## Why use Arm Fixed Virtual Platforms?

 Arm Fixed Virtual Platforms (FVPs) are virtual hardware models used to simulate Arm-based systems like the Corstone-320. They allow developers to validate and tune software before silicon is available, which is especially important when targeting newly-released accelerators like the [Ethos-U85](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u85) NPU.

@@ -46,5 +55,7 @@ These virtual platforms also include a built-in graphical user interface (GUI) t

 The Corstone-320 FVP is a virtual model of an Arm-based microcontroller system optimized for AI and TinyML workloads. It supports Cortex-M CPUs and the Ethos-U NPU, making it ideal for early testing, performance tuning, and validation of embedded AI applications, all before physical hardware is available.

-The Corstone-320 reference system is free to use, but you'll need to accept the license agreement during installation.
-For more information, see the [Corstone-320 documentation](https://developer.arm.com/documentation/109761/0000?lang=en).
+The Corstone-320 reference system is free to use, but you'll need to accept the license agreement during installation. For more information, see the [Corstone-320 documentation](https://developer.arm.com/documentation/109761/0000?lang=en).
+
+## What's next?
+In the next section, you'll explore how ExecuTorch compiles and deploys models to run efficiently on simulated hardware.

content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/3-executorch-workflow.md

Lines changed: 16 additions & 3 deletions
@@ -7,9 +7,11 @@ weight: 3
 # Do not modify these elements
 layout: "learningpathall"
 ---
-## How the ExecuTorch workflow operates
+## Overview

-Before setting up your environment, it helps to understand how ExecuTorch processes a model and runs it on Arm-based hardware.
+Before setting up your environment, it helps to understand how ExecuTorch processes a model and runs it on Arm-based hardware. ExecuTorch uses ahead-of-time (AOT) compilation to transform PyTorch models into optimized operator graphs that run efficiently on resource-constrained systems. The workflow supports hybrid execution across CPU and NPU cores, allowing you to profile, debug, and deploy TinyML workloads with low runtime overhead and high portability across Arm microcontrollers.
+
+## ExecuTorch in three steps

 ExecuTorch works in three main steps:

@@ -29,7 +31,18 @@ ExecuTorch works in three main steps:
 - Execute the compiled model on an FVP or physical target
 - The Ethos-U NPU runs delegated operators - all others run on the Cortex-M CPU

-## Visual overview
+For more detail, see the [ExecuTorch documentation](https://docs.pytorch.org/executorch/stable/intro-how-it-works.html).
+
+## A visual overview
+
+The diagram below summarizes the ExecuTorch workflow from model export to deployment. It shows how a trained PyTorch model is transformed into an optimized, quantized format and deployed to a target system such as an Arm Fixed Virtual Platform (FVP).
+
+- On the left, the model is exported into a graph of operators, with eligible layers flagged for NPU acceleration.
+- In the center, the AOT compiler optimizes and delegates operations, producing a `.pte` file ready for deployment.
+- On the right, the model is executed on embedded Arm hardware, where delegated operators run on the Ethos-U NPU, and the rest are handled by the Cortex-M CPU.
+
+This three-step workflow ensures your TinyML models are performance-tuned and hardware-aware before deployment, even without access to physical silicon.

 ![Diagram showing the three-step ExecuTorch workflow from model export to deployment#center](./how-executorch-works-high-level.png "The three-step ExecuTorch workflow from model export to deployment")
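The delegation step described in this file's diff can be illustrated with a toy sketch. This is plain Python, not the real ExecuTorch API: the operator names and the NPU support set below are hypothetical, chosen only to show how an operator graph splits into an NPU partition and a CPU fallback.

```python
# Toy sketch of operator delegation (illustrative only, not the ExecuTorch API).
# Operators in the NPU's support set are delegated to the Ethos-U;
# everything else falls back to the Cortex-M CPU.
NPU_SUPPORTED = {"conv2d", "depthwise_conv2d", "fully_connected", "relu"}

def partition(op_graph):
    """Split a linear operator graph into NPU and CPU partitions."""
    npu_ops, cpu_ops = [], []
    for op in op_graph:
        (npu_ops if op in NPU_SUPPORTED else cpu_ops).append(op)
    return npu_ops, cpu_ops

# A MobileNet-style graph: most ops are delegated, softmax stays on the CPU.
graph = ["conv2d", "relu", "depthwise_conv2d", "fully_connected", "softmax"]
npu, cpu = partition(graph)
print(npu)  # ['conv2d', 'relu', 'depthwise_conv2d', 'fully_connected']
print(cpu)  # ['softmax']
```

In the real flow, the Arm backend's partitioner makes this decision per operator during AOT compilation, and the delegated subgraphs are embedded in the resulting `.pte` file.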

content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/4-env-setup-execut.md

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ These instructions have been tested on:
 - Ubuntu 22.04 and 24.04
 - Windows Subsystem for Linux (WSL)

-## Install the required system packages:
+Run the following commands to install the dependencies:

 ```bash
 sudo apt update
@@ -79,6 +79,6 @@ Expected output:
 executorch 0.8.0a0+92fb0cc
 ```

-## Next steps
+## What's next?

 Now that ExecuTorch is installed, you're ready to simulate your TinyML model on an Arm Fixed Virtual Platform (FVP). In the next section, you'll configure and launch a Fixed Virtual Platform.

content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/5-env-setup-fvp.md

Lines changed: 4 additions & 3 deletions
@@ -16,9 +16,10 @@ In this section, you’ll install and configure the Corstone-320 FVP to simulate

 Before you begin, make sure you’ve completed the steps in the previous section to install ExecuTorch.

-{{< notice note >}}
-On macOS, you'll need to perform additional setup to support FVP execution.
-See the [FVPs-on-Mac GitHub repo](https://github.com/Arm-Examples/FVPs-on-Mac/) for instructions before continuing.
+{{< notice Note >}}
+If you're using macOS, you need to perform additional setup to support FVP execution.
+
+See the <a href="https://github.com/Arm-Examples/FVPs-on-Mac/" target="_blank">FVPs-on-Mac GitHub repo</a> for instructions before continuing.
 {{< /notice >}}

 Run the setup script provided in the ExecuTorch examples directory:
Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 # User change
 title: "Deploy and run Mobilenet V2 on the Corstone-320 FVP"

-weight: 7 # 1 is first, 2 is second, etc.
+weight: 6 # 1 is first, 2 is second, etc.

 # Do not modify these elements
 layout: "learningpathall"
Lines changed: 20 additions & 18 deletions
@@ -2,15 +2,15 @@
 # User change
 title: "Enable GUI and deploy a model on Corstone-320 FVP"

-weight: 6 # 1 is first, 2 is second, etc.
+weight: 7 # 1 is first, 2 is second, etc.

 # Do not modify these elements
 layout: "learningpathall"
 ---

 ## Visualize model execution using the FVP GUI

-You’ll now enable the graphical interface for the Corstone-320 FVP and run a real TinyML model to observe instruction counts and performance output in a windowed display.
+You’ve successfully deployed a model on the Corstone-320 FVP from the command line. In this step, you’ll enable the platform’s built-in graphical output and re-run the model to observe instruction-level execution metrics in a windowed display.

 ## Find your IP address

@@ -30,7 +30,7 @@ ipconfig getifaddr en0 # Returns your Mac's WiFi IP address

 {{% /notice %}}

-## Enable the FVP's GUI
+## Configure the FVP for GUI output

 Edit the following parameters in your locally checked out [executorch/backends/arm/scripts/run_fvp.sh](https://github.com/pytorch/executorch/blob/d5fe5faadb8a46375d925b18827493cd65ec84ce/backends/arm/scripts/run_fvp.sh#L97-L102) file, to enable the Mobilenet V2 output on the FVP's GUI:

@@ -59,9 +59,24 @@ Edit the following parameters in your locally checked out [executorch/backends/a

 ## Deploy the model

-{{% notice macOS %}}
+Now run the Mobilenet V2 computer vision model, using [executorch/examples/arm/run.sh](https://github.com/pytorch/executorch/blob/main/examples/arm/run.sh):
+```bash
+./examples/arm/run.sh \
+--aot_arm_compiler_flags="--delegate --quantize --intermediates mv2_u85/ --debug --evaluate" \
+--output=mv2_u85 \
+--target=ethos-u85-128 \
+--model_name=mv2
+```
+
+Observe that the FVP loads the model file, compiles the PyTorch model to ExecuTorch `.pte` format and then shows an instruction count in the top right of the GUI:
+
+![Terminal and FVP output#center](./Terminal%20and%20FVP%20Output.jpg "Terminal and FVP output")
+
+{{% notice Note %}}

-- **Start Docker:** on macOS, FVPs run inside a Docker container.
+For macOS users, follow these instructions:
+
+- Start Docker. FVPs run inside a Docker container.
 - Make sure to use an [official version of Docker](https://www.docker.com/products/docker-desktop/) and not a free version like the [Colima](https://github.com/abiosoft/colima?tab=readme-ov-file) Docker container runtime
 - `run.sh` assumes Docker Desktop style networking (`host.docker.internal`) which breaks with Colima
 - Colima then breaks the FVP GUI
@@ -74,16 +89,3 @@ Edit the following parameters in your locally checked out [executorch/backends/a
 xhost + 127.0.0.1 # The Docker container seems to proxy through localhost
 ```
 {{% /notice %}}
-
-Now run the Mobilenet V2 computer vision model, using [executorch/examples/arm/run.sh](https://github.com/pytorch/executorch/blob/main/examples/arm/run.sh):
-```bash
-./examples/arm/run.sh \
---aot_arm_compiler_flags="--delegate --quantize --intermediates mv2_u85/ --debug --evaluate" \
---output=mv2_u85 \
---target=ethos-u85-128 \
---model_name=mv2
-```
-
-Observe that the FVP loads the model file, compiles the PyTorch model to ExecuTorch `.pte` format and then shows an instruction count in the top right of the GUI:
-
-![Terminal and FVP output](./Terminal%20and%20FVP%20Output.jpg)

content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/8-evaluate-output.md

Lines changed: 3 additions & 1 deletion
@@ -164,4 +164,6 @@ I [executorch:arm_perf_monitor.cpp:184] ethosu_pmu_cntr4 : 130
 |ethosu_pmu_cntr3|External DRAM write beats(ETHOSU_PMU_EXT_WR_DATA_BEAT_WRITTEN)|Number of write data beats to external memory.|Helps detect offloading or insufficient SRAM.|
 |ethosu_pmu_cntr4|Idle cycles(ETHOSU_PMU_NPU_IDLE)|Number of cycles where the NPU had no work scheduled (i.e., idle).|High idle count = possible pipeline stalls or bad scheduling.|

-In this Learning Path, you have successfully learned how to deploy a MobileNet V2 model using ExecuTorch on Arm's Corstone-320 FVP. You're now ready to apply what you've learned to other models and configurations using ExecuTorch.
+## Summary
+
+In this Learning Path, you have learned how to deploy a MobileNet V2 model using ExecuTorch on Arm's Corstone-320 FVP. You're now ready to apply what you've learned to other models and configurations using ExecuTorch.
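The perf-monitor log lines referenced in this file's hunk header (for example `I [executorch:arm_perf_monitor.cpp:184] ethosu_pmu_cntr4 : 130`) have a regular shape, so a small helper can collect the PMU counters into a dictionary for comparison across runs. This is a sketch that assumes only the log format shown in that sample line:

```python
import re

# Match ExecuTorch arm_perf_monitor log lines such as:
#   I [executorch:arm_perf_monitor.cpp:184] ethosu_pmu_cntr4 : 130
# i.e. a "]" followed by a counter name, a colon, and an integer value.
LOG_RE = re.compile(r"\]\s*(?P<name>\w+)\s*:\s*(?P<value>\d+)")

def parse_pmu_counters(log_text):
    """Return a dict mapping PMU counter names to integer values."""
    counters = {}
    for line in log_text.splitlines():
        m = LOG_RE.search(line)
        if m:
            counters[m.group("name")] = int(m.group("value"))
    return counters

sample = "I [executorch:arm_perf_monitor.cpp:184] ethosu_pmu_cntr4 : 130"
print(parse_pmu_counters(sample))  # {'ethosu_pmu_cntr4': 130}
```

Collected this way, counters such as `ethosu_pmu_cntr4` (NPU idle cycles) can be diffed between runs to spot regressions in scheduling or memory behavior.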
