
Commit 36883f7

Reordered files. Continued editing.
1 parent 2986615 commit 36883f7

7 files changed, +61 -32 lines changed


content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/2-overview.md

Lines changed: 15 additions & 4 deletions
@@ -5,14 +5,23 @@ weight: 2
 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---
-## Visualize ML on embedded devices
+## Simulate and evaluate TinyML performance on Arm virtual hardware

 In this section, you’ll learn how TinyML, ExecuTorch, and Arm Fixed Virtual Platforms work together to simulate embedded AI workloads before hardware is available.

 Choosing the right hardware for your machine learning (ML) model starts with having the right tools. In many cases, you need to test and iterate before your target hardware is even available, especially when working with cutting-edge accelerators like the Ethos-U NPU.

 Arm [Fixed Virtual Platforms](https://developer.arm.com/Tools%20and%20Software/Fixed%20Virtual%20Platforms) (FVPs) let you visualize and test model performance before any physical hardware is available.

+By simulating hardware behavior at the system level, FVPs allow you to:
+
+- Benchmark inference speed and measure operator-level performance
+- Identify which operations are delegated to the NPU and which execute on the CPU
+- Validate end-to-end integration between components like ExecuTorch and Arm NN
+- Iterate faster by debugging and optimizing your workload without relying on hardware
+
+This makes FVPs a crucial tool for embedded ML workflows where precision, portability, and early validation matter.
+
 ## What is TinyML?

 TinyML is machine learning optimized to run on low-power, resource-constrained devices such as Arm Cortex-M microcontrollers and NPUs like the Ethos-U. These models must fit within tight memory and compute budgets, making them ideal for embedded systems.
@@ -31,7 +40,7 @@ ExecuTorch provides:
 - Delegation of selected operators to accelerators like Ethos-U
 - Tight integration with Arm compute libraries

-## Why should I use Arm Fixed Virtual Platforms?
+## Why use Arm Fixed Virtual Platforms?

 Arm Fixed Virtual Platforms (FVPs) are virtual hardware models used to simulate Arm-based systems like the Corstone-320. They allow developers to validate and tune software before silicon is available, which is especially important when targeting newly-released accelerators like the [Ethos-U85](https://www.arm.com/products/silicon-ip-cpu/ethos/ethos-u85) NPU.

@@ -46,5 +55,7 @@ These virtual platforms also include a built-in graphical user interface (GUI) t

 The Corstone-320 FVP is a virtual model of an Arm-based microcontroller system optimized for AI and TinyML workloads. It supports Cortex-M CPUs and the Ethos-U NPU, making it ideal for early testing, performance tuning, and validation of embedded AI applications, all before physical hardware is available.

-The Corstone-320 reference system is free to use, but you'll need to accept the license agreement during installation.
-For more information, see the [Corstone-320 documentation](https://developer.arm.com/documentation/109761/0000?lang=en).
+The Corstone-320 reference system is free to use, but you'll need to accept the license agreement during installation. For more information, see the [Corstone-320 documentation](https://developer.arm.com/documentation/109761/0000?lang=en).
+
+## What's next?
+In the next section, you'll explore how ExecuTorch compiles and deploys models to run efficiently on simulated hardware.

content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/3-executorch-workflow.md

Lines changed: 16 additions & 3 deletions
@@ -7,9 +7,11 @@ weight: 3
 # Do not modify these elements
 layout: "learningpathall"
 ---
-## How the ExecuTorch workflow operates
+## Overview

-Before setting up your environment, it helps to understand how ExecuTorch processes a model and runs it on Arm-based hardware.
+Before setting up your environment, it helps to understand how ExecuTorch processes a model and runs it on Arm-based hardware. ExecuTorch uses ahead-of-time (AOT) compilation to transform PyTorch models into optimized operator graphs that run efficiently on resource-constrained systems. The workflow supports hybrid execution across CPU and NPU cores, allowing you to profile, debug, and deploy TinyML workloads with low runtime overhead and high portability across Arm microcontrollers.
+
+## ExecuTorch in three steps

 ExecuTorch works in three main steps:

@@ -29,7 +31,18 @@ ExecuTorch works in three main steps:
 - Execute the compiled model on an FVP or physical target
 - The Ethos-U NPU runs delegated operators - all others run on the Cortex-M CPU

-## Visual overview
+For more detail, see the [ExecuTorch documentation](https://docs.pytorch.org/executorch/stable/intro-how-it-works.html).
+
+## A visual overview
+
+The diagram below summarizes the ExecuTorch workflow from model export to deployment. It shows how a trained PyTorch model is transformed into an optimized, quantized format and deployed to a target system such as an Arm Fixed Virtual Platform (FVP).
+
+- On the left, the model is exported into a graph of operators, with eligible layers flagged for NPU acceleration.
+- In the center, the AOT compiler optimizes and delegates operations, producing a `.pte` file ready for deployment.
+- On the right, the model is executed on embedded Arm hardware, where delegated operators run on the Ethos-U NPU, and the rest are handled by the Cortex-M CPU.
+
+This three-step workflow ensures your TinyML models are performance-tuned and hardware-aware before deployment, even without access to physical silicon.

 ![Diagram showing the three-step ExecuTorch workflow from model export to deployment#center](./how-executorch-works-high-level.png "The three-step ExecuTorch workflow from model export to deployment")
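The delegation step described in this file's diff can be illustrated with a toy sketch. This is plain Python, not the real ExecuTorch API: the operator names and the NPU support set below are hypothetical, chosen only to show how an operator graph splits into an NPU partition and a CPU fallback.

```python
# Toy sketch of operator delegation (illustrative only, not the ExecuTorch API).
# Operators in the NPU's support set are delegated to the Ethos-U;
# everything else falls back to the Cortex-M CPU.
NPU_SUPPORTED = {"conv2d", "depthwise_conv2d", "fully_connected", "relu"}

def partition(op_graph):
    """Split a linear operator graph into NPU and CPU partitions."""
    npu_ops, cpu_ops = [], []
    for op in op_graph:
        (npu_ops if op in NPU_SUPPORTED else cpu_ops).append(op)
    return npu_ops, cpu_ops

# A MobileNet-style graph: most ops are delegated, softmax stays on the CPU.
graph = ["conv2d", "relu", "depthwise_conv2d", "fully_connected", "softmax"]
npu, cpu = partition(graph)
print(npu)  # ['conv2d', 'relu', 'depthwise_conv2d', 'fully_connected']
print(cpu)  # ['softmax']
```

In the real flow, the Arm backend's partitioner makes this decision per operator during AOT compilation, and the delegated subgraphs are embedded in the resulting `.pte` file.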

content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/4-env-setup-execut.md

Lines changed: 2 additions & 2 deletions
@@ -22,7 +22,7 @@ These instructions have been tested on:
 - Ubuntu 22.04 and 24.04
 - Windows Subsystem for Linux (WSL)

-## Install the required system packages:
+Run the following commands to install the dependencies:

 ```bash
 sudo apt update
@@ -79,6 +79,6 @@ Expected output:
 executorch 0.8.0a0+92fb0cc
 ```

-## Next steps
+## What's next?

 Now that ExecuTorch is installed, you're ready to simulate your TinyML model on an Arm Fixed Virtual Platform (FVP). In the next section, you'll configure and launch a Fixed Virtual Platform.

content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/5-env-setup-fvp.md

Lines changed: 4 additions & 3 deletions
@@ -16,9 +16,10 @@ In this section, you’ll install and configure the Corstone-320 FVP to simulate

 Before you begin, make sure you’ve completed the steps in the previous section to install ExecuTorch.

-{{< notice note >}}
-On macOS, you'll need to perform additional setup to support FVP execution.
-See the [FVPs-on-Mac GitHub repo](https://github.com/Arm-Examples/FVPs-on-Mac/) for instructions before continuing.
+{{< notice Note >}}
+If you're using macOS, you need to perform additional setup to support FVP execution.
+
+See the <a href="https://github.com/Arm-Examples/FVPs-on-Mac/" target="_blank">FVPs-on-Mac GitHub repo</a> for instructions before continuing.
 {{< /notice >}}

 Run the setup script provided in the ExecuTorch examples directory:
Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 # User change
 title: "Deploy and run Mobilenet V2 on the Corstone-320 FVP"

-weight: 7 # 1 is first, 2 is second, etc.
+weight: 6 # 1 is first, 2 is second, etc.

 # Do not modify these elements
 layout: "learningpathall"
Lines changed: 20 additions & 18 deletions
@@ -2,15 +2,15 @@
 # User change
 title: "Enable GUI and deploy a model on Corstone-320 FVP"

-weight: 6 # 1 is first, 2 is second, etc.
+weight: 7 # 1 is first, 2 is second, etc.

 # Do not modify these elements
 layout: "learningpathall"
 ---

 ## Visualize model execution using the FVP GUI

-You’ll now enable the graphical interface for the Corstone-320 FVP and run a real TinyML model to observe instruction counts and performance output in a windowed display.
+You’ve successfully deployed a model on the Corstone-320 FVP from the command line. In this step, you’ll enable the platform’s built-in graphical output and re-run the model to observe instruction-level execution metrics in a windowed display.

 ## Find your IP address

@@ -30,7 +30,7 @@ ipconfig getifaddr en0 # Returns your Mac's WiFi IP address

 {{% /notice %}}

-## Enable the FVP's GUI
+## Configure the FVP for GUI output

 Edit the following parameters in your locally checked out [executorch/backends/arm/scripts/run_fvp.sh](https://github.com/pytorch/executorch/blob/d5fe5faadb8a46375d925b18827493cd65ec84ce/backends/arm/scripts/run_fvp.sh#L97-L102) file, to enable the Mobilenet V2 output on the FVP's GUI:

@@ -59,9 +59,24 @@ Edit the following parameters in your locally checked out [executorch/backends/a

 ## Deploy the model

-{{% notice macOS %}}
+Now run the Mobilenet V2 computer vision model, using [executorch/examples/arm/run.sh](https://github.com/pytorch/executorch/blob/main/examples/arm/run.sh):
+```bash
+./examples/arm/run.sh \
+--aot_arm_compiler_flags="--delegate --quantize --intermediates mv2_u85/ --debug --evaluate" \
+--output=mv2_u85 \
+--target=ethos-u85-128 \
+--model_name=mv2
+```
+
+Observe that the FVP loads the model file, compiles the PyTorch model to ExecuTorch `.pte` format and then shows an instruction count in the top right of the GUI:
+
+![Terminal and FVP output#center](./Terminal%20and%20FVP%20Output.jpg "Terminal and FVP output")
+
+{{% notice Note %}}

-- **Start Docker:** on macOS, FVPs run inside a Docker container.
+For macOS users, follow these instructions:
+
+- Start Docker. FVPs run inside a Docker container.
 - Make sure to use an [official version of Docker](https://www.docker.com/products/docker-desktop/) and not a free version like the [Colima](https://github.com/abiosoft/colima?tab=readme-ov-file) Docker container runtime
 - `run.sh` assumes Docker Desktop style networking (`host.docker.internal`) which breaks with Colima
 - Colima then breaks the FVP GUI
@@ -74,16 +89,3 @@ Edit the following parameters in your locally checked out [executorch/backends/a
 xhost + 127.0.0.1 # The Docker container seems to proxy through localhost
 ```
 {{% /notice %}}
-
-Now run the Mobilenet V2 computer vision model, using [executorch/examples/arm/run.sh](https://github.com/pytorch/executorch/blob/main/examples/arm/run.sh):
-```bash
-./examples/arm/run.sh \
---aot_arm_compiler_flags="--delegate --quantize --intermediates mv2_u85/ --debug --evaluate" \
---output=mv2_u85 \
---target=ethos-u85-128 \
---model_name=mv2
-```
-
-Observe that the FVP loads the model file, compiles the PyTorch model to ExecuTorch `.pte` format and then shows an instruction count in the top right of the GUI:
-
-![Terminal and FVP output](./Terminal%20and%20FVP%20Output.jpg)

content/learning-paths/embedded-and-microcontrollers/visualizing-ethos-u-performance/8-evaluate-output.md

Lines changed: 3 additions & 1 deletion
@@ -164,4 +164,6 @@ I [executorch:arm_perf_monitor.cpp:184] ethosu_pmu_cntr4 : 130
 |ethosu_pmu_cntr3|External DRAM write beats(ETHOSU_PMU_EXT_WR_DATA_BEAT_WRITTEN)|Number of write data beats to external memory.|Helps detect offloading or insufficient SRAM.|
 |ethosu_pmu_cntr4|Idle cycles(ETHOSU_PMU_NPU_IDLE)|Number of cycles where the NPU had no work scheduled (i.e., idle).|High idle count = possible pipeline stalls or bad scheduling.|

-In this Learning Path, you have successfully learned how to deploy a MobileNet V2 model using ExecuTorch on Arm's Corstone-320 FVP. You're now ready to apply what you've learned to other models and configurations using ExecuTorch.
+## Summary
+
+In this Learning Path, you have learned how to deploy a MobileNet V2 model using ExecuTorch on Arm's Corstone-320 FVP. You're now ready to apply what you've learned to other models and configurations using ExecuTorch.
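The perf-monitor log lines referenced in this file's hunk header (for example `I [executorch:arm_perf_monitor.cpp:184] ethosu_pmu_cntr4 : 130`) have a regular shape, so a small helper can collect the PMU counters into a dictionary for comparison across runs. This is a sketch that assumes only the log format shown in that sample line:

```python
import re

# Match ExecuTorch arm_perf_monitor log lines such as:
#   I [executorch:arm_perf_monitor.cpp:184] ethosu_pmu_cntr4 : 130
# i.e. a "]" followed by a counter name, a colon, and an integer value.
LOG_RE = re.compile(r"\]\s*(?P<name>\w+)\s*:\s*(?P<value>\d+)")

def parse_pmu_counters(log_text):
    """Return a dict mapping PMU counter names to integer values."""
    counters = {}
    for line in log_text.splitlines():
        m = LOG_RE.search(line)
        if m:
            counters[m.group("name")] = int(m.group("value"))
    return counters

sample = "I [executorch:arm_perf_monitor.cpp:184] ethosu_pmu_cntr4 : 130"
print(parse_pmu_counters(sample))  # {'ethosu_pmu_cntr4': 130}
```

Collected this way, counters such as `ethosu_pmu_cntr4` (NPU idle cycles) can be diffed between runs to spot regressions in scheduling or memory behavior.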
