
Commit ee1c784

Tech review of Training and Inference with PyTorch

1 parent 30bf5ea commit ee1c784

File tree

6 files changed: +38 -43 lines

content/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/2-env-setup.md

Lines changed: 2 additions & 1 deletion
@@ -44,6 +44,7 @@ From within the Python virtual environment, run the commands below to download t
 cd $HOME
 git clone https://github.com/pytorch/executorch.git
 cd executorch
+git checkout 188312844ebfb499f92ab5a02137ed1a4abca782
 ```
 
 Run the commands below to set up the ExecuTorch internal dependencies:
@@ -70,7 +71,7 @@ pip list | grep executorch
 ```
 
 ```output
-executorch 0.6.0a0+3eea1f1
+executorch 1.1.0a0+1883128
 ```
 
 ## Next Steps
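
An editorial aside on the version check above: if you prefer to confirm the pinned build from Python rather than `pip list`, a minimal sketch (assuming `executorch` is installed in the active `executorch-venv` environment) is:

```python
# Minimal sketch: confirm the installed ExecuTorch build from Python.
# Assumes executorch is installed in the active virtual environment.
from importlib.metadata import version

print(version("executorch"))  # expect a string like "1.1.0a0+1883128"
```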

content/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/3-env-setup-fvp.md

Lines changed: 3 additions & 3 deletions
@@ -16,11 +16,11 @@ The Corstone reference system is provided free of charge, although you will have
 
 ## Corstone-320 FVP Setup for ExecuTorch
 
-Navigate to the Arm examples directory in the ExecuTorch repository. Run the following command.
+Run the FVP setup script in the ExecuTorch repository.
 
 ```bash
-cd $HOME/executorch/examples/arm
-./setup.sh --i-agree-to-the-contained-eula
+cd $HOME/executorch
+./examples/arm/setup.sh --i-agree-to-the-contained-eula
 ```
 
 After the script has finished running, it prints a command to run to finalize the installation. This step adds the FVP executables to your system path.
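
As a quick sanity check after that finalize step, you can verify the FVP executable is visible on your PATH. This small sketch assumes the binary name `FVP_Corstone_SSE-320` used in the run command later in this commit:

```python
# Minimal sketch: verify the Corstone-320 FVP binary is on PATH after setup.
# The binary name is taken from the run command later in this commit.
import shutil

path = shutil.which("FVP_Corstone_SSE-320")
print(path or "FVP_Corstone_SSE-320 not found on PATH - rerun the finalize step")
```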

content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/_index.md

Lines changed: 7 additions & 9 deletions
@@ -1,9 +1,9 @@
 ---
 title: Edge AI with PyTorch & ExecuTorch - Tiny Rock-Paper-Scissors on Arm
 
-minutes_to_complete: 90
+minutes_to_complete: 60
 
-who_is_this_for: This learning path is for machine learning engineers, embedded AI developers, and researchers interested in deploying TinyML models on Arm-based edge devices. You will learn how to train and deploy a machine learning model for the classic game "Rock-Paper-Scissors" on edge devices. We'll use PyTorch and ExecuTorch, a framework designed for efficient on-device inference, to build and run a small-scale computer vision model.
+who_is_this_for: This learning path is for machine learning developers interested in deploying TinyML models on Arm-based edge devices. You will learn how to train and deploy a machine learning model for the classic game "Rock-Paper-Scissors" on edge devices. You'll use PyTorch and ExecuTorch, frameworks designed for efficient on-device inference, to build and run a small-scale computer vision model.
 
 
 learning_objectives:
@@ -16,30 +16,28 @@ learning_objectives:
 prerequisites:
 - A basic understanding of machine learning concepts.
 - Familiarity with Python and the PyTorch library.
-- It is advised to first complete [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) before starting this learning path.
-- A Linux host machine or VM running Ubuntu 22.04 or higher.
-- An Arm license to run the examples on the Corstone-320 Fixed Virtual Platform (FVP), for hands-on deployment.
-
+- Having completed [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm).
+- An x86 Linux host machine or VM running Ubuntu 22.04 or higher.
 
 author: Dominica Abena O. Amanfo
 
 ### Tags
-skilllevels: Intermediate
+skilllevels: Introductory
 subjects: ML
 armips:
 - Cortex-M
+- Ethos-U
 tools_software_languages:
 - tinyML
 - Computer Vision
-- Edge AI Game
+- Edge AI
 - CNN
 - PyTorch
 - ExecuTorch
 
 operatingsystems:
 - Linux
 
-
 further_reading:
 - resource:
    title: Run Llama 3 on a Raspberry Pi 5 using ExecuTorch
content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/env-setup-1.md

Lines changed: 7 additions & 10 deletions
@@ -7,31 +7,28 @@ layout: learningpathall
 ---
 
 ## Overview
-This learning path (LP) is a direct follow-up to the [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) learning path. While the previous path introduced you to the core concepts and the toolchain, this one puts that knowledge into practice with a fun, real-world example. We will move from the simple ["Feedforward Neural Network"](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/4-build-model) in the previous LP, to a more practical computer vision task: A tiny Rock-Paper-Scissors game, to demonstrate how these tools can be used to solve a tangible problem and run efficiently on Arm-based edge devices.
+This learning path (LP) is a direct follow-up to the [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) learning path. While the previous one introduced you to the core concepts and the toolchain, this one puts that knowledge into practice with a fun, real-world example. You will move from the simple [Feedforward Neural Network](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/4-build-model) in the previous LP to a more practical computer vision task: a tiny Rock-Paper-Scissors game that demonstrates how these tools can solve a tangible problem and run efficiently on Arm-based edge devices.
 
-
-We will train a lightweight CNN to classify images of the letters R, P, and S as "rock," "paper," or "scissors." The script uses a synthetic data renderer to create a large dataset of these images with various transformations and noise, eliminating the need for a massive real-world dataset.
+You will train a lightweight CNN to classify images of the letters R, P, and S as "rock," "paper," or "scissors." The script uses a synthetic data renderer to create a large dataset of these images with various transformations and noise, eliminating the need for a massive real-world dataset.
 
 ### What is a Convolutional Neural Network (CNN)?
 A Convolutional Neural Network (CNN) is a type of deep neural network primarily used for analyzing visual imagery. Unlike traditional neural networks, CNNs are designed to process pixel data by using a mathematical operation called **convolution**. This allows them to automatically and adaptively learn spatial hierarchies of features from input images, from low-level features like edges and textures to high-level features like shapes and objects.
 
 ![Image of a convolutional neural network architecture](image.png)
-
-Image of a convolutional neural network architecture : [Image credits](https://medium.com/@atul_86537/learning-ml-from-first-principles-c-linux-the-rick-and-morty-way-convolutional-neural-c76c3df511f4).
+[Image credits](https://medium.com/@atul_86537/learning-ml-from-first-principles-c-linux-the-rick-and-morty-way-convolutional-neural-c76c3df511f4).
 
 CNNs are the backbone of many modern computer vision applications, including:
 
 - **Image Classification:** Identifying the main object in an image, like classifying a photo as a "cat" or "dog".
 - **Object Detection:** Locating specific objects within an image and drawing a box around them.
 - **Facial Recognition:** Identifying and verifying individuals based on their faces.
 
-For our Rock-Paper-Scissors game, we'll use a tiny CNN to classify images of the letters R, P, and S as the corresponding hand gestures.
+For the Rock-Paper-Scissors game, you'll use a tiny CNN to classify images of the letters R, P, and S as the corresponding hand gestures.
 
 
 
 ## Environment Setup
-To get started, follow the first three chapters of the [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) Learning Path. This will set up your development environment and install the necessary tools.
-
+To get started, follow the first three chapters of the [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) Learning Path. This will set up your development environment and install the necessary tools. Return to this LP once you've run the `./examples/arm/run.sh` script in the ExecuTorch repository.
 
 If you just followed the LP above, you should already have your virtual environment activated. If not, activate it using:
 
@@ -43,7 +40,7 @@ The prompt of your terminal now has `(executorch-venv)` as a prefix to indicate
 Run the commands below to install the dependencies.
 
 ```bash
-pip install argparse json numpy pillow torch
+pip install argparse numpy pillow torch
 ```
-You are now ready to build the model.
+You are now ready to create the model.
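
To make the CNN description in the diff above concrete (convolution layers, then pooling, then fully connected layers), here is a minimal PyTorch sketch. It is illustrative only: the layer sizes are assumptions for a 28x28 grayscale input, not the actual TinyRPS architecture from `rps_tiny.py`.

```python
# Minimal sketch of the conv -> pool -> linear structure described above.
# Illustrative only; layer sizes are assumptions for a 28x28 grayscale input,
# not the actual TinyRPS architecture from rps_tiny.py.
import torch
import torch.nn as nn

class TinyCNNSketch(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # low-level features (edges, strokes)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # higher-level features (shapes)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

# One forward pass on a dummy batch: [batch, channels, height, width]
logits = TinyCNNSketch()(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 3]) -> one score per class (rock/paper/scissors)
```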

content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/fine-tune-2.md

Lines changed: 10 additions & 10 deletions
@@ -14,7 +14,7 @@ Navigate to the Arm examples directory in the ExecuTorch repository.
 cd $HOME/executorch/examples/arm
 ```
 
-Using a file editor of your choice, create a file named rps_tiny.py, copy and paste the code shown below:
+Using a file editor of your choice, create a file named `rps_tiny.py`, then copy and paste the code shown below:
 
 ```python
 #!/usr/bin/env python3
@@ -252,7 +252,7 @@ def ascii_show(img: torch.Tensor) -> str:
         row=[]
         for x in range(0,w,1):
             v = arr[y, x]
-            row.append(chars[min(len(chars)-1, v*len(chars)//256)])
+            row.append(chars[min(len(chars)-1, int(v)*len(chars)//256)])
         lines.append("".join(row))
     return "\n".join(lines)
 
@@ -369,16 +369,15 @@ if __name__ == "__main__":
 ```
 
 
-### How This Script Works:
+### About the Script
 The script handles the entire workflow: data generation, model training, and a simple command-line game.
 
-- **Synthetic Data Generation:** The script includes a function render_rps() that generates 28x28 grayscale images of the letters 'R', 'P', and 'S' with random rotations, blurs, and noise. This creates a diverse dataset that's used to train the model.
+- **Synthetic Data Generation:** The script includes a function `render_rps()` that generates 28x28 grayscale images of the letters 'R', 'P', and 'S' with random rotations, blurs, and noise. This creates a diverse dataset that's used to train the model.
 - **Model Architecture:** The model, a TinyRPS class, is a simple Convolutional Neural Network (CNN). It uses a series of 2D convolutional layers, followed by pooling layers to reduce spatial dimensions, and finally, fully connected linear layers to produce a final prediction. This architecture is efficient and well-suited for edge devices.
-- **Training:** The script generates synthetic training and validation datasets. It then trains the CNN model using the **Adam optimizer** and **Cross-Entropy Loss**. It tracks validation accuracy and saves the best-performing model to rps_best.pt.
-- **ExecuTorch Export:** A key part of the script is the export_to_pte() function. This function uses the torch.export module (or a fallback) to trace the trained PyTorch model and convert it into an ExecuTorch program (.pte). This compiled program is highly optimized for deployment on any target hardware. For self-practice, you can play around with Cortex-A or M devices.
+- **Training:** The script generates synthetic training and validation datasets. It then trains the CNN model using the **Adam optimizer** and **Cross-Entropy Loss**. It tracks validation accuracy and saves the best-performing model to `rps_best.pt`.
+- **ExecuTorch Export:** A key part of the script is the `export_to_pte()` function. This function uses the `torch.export` module (or a fallback) to trace the trained PyTorch model and convert it into an ExecuTorch program (`.pte`). This compiled program is highly optimized for deployment on any target hardware, for example Cortex-M or Cortex-A CPUs for embedded devices.
 - **CLI Mini-Game**: After training, you can play an interactive game. The script generates an image of your move and a random opponent's move. It then uses the trained model to classify both images and determines the winner based on the model's predictions.
 
-
 ### Running the Script:
 
 To train the model, export it, and play the game, run the following command:
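
As an editorial aside to the training bullet above (Adam optimizer, cross-entropy loss, tracking validation accuracy), the pattern being described looks roughly like the sketch below. The `model`, `train_loader`, and `val_loader` objects are placeholders, not the ones defined in `rps_tiny.py`.

```python
# Minimal sketch of the Adam + cross-entropy training pattern described above.
# model, train_loader, and val_loader are placeholders, not the objects
# defined in rps_tiny.py.
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=8, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_acc = 0.0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        # Track validation accuracy and keep the best checkpoint.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in val_loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        acc = correct / total
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), "best.pt")  # rps_tiny.py uses rps_best.pt
    return best_acc
```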
@@ -389,7 +388,7 @@ python rps_tiny.py --epochs 8 --export --play
 
 You'll see the training progress, where the model's accuracy rapidly improves on the synthetic data.
 
-```bash
+```output
 == Building synthetic datasets ==
 Train size: 3000 | Val size: 600
 totl += float(loss)*x.size(0)
@@ -405,7 +404,7 @@ Loaded weights from rps_best.pt
 ```
 After training and export, the game will start. Type rock, paper, or scissors and see the model's predictions and what your opponent played.
 
-```bash
+```output
 === Rock–Paper–Scissors: Play vs Tiny CNN ===
 Type one of: rock / paper / scissors / quit
 
@@ -487,4 +486,5 @@ Model thinks opponent played: rock (100.0%)
 --------------------------------------------------
 Your move>
 ```
-Type quit to exit the game. You can now prepare the model to run on the FVP in the next chapter.
+
+Type `quit` to exit the game. In the next chapter, you'll prepare the model to run on the FVP.
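
The export bullet above can also be illustrated. The following is a rough sketch of a `torch.export` to ExecuTorch flow similar to what an `export_to_pte()` helper might do; the exact ExecuTorch API calls vary between versions, so treat this as an assumption rather than the code in `rps_tiny.py`.

```python
# Rough sketch of exporting a trained model to an ExecuTorch .pte program.
# API names follow the ExecuTorch tutorials; they may differ by version, and
# this is not the export_to_pte() implementation from rps_tiny.py.
import torch
from executorch.exir import to_edge

def export_to_pte_sketch(model: torch.nn.Module, path: str = "model.pte") -> None:
    model.eval()
    example = (torch.randn(1, 1, 28, 28),)          # one 28x28 grayscale image
    exported = torch.export.export(model, example)  # trace the model graph
    program = to_edge(exported).to_executorch()     # lower to an ExecuTorch program
    with open(path, "wb") as f:
        f.write(program.buffer)                     # serialized .pte bytes
```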

content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/fvp-3.md

Lines changed: 9 additions & 10 deletions
@@ -19,8 +19,7 @@ export ET_HOME=$HOME/executorch
 export executorch_DIR=$ET_HOME/build
 ```
 
-
-Use the AOT Arm compiler to generate the optimized .pte file. This command delegates the model to the Ethos-U85 NPU, applies quantization to reduce model size and improve performance, and specifies the memory configuration. Run it from the ExecuTorch root directory.
+Use the AOT Arm compiler to generate the optimized `.pte` file. This command delegates the model to the Ethos-U85 NPU, applies quantization to reduce model size and improve performance, and specifies the memory configuration. Run it from the ExecuTorch root directory.
 
 ```bash
 cd $ET_HOME
@@ -35,12 +34,11 @@ You should see:
 PTE file saved as rps_tiny_arm_delegate_ethos-u85-128.pte
 ```
 
-Next, you'll build the Ethos-U runner, which is a bare-metal executable that includes the ExecuTorch runtime and your compiled model. This runner is what the FVP will execute. Navigate to the runner's directory and use CMake to configure the build.
+Next, you'll build the **Ethos-U runner**, which is a bare-metal executable that includes the ExecuTorch runtime and your compiled model. This runner is what the FVP will execute. Navigate to the runner's directory and use CMake to configure the build.
 
 ```bash
 cd $HOME/executorch/examples/arm/executor_runner
 
-
 cmake -DCMAKE_BUILD_TYPE=Release \
   -S "$ET_HOME/examples/arm/executor_runner" \
   -B "$ET_HOME/examples/arm/executor_runner/cmake-out" \
@@ -51,7 +49,7 @@ cmake -DCMAKE_BUILD_TYPE=Release \
   -DET_PTE_FILE_PATH="$ET_HOME/rps_tiny_arm_delegate_ethos-u85-128.pte" \
   -DETHOS_SDK_PATH="$ET_HOME/examples/arm/ethos-u-scratch/ethos-u" \
   -DETHOSU_TARGET_NPU_CONFIG=ethos-u85-128 \
-  -DSYSTEM_CONFIG=Ethos_U85_SYS_DRAM_Mid \
+  -DSYSTEM_CONFIG=Ethos_U85_SYS_DRAM_Mid
 ```
 
 You should see output similar to this, indicating a successful configuration:
@@ -76,11 +74,11 @@ cmake --build "$ET_HOME/examples/arm/executor_runner/cmake-out" -j --target arm_
 ```
 
 ### Run the Model on the FVP
-With the arm_executor_runner executable ready, you can now run it on the Corstone-320 FVP to see the model on a simulated Arm device.
+With the `arm_executor_runner` executable ready, you can now run it on the Corstone-320 FVP to see the model on a simulated Arm device.
 
 ```bash
 FVP_Corstone_SSE-320 \
-  -C mps4_board.subsystem.ethosu.num_macs=256 \
+  -C mps4_board.subsystem.ethosu.num_macs=128 \
   -C mps4_board.visualisation.disable-visualisation=1 \
   -C vis_hdlcd.disable_visualisation=1 \
   -C mps4_board.telnetterminal0.start_telnet=0 \
@@ -90,9 +88,7 @@ FVP_Corstone_SSE-320 \
 ```
 
 {{% notice Note %}}
-
 The argument `mps4_board.visualisation.disable-visualisation=1` disables the FVP GUI. This can speed up launch time for the FVP.
-
 {{% /notice %}}
 
 
@@ -112,7 +108,10 @@ I [executorch:arm_executor_runner.cpp:563 main()] Setting up planned buffer 0, s
 I [executorch:EthosUBackend.cpp:116 init()] data:0x70000070
 ```
 
+{{% notice Note %}}
+The inference itself may take a while to run, even with a model this size - note that this is not a reflection of actual execution time.
+{{% /notice %}}
 
-Congratulations! You've successfully built, optimized, and deployed a computer vision model on a simulated Arm-based system. This hands-on exercise demonstrates the power and practicality of TinyML and ExecuTorch for resource-constrained devices.
+You've now successfully built, optimized, and deployed a computer vision model on a simulated Arm-based system. This hands-on exercise demonstrates the power and practicality of TinyML and ExecuTorch for resource-constrained devices.
 
 In a future learning path, you can explore comparing different model performances and inference times before and after optimization. You could also analyze CPU and memory usage during inference, providing a deeper understanding of how the ExecuTorch framework optimizes your model for edge deployment.
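
For the follow-up idea in that closing paragraph (comparing inference times before and after optimization), a minimal host-side baseline could look like the sketch below; the model object and input shape are assumptions, not part of this commit.

```python
# Minimal sketch: time eager PyTorch inference on the host as a rough baseline
# before comparing against FVP or on-device numbers. Model/input are assumptions.
import time
import torch

def average_latency(model: torch.nn.Module, example: torch.Tensor, runs: int = 100) -> float:
    model.eval()
    with torch.no_grad():
        model(example)  # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
    return (time.perf_counter() - start) / runs
```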
