
Commit ee1c784

Tech review of Training and Inference with PyTorch

1 parent 30bf5ea commit ee1c784

File tree

6 files changed: +38 -43 lines

content/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/2-env-setup.md

Lines changed: 2 additions & 1 deletion
@@ -44,6 +44,7 @@ From within the Python virtual environment, run the commands below to download t
 cd $HOME
 git clone https://github.com/pytorch/executorch.git
 cd executorch
+git checkout 188312844ebfb499f92ab5a02137ed1a4abca782
 ```
 
 Run the commands below to set up the ExecuTorch internal dependencies:
@@ -70,7 +71,7 @@ pip list | grep executorch
 ```
 
 ```output
-executorch 0.6.0a0+3eea1f1
+executorch 1.1.0a0+1883128
 ```
 
 ## Next Steps
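
An editorial aside on the version check above: if you prefer to confirm the pinned build from Python rather than `pip list`, a minimal sketch (assuming `executorch` is installed in the active `executorch-venv` environment) is:

```python
# Minimal sketch: confirm the installed ExecuTorch build from Python.
# Assumes executorch is installed in the active virtual environment.
from importlib.metadata import version

print(version("executorch"))  # expect a string like "1.1.0a0+1883128"
```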

content/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/3-env-setup-fvp.md

Lines changed: 3 additions & 3 deletions
@@ -16,11 +16,11 @@ The Corstone reference system is provided free of charge, although you will have
 
 ## Corstone-320 FVP Setup for ExecuTorch
 
-Navigate to the Arm examples directory in the ExecuTorch repository. Run the following command.
+Run the FVP setup script in the ExecuTorch repository.
 
 ```bash
-cd $HOME/executorch/examples/arm
-./setup.sh --i-agree-to-the-contained-eula
+cd $HOME/executorch
+./examples/arm/setup.sh --i-agree-to-the-contained-eula
 ```
 
 After the script has finished running, it prints a command to run to finalize the installation. This step adds the FVP executables to your system path.
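
As a quick sanity check after that finalize step, you can verify the FVP executable is visible on your PATH. This small sketch assumes the binary name `FVP_Corstone_SSE-320` used in the run command later in this commit:

```python
# Minimal sketch: verify the Corstone-320 FVP binary is on PATH after setup.
# The binary name is taken from the run command later in this commit.
import shutil

path = shutil.which("FVP_Corstone_SSE-320")
print(path or "FVP_Corstone_SSE-320 not found on PATH - rerun the finalize step")
```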

content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/_index.md

Lines changed: 7 additions & 9 deletions
@@ -1,9 +1,9 @@
 ---
 title: Edge AI with PyTorch & ExecuTorch - Tiny Rock-Paper-Scissors on Arm
 
-minutes_to_complete: 90
+minutes_to_complete: 60
 
-who_is_this_for: This learning path is for machine learning engineers, embedded AI developers, and researchers interested in deploying TinyML models on Arm-based edge devices. You will learn how to train and deploy a machine learning model for the classic game "Rock-Paper-Scissors" on edge devices. We'll use PyTorch and ExecuTorch, a framework designed for efficient on-device inference, to build and run a small-scale computer vision model.
+who_is_this_for: This learning path is for machine learning developers interested in deploying TinyML models on Arm-based edge devices. You will learn how to train and deploy a machine learning model for the classic game "Rock-Paper-Scissors" on edge devices. You'll use PyTorch and ExecuTorch, frameworks designed for efficient on-device inference, to build and run a small-scale computer vision model.
 
 
 learning_objectives:
@@ -16,30 +16,28 @@ learning_objectives:
 prerequisites:
 - A basic understanding of machine learning concepts.
 - Familiarity with Python and the PyTorch library.
-- It is advised to first complete [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) before starting this learning path.
-- A Linux host machine or VM running Ubuntu 22.04 or higher.
-- An Arm license to run the examples on the Corstone-320 Fixed Virtual Platform (FVP), for hands-on deployment.
-
+- Having completed [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm).
+- An x86 Linux host machine or VM running Ubuntu 22.04 or higher.
 
 author: Dominica Abena O. Amanfo
 
 ### Tags
-skilllevels: Intermediate
+skilllevels: Introductory
 subjects: ML
 armips:
 - Cortex-M
+- Ethos-U
 tools_software_languages:
 - tinyML
 - Computer Vision
-- Edge AI Game
+- Edge AI
 - CNN
 - PyTorch
 - ExecuTorch
 
 operatingsystems:
 - Linux
 
-
 further_reading:
 - resource:
    title: Run Llama 3 on a Raspberry Pi 5 using ExecuTorch
content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/env-setup-1.md

Lines changed: 7 additions & 10 deletions
@@ -7,31 +7,28 @@ layout: learningpathall
 ---
 
 ## Overview
-This learning path (LP) is a direct follow-up to the [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) learning path. While the previous path introduced you to the core concepts and the toolchain, this one puts that knowledge into practice with a fun, real-world example. We will move from the simple ["Feedforward Neural Network"](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/4-build-model) in the previous LP, to a more practical computer vision task: A tiny Rock-Paper-Scissors game, to demonstrate how these tools can be used to solve a tangible problem and run efficiently on Arm-based edge devices.
+This learning path (LP) is a direct follow-up to the [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) learning path. While the previous one introduced you to the core concepts and the toolchain, this one puts that knowledge into practice with a fun, real-world example. You will move from the simple [Feedforward Neural Network](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm/4-build-model) in the previous LP to a more practical computer vision task: a tiny Rock-Paper-Scissors game that demonstrates how these tools can solve a tangible problem and run efficiently on Arm-based edge devices.
 
-
-We will train a lightweight CNN to classify images of the letters R, P, and S as "rock," "paper," or "scissors." The script uses a synthetic data renderer to create a large dataset of these images with various transformations and noise, eliminating the need for a massive real-world dataset.
+You will train a lightweight CNN to classify images of the letters R, P, and S as "rock," "paper," or "scissors." The script uses a synthetic data renderer to create a large dataset of these images with various transformations and noise, eliminating the need for a massive real-world dataset.
 
 ### What is a Convolutional Neural Network (CNN)?
 A Convolutional Neural Network (CNN) is a type of deep neural network primarily used for analyzing visual imagery. Unlike traditional neural networks, CNNs are designed to process pixel data by using a mathematical operation called **convolution**. This allows them to automatically and adaptively learn spatial hierarchies of features from input images, from low-level features like edges and textures to high-level features like shapes and objects.
 
 ![Image of a convolutional neural network architecture](image.png)
-
-Image of a convolutional neural network architecture : [Image credits](https://medium.com/@atul_86537/learning-ml-from-first-principles-c-linux-the-rick-and-morty-way-convolutional-neural-c76c3df511f4).
+[Image credits](https://medium.com/@atul_86537/learning-ml-from-first-principles-c-linux-the-rick-and-morty-way-convolutional-neural-c76c3df511f4).
 
 CNNs are the backbone of many modern computer vision applications, including:
 
 - **Image Classification:** Identifying the main object in an image, like classifying a photo as a "cat" or "dog".
 - **Object Detection:** Locating specific objects within an image and drawing a box around them.
 - **Facial Recognition:** Identifying and verifying individuals based on their faces.
 
-For our Rock-Paper-Scissors game, we'll use a tiny CNN to classify images of the letters R, P, and S as the corresponding hand gestures.
+For the Rock-Paper-Scissors game, you'll use a tiny CNN to classify images of the letters R, P, and S as the corresponding hand gestures.
 
 
 
 ## Environment Setup
-To get started, follow the first three chapters of the [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) Learning Path. This will set up your development environment and install the necessary tools.
-
+To get started, follow the first three chapters of the [Introduction to TinyML on Arm using PyTorch and ExecuTorch](/learning-paths/embedded-and-microcontrollers/introduction-to-tinyml-on-arm) Learning Path. This will set up your development environment and install the necessary tools. Return to this LP once you've run the `./examples/arm/run.sh` script in the ExecuTorch repository.
 
 If you just followed the LP above, you should already have your virtual environment activated. If not, activate it using:
 
@@ -43,7 +40,7 @@ The prompt of your terminal now has `(executorch-venv)` as a prefix to indicate
 Run the commands below to install the dependencies.
 
 ```bash
-pip install argparse json numpy pillow torch
+pip install argparse numpy pillow torch
 ```
-You are now ready to build the model.
+You are now ready to create the model.
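
To make the CNN description in the diff above concrete (convolution layers, then pooling, then fully connected layers), here is a minimal PyTorch sketch. It is illustrative only: the layer sizes are assumptions for a 28x28 grayscale input, not the actual TinyRPS architecture from `rps_tiny.py`.

```python
# Minimal sketch of the conv -> pool -> linear structure described above.
# Illustrative only; layer sizes are assumptions for a 28x28 grayscale input,
# not the actual TinyRPS architecture from rps_tiny.py.
import torch
import torch.nn as nn

class TinyCNNSketch(nn.Module):
    def __init__(self, num_classes: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),   # low-level features (edges, strokes)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 28x28 -> 14x14
            nn.Conv2d(8, 16, kernel_size=3, padding=1),  # higher-level features (shapes)
            nn.ReLU(),
            nn.MaxPool2d(2),                             # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(16 * 7 * 7, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

# One forward pass on a dummy batch: [batch, channels, height, width]
logits = TinyCNNSketch()(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 3]) -> one score per class (rock/paper/scissors)
```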

content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/fine-tune-2.md

Lines changed: 10 additions & 10 deletions
@@ -14,7 +14,7 @@ Navigate to the Arm examples directory in the ExecuTorch repository.
 cd $HOME/executorch/examples/arm
 ```
 
-Using a file editor of your choice, create a file named rps_tiny.py, copy and paste the code shown below:
+Using a file editor of your choice, create a file named `rps_tiny.py`, then copy and paste the code shown below:
 
 ```python
 #!/usr/bin/env python3
@@ -252,7 +252,7 @@ def ascii_show(img: torch.Tensor) -> str:
         row=[]
         for x in range(0,w,1):
             v = arr[y, x]
-            row.append(chars[min(len(chars)-1, v*len(chars)//256)])
+            row.append(chars[min(len(chars)-1, int(v)*len(chars)//256)])
         lines.append("".join(row))
     return "\n".join(lines)
 
@@ -369,16 +369,15 @@ if __name__ == "__main__":
 ```
 
 
-### How This Script Works:
+### About the Script
 The script handles the entire workflow: data generation, model training, and a simple command-line game.
 
-- **Synthetic Data Generation:** The script includes a function render_rps() that generates 28x28 grayscale images of the letters 'R', 'P', and 'S' with random rotations, blurs, and noise. This creates a diverse dataset that's used to train the model.
+- **Synthetic Data Generation:** The script includes a function `render_rps()` that generates 28x28 grayscale images of the letters 'R', 'P', and 'S' with random rotations, blurs, and noise. This creates a diverse dataset that's used to train the model.
 - **Model Architecture:** The model, a TinyRPS class, is a simple Convolutional Neural Network (CNN). It uses a series of 2D convolutional layers, followed by pooling layers to reduce spatial dimensions, and finally, fully connected linear layers to produce a final prediction. This architecture is efficient and well-suited for edge devices.
-- **Training:** The script generates synthetic training and validation datasets. It then trains the CNN model using the **Adam optimizer** and **Cross-Entropy Loss**. It tracks validation accuracy and saves the best-performing model to rps_best.pt.
-- **ExecuTorch Export:** A key part of the script is the export_to_pte() function. This function uses the torch.export module (or a fallback) to trace the trained PyTorch model and convert it into an ExecuTorch program (.pte). This compiled program is highly optimized for deployment on any target hardware. For self-practice, you can play around with Cortex-A or M devices.
+- **Training:** The script generates synthetic training and validation datasets. It then trains the CNN model using the **Adam optimizer** and **Cross-Entropy Loss**. It tracks validation accuracy and saves the best-performing model to `rps_best.pt`.
+- **ExecuTorch Export:** A key part of the script is the `export_to_pte()` function. This function uses the `torch.export` module (or a fallback) to trace the trained PyTorch model and convert it into an ExecuTorch program (`.pte`). This compiled program is highly optimized for deployment on any target hardware, for example Cortex-M or Cortex-A CPUs for embedded devices.
 - **CLI Mini-Game**: After training, you can play an interactive game. The script generates an image of your move and a random opponent's move. It then uses the trained model to classify both images and determines the winner based on the model's predictions.
 
-
 ### Running the Script:
 
 To train the model, export it, and play the game, run the following command:
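
As an editorial aside to the training bullet above (Adam optimizer, cross-entropy loss, tracking validation accuracy), the pattern being described looks roughly like the sketch below. The `model`, `train_loader`, and `val_loader` objects are placeholders, not the ones defined in `rps_tiny.py`.

```python
# Minimal sketch of the Adam + cross-entropy training pattern described above.
# model, train_loader, and val_loader are placeholders, not the objects
# defined in rps_tiny.py.
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, epochs=8, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    best_acc = 0.0
    for epoch in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
        # Track validation accuracy and keep the best checkpoint.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in val_loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        acc = correct / total
        if acc > best_acc:
            best_acc = acc
            torch.save(model.state_dict(), "best.pt")  # rps_tiny.py uses rps_best.pt
    return best_acc
```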
@@ -389,7 +388,7 @@ python rps_tiny.py --epochs 8 --export --play
 
 You'll see the training progress, where the model's accuracy rapidly improves on the synthetic data.
 
-```bash
+```output
 == Building synthetic datasets ==
 Train size: 3000 | Val size: 600
 totl += float(loss)*x.size(0)
@@ -405,7 +404,7 @@ Loaded weights from rps_best.pt
 ```
 After training and export, the game will start. Type rock, paper, or scissors and see the model's predictions and what your opponent played.
 
-```bash
+```output
 === Rock–Paper–Scissors: Play vs Tiny CNN ===
 Type one of: rock / paper / scissors / quit
 
@@ -487,4 +486,5 @@ Model thinks opponent played: rock (100.0%)
 --------------------------------------------------
 Your move>
 ```
-Type quit to exit the game. You can now prepare the model to run on the FVP in the next chapter.
+
+Type `quit` to exit the game. In the next chapter, you'll prepare the model to run on the FVP.
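
The export bullet above can also be illustrated. The following is a rough sketch of a `torch.export` to ExecuTorch flow similar to what an `export_to_pte()` helper might do; the exact ExecuTorch API calls vary between versions, so treat this as an assumption rather than the code in `rps_tiny.py`.

```python
# Rough sketch of exporting a trained model to an ExecuTorch .pte program.
# API names follow the ExecuTorch tutorials; they may differ by version, and
# this is not the export_to_pte() implementation from rps_tiny.py.
import torch
from executorch.exir import to_edge

def export_to_pte_sketch(model: torch.nn.Module, path: str = "model.pte") -> None:
    model.eval()
    example = (torch.randn(1, 1, 28, 28),)          # one 28x28 grayscale image
    exported = torch.export.export(model, example)  # trace the model graph
    program = to_edge(exported).to_executorch()     # lower to an ExecuTorch program
    with open(path, "wb") as f:
        f.write(program.buffer)                     # serialized .pte bytes
```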

content/learning-paths/embedded-and-microcontrollers/training-inference-pytorch/fvp-3.md

Lines changed: 9 additions & 10 deletions
@@ -19,8 +19,7 @@ export ET_HOME=$HOME/executorch
 export executorch_DIR=$ET_HOME/build
 ```
 
-
-Use the AOT Arm compiler to generate the optimized .pte file. This command delegates the model to the Ethos-U85 NPU, applies quantization to reduce model size and improve performance, and specifies the memory configuration. Run it from the ExecuTorch root directory.
+Use the AOT Arm compiler to generate the optimized `.pte` file. This command delegates the model to the Ethos-U85 NPU, applies quantization to reduce model size and improve performance, and specifies the memory configuration. Run it from the ExecuTorch root directory.
 
 ```bash
 cd $ET_HOME
@@ -35,12 +34,11 @@ You should see:
 PTE file saved as rps_tiny_arm_delegate_ethos-u85-128.pte
 ```
 
-Next, you'll build the Ethos-U runner, which is a bare-metal executable that includes the ExecuTorch runtime and your compiled model. This runner is what the FVP will execute. Navigate to the runner's directory and use CMake to configure the build.
+Next, you'll build the **Ethos-U runner**, which is a bare-metal executable that includes the ExecuTorch runtime and your compiled model. This runner is what the FVP will execute. Navigate to the runner's directory and use CMake to configure the build.
 
 ```bash
 cd $HOME/executorch/examples/arm/executor_runner
 
-
 cmake -DCMAKE_BUILD_TYPE=Release \
   -S "$ET_HOME/examples/arm/executor_runner" \
   -B "$ET_HOME/examples/arm/executor_runner/cmake-out" \
@@ -51,7 +49,7 @@ cmake -DCMAKE_BUILD_TYPE=Release \
   -DET_PTE_FILE_PATH="$ET_HOME/rps_tiny_arm_delegate_ethos-u85-128.pte" \
   -DETHOS_SDK_PATH="$ET_HOME/examples/arm/ethos-u-scratch/ethos-u" \
   -DETHOSU_TARGET_NPU_CONFIG=ethos-u85-128 \
-  -DSYSTEM_CONFIG=Ethos_U85_SYS_DRAM_Mid \
+  -DSYSTEM_CONFIG=Ethos_U85_SYS_DRAM_Mid
 ```
 
 You should see output similar to this, indicating a successful configuration:
@@ -76,11 +74,11 @@ cmake --build "$ET_HOME/examples/arm/executor_runner/cmake-out" -j --target arm_
 ```
 
 ### Run the Model on the FVP
-With the arm_executor_runner executable ready, you can now run it on the Corstone-320 FVP to see the model on a simulated Arm device.
+With the `arm_executor_runner` executable ready, you can now run it on the Corstone-320 FVP to see the model on a simulated Arm device.
 
 ```bash
 FVP_Corstone_SSE-320 \
-  -C mps4_board.subsystem.ethosu.num_macs=256 \
+  -C mps4_board.subsystem.ethosu.num_macs=128 \
   -C mps4_board.visualisation.disable-visualisation=1 \
   -C vis_hdlcd.disable_visualisation=1 \
   -C mps4_board.telnetterminal0.start_telnet=0 \
@@ -90,9 +88,7 @@ FVP_Corstone_SSE-320 \
 ```
 
 {{% notice Note %}}
-
 The argument `mps4_board.visualisation.disable-visualisation=1` disables the FVP GUI. This can speed up launch time for the FVP.
-
 {{% /notice %}}
 
 
@@ -112,7 +108,10 @@ I [executorch:arm_executor_runner.cpp:563 main()] Setting up planned buffer 0, s
 I [executorch:EthosUBackend.cpp:116 init()] data:0x70000070
 ```
 
+{{% notice Note %}}
+The inference itself may take a while to run, even with a model this size - note that this is not a reflection of actual execution time.
+{{% /notice %}}
 
-Congratulations! You've successfully built, optimized, and deployed a computer vision model on a simulated Arm-based system. This hands-on exercise demonstrates the power and practicality of TinyML and ExecuTorch for resource-constrained devices.
+You've now successfully built, optimized, and deployed a computer vision model on a simulated Arm-based system. This hands-on exercise demonstrates the power and practicality of TinyML and ExecuTorch for resource-constrained devices.
 
 In a future learning path, you can explore comparing different model performances and inference times before and after optimization. You could also analyze CPU and memory usage during inference, providing a deeper understanding of how the ExecuTorch framework optimizes your model for edge deployment.
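
For the follow-up idea in that closing paragraph (comparing inference times before and after optimization), a minimal host-side baseline could look like the sketch below; the model object and input shape are assumptions, not part of this commit.

```python
# Minimal sketch: time eager PyTorch inference on the host as a rough baseline
# before comparing against FVP or on-device numbers. Model/input are assumptions.
import time
import torch

def average_latency(model: torch.nn.Module, example: torch.Tensor, runs: int = 100) -> float:
    model.eval()
    with torch.no_grad():
        model(example)  # warm-up pass
        start = time.perf_counter()
        for _ in range(runs):
            model(example)
    return (time.perf_counter() - start) / runs
```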
