:::{grid-item-card} What you will learn in this tutorial:
15
15
:class-card: card-prerequisites
16
-
In this tutorial you will learn how to export a simple PyTorch model for ExecuTorch Arm Ethos-u backend delegate and run it on a Corstone-300 FVP Simulator.
16
+
In this tutorial you will learn how to export a simple PyTorch model for the ExecuTorch Arm Ethos-U backend delegate and run it on the Corstone FVP simulators.
17
17
:::
18
18
19
19
::::
@@ -34,9 +34,9 @@ Let's make sure you have everything you need before we get started.
34
34
35
35
To successfully complete this tutorial, you will need a Linux-based host machine with Arm aarch64 or x86_64 processor architecture.
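
As a quick sanity check of the host requirement above, here is a minimal sketch using only the Python standard library. The helper name `is_supported_host` is hypothetical and not part of ExecuTorch; it simply encodes the "Linux on aarch64 or x86_64" rule stated in this section.

```python
import platform

# Architectures this tutorial names as supported hosts.
SUPPORTED_ARCHS = {"aarch64", "x86_64"}


def is_supported_host(system, machine):
    """Return True for a Linux host on aarch64 or x86_64."""
    return system == "Linux" and machine in SUPPORTED_ARCHS


if __name__ == "__main__":
    system, machine = platform.system(), platform.machine()
    verdict = "supported" if is_supported_host(system, machine) else "not covered by this tutorial"
    print(f"{system} {machine}: {verdict}")
```
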
36
36
37
-
The target device will be an embedded platform with an Arm Cortex-M55 CPU and Ethos-U55 NPU (ML processor). This tutorial will show you how to run PyTorch models on both.
37
+
The target device will be an embedded platform with Arm Cortex-M CPUs and Ethos-U NPUs (ML processors). This tutorial will show you how to run PyTorch models on both.
38
38
39
-
We will be using a [Fixed Virtual Platform (FVP)](https://www.arm.com/products/development-tools/simulation/fixed-virtual-platforms), simulating a [Corstone-300](https://developer.arm.com/Processors/Corstone-300)(cs300) system. Since we will be using the FVP (think of it as virtual hardware), we won't be requiring any real embedded hardware for this tutorial.
39
+
We will be using a [Fixed Virtual Platform (FVP)](https://www.arm.com/products/development-tools/simulation/fixed-virtual-platforms), simulating [Corstone-300](https://developer.arm.com/Processors/Corstone-300) (cs300) and [Corstone-320](https://developer.arm.com/Processors/Corstone-320) (cs320) systems. Since we will be using the FVP (think of it as virtual hardware), we won't need any real embedded hardware for this tutorial.
40
40
41
41
### Software
42
42
@@ -64,19 +64,19 @@ uname -m
64
64
65
65
Next we will walk through the steps performed by the `setup.sh` script to better understand the development setup.
66
66
67
-
### Download and Set Up the Corstone-300 FVP
67
+
### Download and Set Up the Corstone-300 and Corstone-320 FVP
68
68
69
-
Fixed Virtual Platforms (FVPs) are pre-configured, functionally accurate simulations of popular system configurations. Here in this tutorial, we are interested in the Corstone-300 system. We can download this from the Arm website.
69
+
Fixed Virtual Platforms (FVPs) are pre-configured, functionally accurate simulations of popular system configurations. In this tutorial, we are interested in the Corstone-300 and Corstone-320 systems. We can download these from the Arm website.
70
70
71
71
```{note}
72
72
By downloading and running the FVP software, you will be agreeing to the FVP [End-user license agreement (EULA)](https://developer.arm.com/downloads/-/arm-ecosystem-fvps/eula).
73
73
```
74
74
75
-
To download, we can either download `Corstone-300 Ecosystem FVP` from [here](https://developer.arm.com/downloads/-/arm-ecosystem-fvps). or `setup.sh` script will does that for you under `setup_fvp` function.
75
+
To download, we can either get the `Corstone-300 Ecosystem FVP` and `Corstone-320 Ecosystem FVP` from [here](https://developer.arm.com/downloads/-/arm-ecosystem-fvps), or let the `setup.sh` script do it for us in its `setup_fvp` function.
76
76
77
77
### Download and Install the Arm GNU AArch32 Bare-Metal Toolchain
78
78
79
-
Similar to the FVP, we would also need a tool-chain to cross-compile ExecuTorch runtime, executor-runner bare-metal application, as well as the rest of the bare-metal stack for Cortex-M55 CPU available on the Corstone-300 platform.
79
+
Similar to the FVP, we also need a toolchain to cross-compile the ExecuTorch runtime, the executor-runner bare-metal application, and the rest of the bare-metal stack for the Cortex-M55 and Cortex-M85 CPUs available on the Corstone-300 and Corstone-320 platforms, respectively.
80
80
81
81
These toolchains are available [here](https://developer.arm.com/downloads/-/arm-gnu-toolchain-downloads). We will be using GCC 12.3 targeting `arm-none-eabi` for our tutorial. Just like the FVP, the `setup.sh` script will download the toolchain for you. See the `setup_toolchain` function.
82
82
@@ -103,10 +103,14 @@ At the end of the setup, if everything goes well, your top level devlopement dir
After the `quantized_ops_aot_lib` build, we can run the following script to generate the `.pte` file
@@ -257,7 +260,7 @@ At the end of this, we should have three different `.pte` files.
257
260
- The second one contains the [AddModule](#addmodule), with Arm Ethos-U backend delegate enabled.
258
261
- The third one contains the [quantized MV2Model](#mv2module), with the Arm Ethos-U backend delegate enabled as well.
259
262
260
-
Now let's try to run these `.pte` files on a Corstone-300 platform in a bare-metal environment.
263
+
Now let's try to run these `.pte` files on the Corstone-300 and Corstone-320 platforms in a bare-metal environment.
261
264
262
265
## Getting a Bare-Metal Executable
263
266
@@ -269,9 +272,13 @@ The block diagram below demonstrates, at the high level, how the various build a
269
272
270
273

271
274
275
+
```{tip}
276
+
The `generate_pte_file` function in the `run.sh` script produces the `.pte` files based on the models provided through the `--model_name` input argument.
277
+
```
278
+
272
279
### Generating ExecuTorch Libraries
273
280
274
-
ExecuTorch's CMake build system produces a set of build pieces which are critical for us to include and run the ExecuTorch runtime with-in the bare-metal environment we have for Corstone-300 from Ethos-U SDK.
281
+
ExecuTorch's CMake build system produces a set of build pieces which are critical for including and running the ExecuTorch runtime within the bare-metal environment that the Ethos-U SDK provides for the Corstone FVPs.
275
282
276
283
[This](./runtime-build-and-cross-compilation.md) document provides a detailed overview of each individual build piece. For running either variant of the `.pte` file, we will need a core set of libraries. Here is a list,
277
284
@@ -283,133 +290,106 @@ To run a `.pte` file with the Arm backend delegate call instructions, we will ne
283
290
284
291
-`libexecutorch_delegate_ethos_u.a`
285
292
286
-
287
-
These libraries are generated in `build_executorch` function of the `run.sh` script.
293
+
These libraries are generated in the `build_executorch` and `build_quantization_aot_lib` functions of the `run.sh` script.
288
294
289
295
In this function, `EXECUTORCH_SELECT_OPS_LIST` decides which portable operators are included in the build and available at runtime. It must match the `.pte` file's requirements, otherwise you will get a `Missing Operator` error at runtime.
290
296
291
297
For example, in the command line above, to run SoftmaxModule, we only included the softmax CPU operator. Similarly, to run AddModule in a non-delegated manner you will need the add op, and so on. As you might have already realized, operators that are executed by the Arm backend delegate do not need to be in this list. It is only for *non-delegated* operators.
292
298
299
+
```{tip}
300
+
The `run.sh` script takes a `--portable_kernels` option, which provides a way to supply a comma-separated list of portable kernels to be included.
301
+
```
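
The rule described above (only *non-delegated* operators need a portable kernel) can be sketched as a small list computation. This is an illustrative helper, not part of `run.sh`: the function name `portable_kernels_arg` and the operator names below are assumptions chosen for the example, not values read from a real `.pte` file.

```python
def portable_kernels_arg(model_ops, delegated_ops):
    """Return a comma-separated list of operators that still need a portable (CPU) kernel.

    Operators handled by the backend delegate are dropped; order is kept and
    duplicates are removed.
    """
    delegated = set(delegated_ops)
    remaining = []
    for op in model_ops:
        if op not in delegated and op not in remaining:
            remaining.append(op)
    return ",".join(remaining)


# Illustrative operator names (assumed for this sketch).
softmax_ops = ["aten::_softmax.out"]
add_ops = ["aten::add.out"]

# Non-delegated softmax: the CPU kernel must be in the list.
print(portable_kernels_arg(softmax_ops, delegated_ops=[]))
# Fully delegated add: nothing is left for the portable kernel list.
print(repr(portable_kernels_arg(add_ops, delegated_ops=["aten::add.out"])))
```

A string built this way has the comma-separated shape that the `--portable_kernels` option expects; which operator names a given model actually requires has to come from the exported program itself.
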
302
+
293
303
### Building the executor_runner Bare-Metal Application
294
304
295
305
The SDK dir is the same one prepared [earlier](#setup-the-arm-ethos-u-software-development). And, we will be passing the `.pte` file (any one of them) generated above.
296
306
297
-
Note, you have to generate a new `executor-runner` binary if you want to change the model or the `.pte` file. This constraint is from the constrained bare-metal runtime environment we have for Corstone-300 platform.
307
+
Note, you have to generate a new `executor-runner` binary if you want to change the model or the `.pte` file. This constraint comes from the restricted bare-metal runtime environment we have for the Corstone-300/Corstone-320 platforms.
298
308
299
309
This is performed by the `build_executorch_runner` function in `run.sh`.
300
310
301
-
## Running on Corstone-300 FVP Platform
311
+
```{tip}
312
+
The `run.sh` script takes a `--target` option, which selects a specific target: Corstone-300 (ethos-u55-128) or Corstone-320 (ethos-u85-128).
313
+
```
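
The platform and target pairs mentioned in this tutorial can be kept in a small lookup table. The sketch below is a hypothetical helper, not part of `run.sh`. The CPU pairing comes from the toolchain section, and the `--target=<value>` spelling is an assumption that should be checked against `run.sh --help`.

```python
# Hypothetical lookup table; the values are taken from this tutorial's text.
PLATFORMS = {
    "Corstone-300": {"cpu": "Cortex-M55", "target": "ethos-u55-128"},
    "Corstone-320": {"cpu": "Cortex-M85", "target": "ethos-u85-128"},
}


def target_flag(platform_name):
    """Return the --target argument for a platform, or raise for unknown names."""
    try:
        target = PLATFORMS[platform_name]["target"]
    except KeyError:
        known = ", ".join(sorted(PLATFORMS))
        raise ValueError(
            f"unknown platform {platform_name!r}; expected one of: {known}"
        ) from None
    # Assumption: run.sh accepts the "--target=<value>" form.
    return f"--target={target}"


print(target_flag("Corstone-320"))
```
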
314
+
315
+
## Running on Corstone FVP Platforms
302
316
303
-
Once the elf is prepared, regardless of the `.pte` file variant is used to generate the bare metal elf, you can run in with following command,
317
+
Once the elf is prepared, regardless of which `.pte` file variant was used to generate the bare-metal elf, the command below can be used to run the [MV2Model](#mv2module) on the Corstone-320 FVP.
--timelimit 10 # seconds- after which sim will kill itself
333
+
--timelimit 120 || true # seconds - after which sim will kill itself
317
334
```
318
335
319
336
If successful, the simulator should produce something like the following on the shell,
320
337
321
338
```console
322
-
Ethos-U rev 136b7d75 --- Apr 12 2023 13:44:01
323
-
(C) COPYRIGHT 2019-2023 Arm Limited
324
-
ALL RIGHTS RESERVED
325
-
326
-
I executorch:runner.cpp:64] Model PTE file loaded. Size: 960 bytes.
327
-
I executorch:runner.cpp:70] Model buffer loaded, has 1 methods
328
-
I executorch:runner.cpp:78] Running method forward
329
-
I executorch:runner.cpp:95] Setting up planned buffer 0, size 32.
330
-
I executorch:runner.cpp:110] Method loaded.
331
-
I executorch:runner.cpp:112] Preparing inputs...
332
-
I executorch:runner.cpp:114] Input prepared.
333
-
I executorch:runner.cpp:116] Starting the model execution...
334
-
I executorch:runner.cpp:121] Model executed successfully.
335
-
I executorch:runner.cpp:125] 1 outputs:
336
-
Output[0][0]: 0.500000
337
-
Output[0][1]: 0.500000
338
-
Output[0][2]: 0.500000
339
-
Output[0][3]: 0.500000
340
-
Application exit code: 0.
341
-
342
-
EXITTHESIM
343
-
344
-
Info: Simulation is stopping. Reason: CPU time has been exceeded.
345
-
```
346
-
347
-
Here in this example, we ran the `executor_runner` binary with the `softmax.pte` file generated for the [SoftmaxModule](#softmaxmodule), we do see the expected results generated from the baremetal binary running on the Corstone-300 virtual hardware on FVP simulator.
348
-
349
-
If you rerun the same FVP command with the delegated `.pte` file for the [AddModule](#addmodule), i.e. `add_arm_delegate.pte` - you may get something like following, again the expected results. Pay attention to the messages printed with prefix `ArmBackend::`, they indicate that the backend was sucecssfully initialized and the `add` operator from our AddModule in the `.pte` was exexuted on the Ethos-U55 NPU.
350
-
351
-
```console
352
-
Ethos-U rev 136b7d75 --- Apr 12 2023 13:44:01
353
-
(C) COPYRIGHT 2019-2023 Arm Limited
354
-
ALL RIGHTS RESERVED
355
-
356
-
I executorch:runner.cpp:64] Model PTE file loaded. Size: 2208 bytes.
357
-
I executorch:runner.cpp:70] Model buffer loaded, has 1 methods
358
-
I executorch:runner.cpp:78] Running method forward
359
-
I executorch:runner.cpp:95] Setting up planned buffer 0, size 64.
360
-
I executorch:ArmBackendEthosU.cpp:51] ArmBackend::init 0x11000050
361
-
I executorch:runner.cpp:110] Method loaded.
362
-
I executorch:runner.cpp:112] Preparing inputs...
363
-
I executorch:runner.cpp:114] Input prepared.
364
-
I executorch:runner.cpp:116] Starting the model execution...
365
-
I executorch:ArmBackendEthosU.cpp:103] ArmBackend::execute 0x11000050
366
-
I executorch:runner.cpp:121] Model executed successfully.
367
-
I executorch:runner.cpp:125] 1 outputs:
368
-
Output[0][0]: 2
369
-
Output[0][1]: 2
370
-
Output[0][2]: 2
371
-
Output[0][3]: 2
372
-
Output[0][4]: 2
373
-
Application exit code: 0.
374
-
375
-
EXITTHESIM
376
-
377
-
Info: Simulation is stopping. Reason: CPU time has been exceeded.
378
-
```
379
-
380
-
Similarily we can get the following output for running the [MV2Model](#mv2module)
381
-
382
-
```
383
-
Ethos-U rev 136b7d75 --- Apr 12 2023 13:44:01
384
-
(C) COPYRIGHT 2019-2023 Arm Limited
385
-
ALL RIGHTS RESERVED
386
-
387
-
I executorch:arm_executor_runner.cpp:60] Model in 0x70000000 $
388
-
I executorch:arm_executor_runner.cpp:66] Model PTE file loaded. Size: 4556832 bytes.
389
-
I executorch:arm_executor_runner.cpp:77] Model buffer loaded, has 1 methods
390
-
I executorch:arm_executor_runner.cpp:85] Running method forward
391
-
I executorch:arm_executor_runner.cpp:109] Setting up planned buffer 0, size 752640.
392
-
I executorch:ArmBackendEthosU.cpp:49] ArmBackend::init 0x70000060
393
-
I executorch:arm_executor_runner.cpp:130] Method loaded.
394
-
I executorch:arm_executor_runner.cpp:132] Preparing inputs...
395
-
I executorch:arm_executor_runner.cpp:141] Input prepared.
396
-
I executorch:arm_executor_runner.cpp:143] Starting the model execution...
397
-
I executorch:ArmBackendEthosU.cpp:87] ArmBackend::execute 0x70000060
398
-
I executorch:ArmBackendEthosU.cpp:234] Tensor input 0 will be permuted
339
+
I [executorch:arm_executor_runner.cpp:364] Model in 0x70000000 $
340
+
I [executorch:arm_executor_runner.cpp:366] Model PTE file loaded. Size: 4425968 bytes.
341
+
I [executorch:arm_executor_runner.cpp:376] Model buffer loaded, has 1 methods
342
+
I [executorch:arm_executor_runner.cpp:384] Running method forward
343
+
I [executorch:arm_executor_runner.cpp:395] Setup Method allocator pool. Size: 62914560 bytes.
344
+
I [executorch:arm_executor_runner.cpp:412] Setting up planned buffer 0, size 752640.
345
+
I [executorch:ArmBackendEthosU.cpp:79] ArmBackend::init 0x70000070
346
+
I [executorch:arm_executor_runner.cpp:445] Method loaded.
347
+
I [executorch:arm_executor_runner.cpp:447] Preparing inputs...
348
+
I [executorch:arm_executor_runner.cpp:461] Input prepared.
349
+
I [executorch:arm_executor_runner.cpp:463] Starting the model execution...
350
+
I [executorch:ArmBackendEthosU.cpp:118] ArmBackend::execute 0x70000070
351
+
I [executorch:ArmBackendEthosU.cpp:298] Tensor input/output 0 will be permuted
352
+
I [executorch:arm_perf_monitor.cpp:120] NPU Inferences : 1
353
+
I [executorch:arm_perf_monitor.cpp:121] Profiler report, CPU cycles per operator:
354
+
I [executorch:arm_perf_monitor.cpp:125] ethos-u : cycle_cnt : 1498202 cycles
355
+
I [executorch:arm_perf_monitor.cpp:132] Operator(s) total: 1498202 CPU cycles
356
+
I [executorch:arm_perf_monitor.cpp:138] Inference runtime: 6925114 CPU cycles total
357
+
I [executorch:arm_perf_monitor.cpp:140] NOTE: CPU cycle values and ratio calculations require FPGA and identical CPU/NPU frequency
358
+
I [executorch:arm_perf_monitor.cpp:149] Inference CPU ratio: 99.99 %
359
+
I [executorch:arm_perf_monitor.cpp:153] Inference NPU ratio: 0.01 %
360
+
I [executorch:arm_perf_monitor.cpp:162] cpu_wait_for_npu_cntr : 729 CPU cycles
361
+
I [executorch:arm_perf_monitor.cpp:167] Ethos-U PMU report:
362
+
I [executorch:arm_perf_monitor.cpp:168] ethosu_pmu_cycle_cntr : 5920305
363
+
I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr0 : 359921
364
+
I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr1 : 0
365
+
I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr2 : 0
366
+
I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr3 : 503
367
+
I [executorch:arm_perf_monitor.cpp:178] Ethos-U PMU Events:[ETHOSU_PMU_EXT0_RD_DATA_BEAT_RECEIVED, ETHOSU_PMU_EXT1_RD_DATA_BEAT_RECEIVED, ETHOSU_PMU_EXT0_WR_DATA_BEAT_WRITTEN, ETHOSU_PMU_NPU_IDLE]
368
+
I [executorch:arm_executor_runner.cpp:470] model_pte_loaded_size: 4425968 bytes.
I executorch:arm_executor_runner.cpp:152] Model executed successfully.
400
376
I executorch:arm_executor_runner.cpp:156] 1 outputs:
401
-
Output[0][0]: -0.639322
402
-
Output[0][1]: 0.169232
403
-
Output[0][2]: -0.451286
377
+
Output[0][0]: -0.749744
378
+
Output[0][1]: -0.019224
379
+
Output[0][2]: 0.134570
404
380
...(Skipped)
405
-
Output[0][996]: 0.150429
406
-
Output[0][997]: -0.488894
407
-
Output[0][998]: 0.037607
408
-
Output[0][999]: 1.203430
381
+
Output[0][996]: -0.230691
382
+
Output[0][997]: -0.634399
383
+
Output[0][998]: -0.115345
384
+
Output[0][999]: 1.576386
409
385
I executorch:arm_executor_runner.cpp:177] Program complete, exiting.
410
386
I executorch:arm_executor_runner.cpp:179]
411
387
```
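
When comparing runs, it can help to pull the Ethos-U PMU counters out of the simulator log rather than reading them by eye. The following is a minimal sketch that assumes the log line format shown in the sample output above; the function name `parse_pmu_counters` is hypothetical, and the format may differ between ExecuTorch versions.

```python
import re

# Matches lines such as:
#   I [executorch:arm_perf_monitor.cpp:168] ethosu_pmu_cycle_cntr : 5920305
PMU_LINE = re.compile(r"\]\s*(ethosu_pmu_\w+)\s*:\s*(\d+)\s*$")


def parse_pmu_counters(log_text):
    """Return a dict mapping each ethosu_pmu_* counter name to its integer value."""
    counters = {}
    for line in log_text.splitlines():
        match = PMU_LINE.search(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


# A few lines copied from the sample output above.
SAMPLE_LOG = """\
I [executorch:arm_perf_monitor.cpp:167] Ethos-U PMU report:
I [executorch:arm_perf_monitor.cpp:168] ethosu_pmu_cycle_cntr : 5920305
I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr0 : 359921
I [executorch:arm_perf_monitor.cpp:171] ethosu_pmu_cntr3 : 503
"""

print(parse_pmu_counters(SAMPLE_LOG))
```

Note that the log itself warns that CPU cycle values and ratio calculations require FPGA and identical CPU/NPU frequency, so counters parsed from an FVP run are best used for relative comparison only.
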
412
388
389
+
```{note}
390
+
The `run.sh` script provides various options to select a particular FVP target, choose the desired models, and select portable kernels. These can be explored using the `--help` argument.
391
+
```
392
+
413
393
## Takeaways
414
394
Through this tutorial we've learnt how to use the ExecuTorch software to both export a standard model from PyTorch and to run it on the compact and fully functional ExecuTorch runtime, enabling a smooth path for offloading models from PyTorch to Arm-based platforms.