---
title: Modify test workflow and compare performance
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

Continuously monitoring the performance of your machine learning models in production is crucial to maintaining their effectiveness over time. Model performance can degrade for many reasons, ranging from data-related issues to model-specific and environmental factors.

In this section, you will change the PyTorch backend used to test the trained model. You will learn how to measure and continuously monitor inference performance within your workflow.

## oneDNN with the Arm Compute Library (ACL)

In the previous section, you used the PyTorch 2.3.0 Docker image compiled with OpenBLAS from DockerHub to run your testing workflow. PyTorch can run with other backends as well. You will now modify the testing workflow to use the PyTorch 2.3.0 Docker image compiled with oneDNN and the Arm Compute Library.

The [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) (ACL) is a collection of low-level machine learning functions optimized for Arm's Cortex-A and Neoverse processors and the Mali GPUs. The Arm-hosted GitHub runners use Arm Neoverse CPUs, which makes it possible to optimize your neural networks to take advantage of the features available on the runners. ACL implements kernels (which you may know as operators or layers) that use specific instructions that run faster on AArch64. ACL is integrated into PyTorch through the [oneDNN engine](https://github.com/oneapi-src/oneDNN).

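Before switching images, you can confirm that a given PyTorch build includes oneDNN support. The snippet below is a minimal sketch using PyTorch's standard introspection APIs; run it inside whichever container you want to check:

```python
import torch

# oneDNN is exposed in PyTorch under its former name, MKL-DNN.
print(torch.backends.mkldnn.is_available())  # True on builds with oneDNN support

# The full build configuration string; on an ACL-enabled build
# it also references the Compute Library.
print(torch.__config__.show())
```
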
## Modify the test workflow and compare results

Two different PyTorch Docker images for Arm Neoverse CPUs are available on [DockerHub](https://hub.docker.com/r/armswdev/pytorch-arm-neoverse). Up until this point, you used the `r24.07-torch-2.3.0-openblas` container image in your workflows. You will now update `test_model.yml` to use the `r24.07-torch-2.3.0-onednn-acl` container image instead.

Open `.github/workflows/test_model.yml` in your browser, update the `container.image` parameter to `armswdev/pytorch-arm-neoverse:r24.07-torch-2.3.0-onednn-acl`, and save the file:

```yaml
jobs:
  test-model:
    name: Test the Model
    runs-on: ubuntu-22.04-arm-os # Custom ARM64 runner
    container:
      image: armswdev/pytorch-arm-neoverse:r24.07-torch-2.3.0-onednn-acl
      options: --user root
    # Steps omitted
```
Trigger the Test Model job again by clicking the **Run workflow** button on the Actions tab.

Expand the **Run testing script** step in your Actions tab. You should see a change in the performance results now that the oneDNN and ACL kernels are being used.

```output
Accuracy of the model on the test images: 90.48%
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                  model_inference         4.63%     304.000us       100.00%       6.565ms       6.565ms             1
                     aten::conv2d         0.18%      12.000us        56.92%       3.737ms       1.869ms             2
                aten::convolution         0.30%      20.000us        56.74%       3.725ms       1.863ms             2
               aten::_convolution         0.43%      28.000us        56.44%       3.705ms       1.853ms             2
         aten::mkldnn_convolution        47.02%       3.087ms        55.48%       3.642ms       1.821ms             2
                 aten::max_pool2d         0.15%      10.000us        25.51%       1.675ms     837.500us             2
    aten::max_pool2d_with_indices        25.36%       1.665ms        25.36%       1.665ms     832.500us             2
                     aten::linear         0.18%      12.000us         9.26%     608.000us     304.000us             2
                      aten::clone         0.26%      17.000us         9.08%     596.000us     149.000us             4
                      aten::addmm         8.50%     558.000us         8.71%     572.000us     286.000us             2
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 6.565ms
```
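A profile table like the one above can be produced with PyTorch's profiler. The sketch below uses a stand-in model and random input in place of the trained model and test images from the workflow:

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity

# Stand-in model and input: substitute your trained model and test data.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),   # 28x28 -> 26x26
    nn.ReLU(),
    nn.Flatten(),                     # 8 * 26 * 26 = 5408 features
    nn.Linear(8 * 26 * 26, 10),
)
model.eval()
x = torch.randn(1, 1, 28, 28)

with torch.no_grad():
    with profile(activities=[ProfilerActivity.CPU]) as prof:
        with record_function("model_inference"):
            model(x)

# Print a table like the workflow output, sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

On a build that routes convolutions through oneDNN, the `aten::mkldnn_convolution` row appears in this table, just as in the workflow output above.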
For the ACL results, observe that the **Self CPU time total** is lower than in the OpenBLAS run from the previous section. The layer names have changed as well: `aten::mkldnn_convolution` is the kernel optimized to run on AArch64, and it is the main reason the inference time improves when ACL kernels are used.

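To quantify the gain between the two backend runs, you can divide the two **Self CPU time total** values. The OpenBLAS figure below is a placeholder; substitute the totals from your own workflow runs:

```python
def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Relative speedup of the optimized run over the baseline."""
    return baseline_ms / optimized_ms

acl_total_ms = 6.565      # Self CPU time total from the ACL run above
openblas_total_ms = 10.0  # placeholder: use your OpenBLAS total from the previous section

print(f"ACL speedup over OpenBLAS: {speedup(openblas_total_ms, acl_total_ms):.2f}x")
```
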
In the next section, you will learn how to automate the deployment of your trained and tested model.