In this Learning Path, you will learn how to automate an MLOps workflow using Arm-hosted GitHub runners and GitHub Actions.

You will learn how to do the following tasks:

- Train and test a neural network model with PyTorch.
- Compare the model inference time using two different PyTorch backends.
- Containerize the model and push the image to DockerHub.
- Deploy the container image and use API calls to access the model.

## GitHub Actions

GitHub Actions is a platform that automates software development workflows, including continuous integration and continuous delivery. Every repository on GitHub has an `Actions` tab.

GitHub Actions runs workflow files to automate processes. Workflows run when specific events occur in a GitHub repository.

A workflow is defined in [YAML](https://yaml.org/). It specifies how a job is triggered, the running environment, and the commands to run.

The machine that runs a workflow is called a _runner_.

## Arm-hosted GitHub runners

Hosted GitHub runners are provided by GitHub, so you don't need to set up and manage cloud infrastructure. Arm-hosted GitHub runners use the Arm architecture, so you can build and test software without cross-compiling or instruction emulation.

Arm-hosted GitHub runners enable you to optimize your workflows, reduce costs, and improve energy consumption.

Additionally, the Arm-hosted runners are preloaded with essential tools, making it easier for you to develop and test your applications.

24
40
25
41
Arm-hosted runners are available for Linux and Windows. This Learning Path uses Linux.

{{% notice Note %}}
You must have a Team or Enterprise Cloud plan to use Arm-hosted runners.
{{% /notice %}}

Getting started with Arm-hosted GitHub runners is straightforward. Follow the steps in [Create a new Arm-hosted runner](/learning-paths/cross-platform/github-arm-runners/runner/#how-can-i-create-an-arm-hosted-runner) to create a runner in your organization.

Once you have created the runner, use the `runs-on` syntax in your GitHub Actions workflow file to execute the workflow on Arm.

Below is an example workflow that executes on an Arm-hosted runner named `ubuntu-22.04-arm-os`:

```yaml
# The trigger and job names below are illustrative; the runner label matches
# the Arm-hosted runner created earlier.
name: Example workflow

on: workflow_dispatch

jobs:
  example-job:
    runs-on: ubuntu-22.04-arm-os
    steps:
      - name: Print a message
        run: echo "This line runs on Arm!"
```

## Machine Learning Operations (MLOps)

Machine learning use cases need reliable workflows to maintain performance and quality.

Many tasks in the ML lifecycle can be automated:

- Model training and re-training
- Model performance analysis
- Data storage and processing
- Model deployment

Developer Operations (DevOps) refers to good practices for collaboration and automation, including CI/CD. The domain-specific needs of ML, combined with DevOps knowledge, created the term MLOps.

## German Traffic Sign Recognition Benchmark (GTSRB)

This Learning Path explains how to train and test a PyTorch model to perform traffic sign recognition.

You will use the GTSRB dataset to train the model. The dataset is free to use under the [Creative Commons](https://creativecommons.org/publicdomain/zero/1.0/) license. It contains thousands of images of traffic signs found in Germany, and it has become a well-known resource to showcase ML applications.

The GTSRB dataset is also useful for comparing the performance and accuracy of different models, and for comparing different PyTorch backends.

Continue to the next section to learn how to set up an end-to-end MLOps workflow using Arm-hosted GitHub runners.
---
title: Compare the performance of PyTorch backends
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

Continuously monitoring the performance of your machine learning models in production is crucial to maintaining effectiveness over time. The performance of your ML model can change due to various factors, ranging from data-related issues to environmental factors.

In this section, you will change the PyTorch backend used to test the trained model. You will learn how to measure and continuously monitor the inference performance using your workflow.
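
To make "inference performance" concrete, here is a minimal sketch of the kind of measurement involved, profiling one forward pass with PyTorch's autograd profiler. The small network is a stand-in for illustration, not the GTSRB model trained in this Learning Path.

```python
import torch
import torch.nn as nn

# A tiny stand-in CNN; your trained GTSRB model would be loaded here instead.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),   # 32x32 input -> 30x30 feature maps
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 30 * 30, 43),       # 43 classes, as in GTSRB
)
model.eval()

x = torch.randn(1, 3, 32, 32)  # one dummy 32x32 RGB image

with torch.no_grad():
    with torch.autograd.profiler.profile() as prof:
        model(x)

# The table ends with a "Self CPU time total" line, the metric compared
# between backends later in this Learning Path.
table = prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=5)
print(table)
```
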
## OneDNN with Arm Compute Library (ACL)

In the previous section, you used the PyTorch 2.3.0 Docker image compiled with OpenBLAS from DockerHub to run your testing workflow. PyTorch can be run with other backends. You will now modify the testing workflow to use the PyTorch 2.3.0 Docker image compiled with oneDNN and the Arm Compute Library.

The [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) is a collection of low-level machine learning functions optimized for Arm's Cortex-A and Neoverse processors and Mali GPUs. Arm-hosted GitHub runners use Arm Neoverse CPUs, which makes it possible to optimize your neural networks to take advantage of processor features. ACL implements kernels (also known as operators or layers) using specific instructions that run faster on AArch64.

ACL is integrated into PyTorch through [oneDNN](https://github.com/oneapi-src/oneDNN), an open-source deep neural network library.
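
You can check from Python whether your PyTorch build exposes the oneDNN backend. This is a quick sanity check for illustration, not a step in the workflow; note that oneDNN appears in PyTorch under its former name, MKL-DNN.

```python
import torch

# oneDNN is reported by PyTorch under its former name, MKL-DNN.
print("oneDNN available:", torch.backends.mkldnn.is_available())
```
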
## Modify the test workflow and compare results

Two different PyTorch Docker images for Arm Neoverse CPUs are available on [DockerHub](https://hub.docker.com/r/armswdev/pytorch-arm-neoverse).

Up until this point, you used the `r24.07-torch-2.3.0-openblas` container image to run workflows. The oneDNN container image is also available to use in workflows. These images represent two different PyTorch backends that handle the PyTorch model execution.

### Change the Docker image to use oneDNN

In your browser, open and edit the file `.github/workflows/test_model.yml`.

Update the `container.image` parameter to `armswdev/pytorch-arm-neoverse:r24.07-torch-2.3.0-onednn-acl` and save the file by committing the change to the main branch:

```yaml
jobs:
  test-model:  # job name is illustrative; keep your existing job name
    runs-on: ubuntu-22.04-arm-os
    container:
      image: armswdev/pytorch-arm-neoverse:r24.07-torch-2.3.0-onednn-acl
    # Steps omitted
```

### Run the test workflow

Trigger the **Test Model** job again by clicking the `Run workflow` button on the `Actions` tab.

The test workflow starts running.

Navigate to the workflow run on the `Actions` tab, click into the job, and expand the **Run testing script** step.

You see a change in the performance results with oneDNN and ACL kernels being used.

The output is similar to:

```output
Accuracy of the model on the test images: 90.48%
```

For the ACL results, notice that the **Self CPU time total** is lower compared to the OpenBLAS run in the previous section.

The names of the layers have also changed: `aten::mkldnn_convolution` is a kernel optimized to run on the Arm architecture. That operator is the main reason the inference time improves, made possible by using ACL kernels.

In the next section, you will learn how to automate the deployment of your model.