
Commit 0991301

Merge pull request #1338 from jasonrandrews/review
Review MLOps on GitHub Actions Learning Path
2 parents 09f4213 + fbb594b commit 0991301

File tree

8 files changed, +311 -107 lines changed


content/learning-paths/servers-and-cloud-computing/gh-runners/_index.md

Lines changed: 12 additions & 9 deletions
@@ -1,27 +1,30 @@
 ---
 title: MLOps with Arm-hosted GitHub Runners
+draft: true
+cascade:
+  draft: true

 minutes_to_complete: 30

 who_is_this_for: This is an introductory topic for software developers interested in automation for machine learning (ML) tasks.

 learning_objectives:
-  - Set up an Arm-hosted GitHub runner
-  - Train and test a PyTorch ML model with the German Traffic Sign Recognition Benchmark (GTSRB) dataset on Arm
-  - Use PyTorch compiled with OpenBLAS and oneDNN with Arm Compute Library to compare the performance of your trained model
-  - Containerize the model and push your container to DockerHub
-  - Automate all the steps in the ML workflow using GitHub Actions
-
+  - Set up an Arm-hosted GitHub runner.
+  - Train and test a PyTorch ML model with the German Traffic Sign Recognition Benchmark (GTSRB) dataset.
+  - Use PyTorch compiled with OpenBLAS and oneDNN with Arm Compute Library to compare the performance of a trained model.
+  - Containerize the model and push the container to DockerHub.
+  - Automate all the steps in the ML workflow using GitHub Actions.

 prerequisites:
-  - A GitHub account with access to Arm-hosted GitHub runners
-  - Some familiarity with ML and continuous integration and deployment (CI/CD) concepts is assumed
+  - A GitHub account with access to Arm-hosted GitHub runners.
+  - A Docker Hub account for storing container images.
+  - Some familiarity with ML and continuous integration and deployment (CI/CD) concepts.

 author_primary: Pareena Verma, Annie Tallund

 ### Tags
 skilllevels: Introductory
-subjects: CI/CD
+subjects: CI-CD
 armips:
   - Neoverse
 tools_software_languages:
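
One of the learning objectives added above is containerizing the model and pushing the container to DockerHub. As an illustrative sketch only (not part of this commit), a workflow job for that step could look like the following; the image tag, secret names, and file layout are assumptions, and the actual Learning Path workflow may differ:

```yaml
name: Build and push model container

on:
  workflow_dispatch:

jobs:
  build-and-push:
    # Arm-hosted runner label used elsewhere in this Learning Path
    runs-on: ubuntu-22.04-arm-os
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          # Hypothetical secret names; configure them in the repository settings
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push the image
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          # Hypothetical image name on Docker Hub
          tags: your-dockerhub-user/gtsrb-model:latest
```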

content/learning-paths/servers-and-cloud-computing/gh-runners/_review.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ review:

   - questions:
       question: >
-        ACL is integrated into PyTorch by default.
+        ACL is included in PyTorch.
       answers:
         - "True"
         - "False"
Lines changed: 40 additions & 11 deletions
@@ -1,5 +1,5 @@
 ---
-title: Background
+title: MLOps background
 weight: 2

 ### FIXED, DO NOT MODIFY
@@ -8,29 +8,47 @@ layout: learningpathall

 ## Overview

-In this Learning Path, you will learn how to automate your MLOps workflow using an Arm-hosted GitHub runner and GitHub Actions. You will learn how to train and test a neural network model with PyTorch. You will compare the model inference time for your trained model using two different PyTorch backends. You will then containerize your trained model and deploy the container image to DockerHub for easy deployment of your application.
+In this Learning Path, you will learn how to automate an MLOps workflow using Arm-hosted GitHub runners and GitHub Actions.
+
+You will learn how to do the following tasks:
+- Train and test a neural network model with PyTorch.
+- Compare the model inference time using two different PyTorch backends.
+- Containerize the model and save it to DockerHub.
+- Deploy the container image and use API calls to access the model.

 ## GitHub Actions

-GitHub Actions is a platform that automates software development workflows, including continuous integration and continuous delivery. Every repository on GitHub has a tab named _Actions_.
+GitHub Actions is a platform that automates software development workflows, including continuous integration and continuous delivery. Every repository on GitHub has an `Actions` tab as shown below:

 ![#actions-gui](images/actions-gui.png)

-From here, you can run different _workflow files_ which automate processes that run when specific events occur in your GitHub code repository. You use [YAML](https://yaml.org/) to define a workflow. You specify how a job is triggered, the running environment, and the workflow commands. The machine on which the workflow runs is called a _runner_.
+GitHub Actions runs workflow files to automate processes. Workflows run when specific events occur in a GitHub repository.
+
+[YAML](https://yaml.org/) defines a workflow.
+
+Workflows specify how a job is triggered, the running environment, and the commands to run.
+
+The machine running workflows is called a _runner_.

 ## Arm-hosted GitHub runners

-Arm-hosted GitHub runners are a powerful addition to your CI/CD toolkit. They leverage the efficiency and performance of Arm64 architecture, making your build systems faster and easier to scale. By using the Arm-hosted GitHub runners, you can optimize your workflows, reduce costs, and improve energy consumption. Additionally, the Arm-hosted runners are preloaded with essential tools, making it easier for you to develop and test your applications.
+Hosted GitHub runners are provided by GitHub so you don't need to setup and manage cloud infrastructure. Arm-hosted GitHub runners use the Arm architecture so you can build and test software without cross-compiling or instruction emulation.
+
+Arm-hosted GitHub runners enable you to optimize your workflows, reduce cost, and improve energy consumption.
+
+Additionally, the Arm-hosted runners are preloaded with essential tools, making it easier for you to develop and test your applications.

 Arm-hosted runners are available for Linux and Windows. This Learning Path uses Linux.

 {{% notice Note %}}
 You must have a Team or Enterprise Cloud plan to use Arm-hosted runners.
 {{% /notice %}}

-Getting started with Arm-hosted GitHub runners is straightforward. Follow [these steps to create a Linux Arm-hosted runner within your organization](/learning-paths/cross-platform/github-arm-runners/runner/#how-can-i-create-an-arm-hosted-runner).
+Getting started with Arm-hosted GitHub runners is straightforward. Follow the steps in [Create a new Arm-hosted runner](/learning-paths/cross-platform/github-arm-runners/runner/#how-can-i-create-an-arm-hosted-runner) to create a runner in your organization.

-Once you have created the runner within your organization, you can use the `runs-on` syntax in your GitHub Actions workflow file to execute the workflow on Arm. Shown here is an example workflow that executes on your Arm-hosted runner named `ubuntu-22.04-arm`:
+Once you have created the runner, use the `runs-on` syntax in your GitHub Actions workflow file to execute the workflow on Arm.
+
+Below is an example workflow that executes on an Arm-hosted runner named `ubuntu-22.04-arm-os`:

 ```yaml
 name: Example workflow
@@ -45,14 +63,25 @@ jobs:
     run: echo "This line runs on Arm!"
 ```

-This setup allows you to take full advantage of the Arm64 architecture's capabilities. Whether you are working on cloud, edge, or automotive projects, these runners provide a versatile and robust solution.

 ## Machine Learning Operations (MLOps)

-With machine learning use-cases evolving and scaling, comes an increased need for reliable workflows to maintain them. There are many regular tasks that can be automated in the ML lifecycle. Models need to be re-trained, while ensuring they still perform at their best capacity. New training data needs to be properly stored and pre-processed, and the models need to be deployed in a good production environment. Developer Operations (DevOps) refers to good practices for CI/CD. The domain-specific needs for ML, combined with state of the art DevOps knowledge, created the term MLOps.
+Machine learning use-cases have a need for reliable workflows to maintain performance and quality.
+
+There are many tasks that can be automated in the ML lifecycle.
+- Model training and re-training
+- Model performance analysis
+- Data storage and processing
+- Model deployment
+
+Developer Operations (DevOps) refers to good practices for collaboration and automation, including CI/CD. The domain-specific needs for ML, combined with DevOps knowledge, creates the new term MLOps.

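The MLOps tasks listed in the added text above map naturally onto separate GitHub Actions jobs. The following skeleton is an editorial illustration only, not part of this commit; every job name and script path is hypothetical:

```yaml
name: MLOps pipeline (illustrative)

on:
  workflow_dispatch:

jobs:
  train:
    runs-on: ubuntu-22.04-arm-os
    steps:
      - uses: actions/checkout@v4
      - name: Train the model
        run: python train_model.py   # hypothetical training script

  test:
    needs: train
    runs-on: ubuntu-22.04-arm-os
    steps:
      - uses: actions/checkout@v4
      - name: Test the model
        run: python test_model.py    # hypothetical testing script

  deploy:
    needs: test
    runs-on: ubuntu-22.04-arm-os
    steps:
      - uses: actions/checkout@v4
      - name: Build and push the container image
        run: echo "Containerize the model and push it to DockerHub here"

# A real pipeline would also pass the trained model between jobs,
# for example with the upload-artifact and download-artifact actions.
```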
 ## German Traffic Sign Recognition Benchmark (GTSRB)

-In this Learning path, you will train and test a PyTorch model for use in Traffic Sign recognition. You will use the GTSRB dataset to train the model. The dataset is free to use under the [Creative Commons](https://creativecommons.org/publicdomain/zero/1.0/) license. It contains thousands of images of traffic signs found in Germany. Thanks to the availability and real-world connection, it has become a well-known resource to showcase ML applications. Additionally, given that it is a benchmark, you can apply it in a MLOps context to compare model improvements. This makes it a great candidate for this Learning Path, where you compare the performance of your trained model using two different PyTorch backends.
+This Learning Path explains how to train and test a PyTorch model to perform traffic sign recognition.
+
+You will learn how to use the GTSRB dataset to train the model. The dataset is free to use under the [Creative Commons](https://creativecommons.org/publicdomain/zero/1.0/) license. It contains thousands of images of traffic signs found in Germany. It has become a well-known resource to showcase ML applications.
+
+The GTSRB dataset is also good for comparing performance and accuracy of different models and to compare and contrast different PyTorch backends.

-Now that you have an overview, in the following sections you will learn how to setup an end-to-end MLOps workflow using the Arm-hosted GitHub runners.
+Continue to the next section to learn how to setup an end-to-end MLOps workflow using Arm-hosted GitHub runners.
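
The example workflow referenced in the diff above appears only in fragments because the hunks elide its middle lines. A minimal, self-contained sketch of such a workflow is shown below; it assumes the runner label `ubuntu-22.04-arm-os` from the added text, and the trigger and step names are assumptions rather than the exact file from the commit:

```yaml
name: Example workflow

# Run on every push, and allow manual runs from the Actions tab
on:
  push:
  workflow_dispatch:

jobs:
  example:
    # Execute the job on the Arm-hosted runner created for the organization
    runs-on: ubuntu-22.04-arm-os
    steps:
      - name: Print a message
        run: echo "This line runs on Arm!"
```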

content/learning-paths/servers-and-cloud-computing/gh-runners/compare-performance.md

Lines changed: 30 additions & 13 deletions
@@ -1,27 +1,34 @@
 ---
-title: Modify test workflow and compare performance
+title: Compare the performance of PyTorch backends
 weight: 5

 ### FIXED, DO NOT MODIFY
 layout: learningpathall
 ---

-Continuously monitoring the performance of your machine learning models in production is crucial to maintaining their effectiveness over time. The performance of your ML model can change due to various factors ranging from data-related issues to model-specific and environmental factors.
+Continuously monitoring the performance of your machine learning models in production is crucial to maintaining effectiveness over time. The performance of your ML model can change due to various factors ranging from data-related issues to environmental factors.

-In this section, you will change the PyTorch backend being used to test the trained model. You will learn how to measure and continuously monitor the inference performance with your workflow.
+In this section, you will change the PyTorch backend being used to test the trained model. You will learn how to measure and continuously monitor the inference performance using your workflow.

 ## OneDNN with Arm Compute Library (ACL)

-In the previous section, you used the PyTorch 2.3.0 Docker Image compiled with OpenBLAS from DockerHub to run your testing workflow. PyTorch can be run with other backends as well. You will now modify the testing workflow to use PyTorch 2.3.0 Docker Image compiled with OneDNN and the Arm Compute Library.
+In the previous section, you used the PyTorch 2.3.0 Docker Image compiled with OpenBLAS from DockerHub to run your testing workflow. PyTorch can be run with other backends. You will now modify the testing workflow to use PyTorch 2.3.0 Docker Image compiled with OneDNN and the Arm Compute Library.

-The [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) is a collection of low-level machine learning functions optimized for Arm's Cortex-A and Neoverse processors, and the Mali GPUs. The Arm-hosted GitHub runners use Arm Neoverse CPUs, which makes it possible to optimize your neural networks to take advantange of the features available on the runners. ACL implements kernels (which you may know as operators or layers), which uses specific instructions that run faster on AArch64.
-ACL is integrated into PyTorch through the [oneDNN engine](https://github.com/oneapi-src/oneDNN).
+The [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) is a collection of low-level machine learning functions optimized for Arm's Cortex-A and Neoverse processors and Mali GPUs. Arm-hosted GitHub runners use Arm Neoverse CPUs, which make it possible to optimize your neural networks to take advantage of processor features. ACL implements kernels (also known as operators or layers), using specific instructions that run faster on AArch64.
+
+ACL is integrated into PyTorch through [oneDNN](https://github.com/oneapi-src/oneDNN), an open-source deep neural network library.

 ## Modify the test workflow and compare results

-Two different PyTorch docker images for Arm Neoverse CPUs are available on [DockerHub](https://hub.docker.com/r/armswdev/pytorch-arm-neoverse). Up until this point, you used the `r24.07-torch-2.3.0-openblas` container image in your workflows. You will now update `test_model.yml` to use the `r24.07-torch-2.3.0-onednn-acl` container image instead.
+Two different PyTorch docker images for Arm Neoverse CPUs are available on [DockerHub](https://hub.docker.com/r/armswdev/pytorch-arm-neoverse).
+
+Up until this point, you used the `r24.07-torch-2.3.0-openblas` container image to run workflows. The oneDNN container image is also available to use in workflows. These images represent two different PyTorch backends which handle the PyTorch model execution.
+
+### Change the Docker image to use oneDNN

-Open and edit `.github/workflows/test_model.yml` in your browser. Update the `container.image` parameter to `armswdev/pytorch-arm-neoverse:r24.07-torch-2.3.0-onednn-acl` and save the file:
+In your browser, open and edit the file `.github/workflows/test_model.yml`.
+
+Update the `container.image` parameter to `armswdev/pytorch-arm-neoverse:r24.07-torch-2.3.0-onednn-acl` and save the file by committing the change to the main branch:

 ```yaml
 jobs:
@@ -34,9 +41,17 @@ jobs:
   # Steps omitted
 ```

-Trigger the Test Model job again by clicking the Run workflow button on the Actions tab.
+### Run the test workflow
+
+Trigger the **Test Model** job again by clicking the `Run workflow` button on the `Actions` tab.
+
+The test workflow starts running.

-Expand the Run testing script step from your Actions tab. You should see a change in the performance results with OneDNN and ACL kernels being used.
+Navigate to the workflow run on the `Actions` tab, click into the job, and expand the **Run testing script** step.
+
+You see a change in the performance results with OneDNN and ACL kernels being used.
+
+The output is similar to:

 ```output
 Accuracy of the model on the test images: 90.48%
@@ -55,8 +70,10 @@ Accuracy of the model on the test images: 90.48%
 aten::addmm 8.50% 558.000us 8.71% 572.000us 286.000us 2
 --------------------------------- ------------ ------------ ------------ ------------ ------------ ------------
 Self CPU time total: 6.565ms
-
 ```
-For the ACL results, observe that the **Self CPU time total** is lower compared to the OpenBLAS run in the previous section. The names of the layers have changed as well, where the `aten::mkldnn_convolution` is the kernel optimized to run on Aarch64. That operator is the main reason our inference time is improved, made possible by using ACL kernels.

-In the next section, you will learn how to automate the deployment of your trained and tested model.
+For the ACL results, notice that the **Self CPU time total** is lower compared to the OpenBLAS run in the previous section.
+
+The names of the layers have also changed, where the `aten::mkldnn_convolution` is the kernel optimized to run on the Arm architecture. That operator is the main reason the inference time is improved, made possible by using ACL kernels.
+
+In the next section, you will learn how to automate the deployment of your model.
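
For context, a test workflow with the oneDNN/ACL container configured might look like the sketch below. The `container.image` value and the manual trigger follow the text above; the job name, checkout step, and testing script path are assumptions and may differ from the actual `test_model.yml`:

```yaml
name: Test Model

# Allows the workflow to be started with the Run workflow button on the Actions tab
on:
  workflow_dispatch:

jobs:
  test-model:
    runs-on: ubuntu-22.04-arm-os
    # Run the job inside the PyTorch image built with oneDNN and the Arm Compute Library
    container:
      image: armswdev/pytorch-arm-neoverse:r24.07-torch-2.3.0-onednn-acl
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Run testing script
        run: python test_model.py   # hypothetical path to the testing script
```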

0 commit comments
