Commit 09f4213

Merge pull request #1316 from annietllnd/gh-runners
ML Ops with Arm-based GitHub runners Learning Path
2 parents 784cdd0 + 6a7b0ca commit 09f4213

File tree: 19 files changed, +858 -0 lines changed
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
---
title: MLOps with Arm-hosted GitHub Runners

minutes_to_complete: 30

who_is_this_for: This is an introductory topic for software developers interested in automation for machine learning (ML) tasks.

learning_objectives:
- Set up an Arm-hosted GitHub runner
- Train and test a PyTorch ML model with the German Traffic Sign Recognition Benchmark (GTSRB) dataset on Arm
- Use PyTorch compiled with OpenBLAS and oneDNN with Arm Compute Library to compare the performance of your trained model
- Containerize the model and push your container to DockerHub
- Automate all the steps in the ML workflow using GitHub Actions

prerequisites:
- A GitHub account with access to Arm-hosted GitHub runners
- Some familiarity with ML and continuous integration and deployment (CI/CD) concepts is assumed

author_primary: Pareena Verma, Annie Tallund

### Tags
skilllevels: Introductory
subjects: CI/CD
armips:
- Neoverse
tools_software_languages:
- Python
- PyTorch
- ACL
- GitHub
operatingsystems:
- Linux


### FIXED, DO NOT MODIFY
# ================================================================================
weight: 1 # _index.md always has weight of 1 to order correctly
layout: "learningpathall" # All files under learning paths have this same wrapper
learning_path_main_page: "yes" # This should be surfaced when looking for related content. Only set for _index.md of learning path content.
---
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
---
next_step_guidance: Thank you for completing this Learning Path on MLOps with Arm-hosted GitHub runners. You might be interested in learning how to build Arm images and multi-architecture images with these Arm-hosted runners.

recommended_path: /learning-paths/cross-platform/github-arm-runners

further_reading:
    - resource:
        title: Arm64 on GitHub Actions - Powering faster, more efficient build systems
        link: https://github.blog/news-insights/product-news/arm64-on-github-actions-powering-faster-more-efficient-build-systems/
        type: blog
    - resource:
        title: Arm Compute Library
        link: https://github.com/ARM-software/ComputeLibrary
        type: website
    - resource:
        title: Streamlining your MLOps pipeline with GitHub Actions and Arm64 runners
        link: https://github.blog/enterprise-software/ci-cd/streamlining-your-mlops-pipeline-with-github-actions-and-arm64-runners/
        type: blog


# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
weight: 21 # set to always be larger than the content in this path, and one more than 'review'
title: "Next Steps" # Always the same
layout: "learningpathall" # All files under learning paths have this same wrapper
---
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
---
review:
    - questions:
        question: >
            Can Arm-hosted runners be used with GitHub Actions?
        answers:
            - "Yes"
            - "No"
        correct_answer: 1
        explanation: >
            Arm-hosted runners for use with GitHub Actions are available for Linux and Windows.

    - questions:
        question: >
            What is the GTSRB dataset made up of?
        answers:
            - Sound files of spoken German words
            - Sound files of animal sounds
            - Images of flower petals
            - Images of German traffic signs
        correct_answer: 4
        explanation: >
            GTSRB stands for German Traffic Sign Recognition Benchmark, a collection of labeled images of German traffic signs.

    - questions:
        question: >
            ACL is integrated into PyTorch by default.
        answers:
            - "True"
            - "False"
        correct_answer: 1
        explanation: >
            While it is possible to use ACL stand-alone, the optimized kernels are built into PyTorch through the oneDNN backend.


# ================================================================================
# FIXED, DO NOT MODIFY
# ================================================================================
title: "Review" # Always the same title
weight: 20 # Set to always be larger than the content in this path
layout: "learningpathall" # All files under learning paths have this same wrapper
---
Lines changed: 58 additions & 0 deletions
@@ -0,0 +1,58 @@
---
title: Background
weight: 2

### FIXED, DO NOT MODIFY
layout: learningpathall
---

## Overview

In this Learning Path, you will learn how to automate your MLOps workflow using an Arm-hosted GitHub runner and GitHub Actions. You will learn how to train and test a neural network model with PyTorch. You will compare the model inference time for your trained model using two different PyTorch backends. You will then containerize your trained model and push the container image to DockerHub for easy deployment of your application.

## GitHub Actions

GitHub Actions is a platform that automates software development workflows, including continuous integration and continuous delivery. Every repository on GitHub has a tab named _Actions_.

![#actions-gui](images/actions-gui.png)

From here, you can run different _workflow files_, which automate processes that run when specific events occur in your GitHub code repository. You use [YAML](https://yaml.org/) to define a workflow. You specify how a job is triggered, the running environment, and the workflow commands. The machine on which the workflow runs is called a _runner_.

## Arm-hosted GitHub runners

Arm-hosted GitHub runners are a powerful addition to your CI/CD toolkit. They leverage the efficiency and performance of the Arm64 architecture, making your build systems faster and easier to scale. By using Arm-hosted GitHub runners, you can optimize your workflows, reduce costs, and lower energy consumption. Additionally, the Arm-hosted runners are preloaded with essential tools, making it easier for you to develop and test your applications.

Arm-hosted runners are available for Linux and Windows. This Learning Path uses Linux.

{{% notice Note %}}
You must have a Team or Enterprise Cloud plan to use Arm-hosted runners.
{{% /notice %}}

Getting started with Arm-hosted GitHub runners is straightforward. Follow [these steps to create a Linux Arm-hosted runner within your organization](/learning-paths/cross-platform/github-arm-runners/runner/#how-can-i-create-an-arm-hosted-runner).

Once you have created the runner within your organization, you can use the `runs-on` syntax in your GitHub Actions workflow file to execute the workflow on Arm. Shown here is an example workflow that executes on your Arm-hosted runner named `ubuntu-22.04-arm-os`:

```yaml
name: Example workflow
on:
  workflow_dispatch:
jobs:
  example-job:
    name: Example Job
    runs-on: ubuntu-22.04-arm-os # Custom ARM64 runner
    steps:
      - name: Example step
        run: echo "This line runs on Arm!"
```

This setup allows you to take full advantage of the Arm64 architecture's capabilities. Whether you are working on cloud, edge, or automotive projects, these runners provide a versatile and robust solution.

## Machine Learning Operations (MLOps)

As machine learning use cases evolve and scale, so does the need for reliable workflows to maintain them. Many routine tasks in the ML lifecycle can be automated: models need to be retrained while ensuring they still perform at their best, new training data needs to be properly stored and pre-processed, and models need to be deployed to a suitable production environment. DevOps refers to established practices for CI/CD in software development. Combining these practices with the domain-specific needs of ML is what created the term MLOps.

## German Traffic Sign Recognition Benchmark (GTSRB)

In this Learning Path, you will train and test a PyTorch model for use in traffic sign recognition. You will use the GTSRB dataset to train the model. The dataset is free to use under the [Creative Commons](https://creativecommons.org/publicdomain/zero/1.0/) license. It contains thousands of images of traffic signs found in Germany. Thanks to its availability and real-world relevance, it has become a well-known resource for showcasing ML applications. Additionally, because it is a benchmark, you can use it in an MLOps context to compare model improvements. This makes it a great candidate for this Learning Path, where you compare the performance of your trained model using two different PyTorch backends.
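
If you would like to explore the dataset itself before automating anything, the sketch below shows one way to load GTSRB using `torchvision`. This is a minimal, illustrative example and not the training script used later in this Learning Path; the image size, batch size, and `./data` directory are assumptions.

```python
# Minimal sketch: load the GTSRB dataset with torchvision (illustrative only)
import torch
from torchvision import datasets, transforms

# GTSRB images come in varying sizes, so resize them to a fixed shape
transform = transforms.Compose([
    transforms.Resize((32, 32)),
    transforms.ToTensor(),
])

# download=True fetches the dataset on first use
train_data = datasets.GTSRB(root="./data", split="train", download=True, transform=transform)
test_data = datasets.GTSRB(root="./data", split="test", download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(train_data, batch_size=64, shuffle=True)
print(f"Training samples: {len(train_data)}, test samples: {len(test_data)}")
```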

Now that you have an overview, in the following sections you will learn how to set up an end-to-end MLOps workflow using Arm-hosted GitHub runners.
Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
---
title: Modify test workflow and compare performance
weight: 5

### FIXED, DO NOT MODIFY
layout: learningpathall
---

Continuously monitoring the performance of your machine learning models in production is crucial to maintaining their effectiveness over time. The performance of your ML model can change due to various factors, ranging from data-related issues to model-specific and environmental factors.

In this section, you will change the PyTorch backend used to test the trained model. You will also learn how to measure and continuously monitor the inference performance within your workflow.
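
The inference numbers reported later in this section come from PyTorch's built-in profiler. As a rough sketch of how such a measurement can be taken, you can wrap an inference call as shown below; the `model` and `images` here are placeholders, not the model trained in this Learning Path.

```python
# Sketch: profile a single inference pass with the PyTorch profiler (placeholders only)
import torch
from torch.profiler import profile, record_function, ProfilerActivity

# Placeholder model and input; the real workflow uses the trained GTSRB model and test images
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.ReLU(),
    torch.nn.Flatten(),
    torch.nn.Linear(8 * 30 * 30, 43),
)
model.eval()
images = torch.randn(1, 3, 32, 32)

with torch.no_grad():
    with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
        with record_function("model_inference"):
            model(images)

# Prints a per-operator table similar to the output shown later in this section
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

The `model_inference` label set by `record_function` is what appears as the top row of the profiler tables shown in this Learning Path.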

## oneDNN with Arm Compute Library (ACL)

In the previous section, you used the PyTorch 2.3.0 Docker image compiled with OpenBLAS from DockerHub to run your testing workflow. PyTorch can be run with other backends as well. You will now modify the testing workflow to use the PyTorch 2.3.0 Docker image compiled with oneDNN and the Arm Compute Library.

The [Arm Compute Library](https://github.com/ARM-software/ComputeLibrary) is a collection of low-level machine learning functions optimized for Arm Cortex-A and Neoverse processors and Mali GPUs. The Arm-hosted GitHub runners use Arm Neoverse CPUs, which makes it possible to optimize your neural networks to take advantage of the features available on the runners. ACL implements kernels (which you might know as operators or layers) that use specific instructions to run faster on AArch64.
ACL is integrated into PyTorch through the [oneDNN engine](https://github.com/oneapi-src/oneDNN).
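
If you want to confirm that a given PyTorch build exposes the oneDNN backend before running the workflow, you can inspect its build configuration. This is an optional, illustrative check and not part of the workflow files in this Learning Path.

```python
# Optional check: confirm that this PyTorch build was compiled with oneDNN support
import torch

# oneDNN is exposed in the PyTorch API under its former name, MKL-DNN
print("oneDNN available:", torch.backends.mkldnn.is_available())

# The build string lists compile-time options; look for USE_MKLDNN=ON
# (Arm builds with ACL enabled typically also show USE_MKLDNN_ACL=ON)
print(torch.__config__.show())
```

Setting the environment variable `DNNL_VERBOSE=1` before running inference also makes oneDNN log the primitives it executes, which is another way to see whether ACL-backed kernels are being picked up.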

## Modify the test workflow and compare results

Two different PyTorch Docker images for Arm Neoverse CPUs are available on [DockerHub](https://hub.docker.com/r/armswdev/pytorch-arm-neoverse). Up until this point, you used the `r24.07-torch-2.3.0-openblas` container image in your workflows. You will now update `test_model.yml` to use the `r24.07-torch-2.3.0-onednn-acl` container image instead.

Open and edit `.github/workflows/test_model.yml` in your browser. Update the `container.image` parameter to `armswdev/pytorch-arm-neoverse:r24.07-torch-2.3.0-onednn-acl` and save the file:

```yaml
jobs:
  test-model:
    name: Test the Model
    runs-on: ubuntu-22.04-arm-os # Custom ARM64 runner
    container:
      image: armswdev/pytorch-arm-neoverse:r24.07-torch-2.3.0-onednn-acl
      options: --user root
    # Steps omitted
```

Trigger the Test Model job again by clicking the Run workflow button on the _Actions_ tab.

Expand the Run testing script step on the _Actions_ tab. You should see a change in the performance results, with oneDNN and ACL kernels now being used.

```output
Accuracy of the model on the test images: 90.48%
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                             Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
                  model_inference         4.63%     304.000us       100.00%       6.565ms       6.565ms             1
                     aten::conv2d         0.18%      12.000us        56.92%       3.737ms       1.869ms             2
                aten::convolution         0.30%      20.000us        56.74%       3.725ms       1.863ms             2
               aten::_convolution         0.43%      28.000us        56.44%       3.705ms       1.853ms             2
         aten::mkldnn_convolution        47.02%       3.087ms        55.48%       3.642ms       1.821ms             2
                 aten::max_pool2d         0.15%      10.000us        25.51%       1.675ms     837.500us             2
    aten::max_pool2d_with_indices        25.36%       1.665ms        25.36%       1.665ms     832.500us             2
                     aten::linear         0.18%      12.000us         9.26%     608.000us     304.000us             2
                      aten::clone         0.26%      17.000us         9.08%     596.000us     149.000us             4
                      aten::addmm         8.50%     558.000us         8.71%     572.000us     286.000us             2
---------------------------------  ------------  ------------  ------------  ------------  ------------  ------------
Self CPU time total: 6.565ms

```

For the ACL results, observe that the **Self CPU time total** is lower than in the OpenBLAS run from the previous section. The names of some operators have changed as well: `aten::mkldnn_convolution` is the convolution kernel optimized for AArch64, and it is the main reason the inference time improves when ACL kernels are used.

In the next section, you will learn how to automate the deployment of your trained and tested model.
