Commit 8187de5

Initialize AIML-Developer-40/Feature-Branch

0 parents, commit 8187de5
26 files changed: +18540 −0 lines

.flake8

Lines changed: 41 additions & 0 deletions
[flake8]
max-line-length = 100

# Only include these files for linting
filename =
    .github/workflows/vertex-ai-cicd.yml
    src/compiler.py
    src/vertex_pipeline_dev.py
    src/vertex_pipeline_prod.py

# Ignore all other files and directories
exclude =
    .tox,
    .git,
    __pycache__,
    *.pyc,
    *.egg-info,
    .cache,
    .eggs,
    develop,
    src/model/v1train.py

# Per-file ignores (if needed)
per-file-ignores =
    src/__init__.py:D104
    src/*/__init__.py:D104

ignore =
    W504,
    C901,
    E41,
    E722,
    W,
    D,
    F,
    N,
    C,
    I

max-complexity = 10
import-order-style = pep8
Lines changed: 1 addition & 0 deletions
**GitHub Actions**

.gitignore

Lines changed: 130 additions & 0 deletions
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
logs/

README.md

Lines changed: 2 additions & 0 deletions
# ⚠️ INDEPENDENT FORK - DO NOT SYNC WITH UPSTREAM ⚠️

**This fork is intentionally kept separate from the original repository.**

_build.yml

Lines changed: 36 additions & 0 deletions
name: '$(Date:yyyyMMdd)$(Rev:.rr)'
jobs:
- job: build_markdown_content
  displayName: 'Build Markdown Content'
  workspace:
    clean: all
  pool:
    vmImage: 'Ubuntu 16.04'
  container:
    image: 'microsoftlearning/markdown-build:latest'
  steps:
  - task: Bash@3
    displayName: 'Build Content'
    inputs:
      targetType: inline
      script: |
        cp /{attribution.md,template.docx,package.json,package.js} .
        npm install
        node package.js --version $(Build.BuildNumber)
  - task: GitHubRelease@0
    displayName: 'Create GitHub Release'
    inputs:
      gitHubConnection: 'github-microsoftlearning-organization'
      repositoryName: '$(Build.Repository.Name)'
      tagSource: manual
      tag: 'v$(Build.BuildNumber)'
      title: 'Version $(Build.BuildNumber)'
      releaseNotesSource: input
      releaseNotes: '# Version $(Build.BuildNumber) Release'
      assets: '$(Build.SourcesDirectory)/out/*.zip'
      assetUploadMode: replace
  - task: PublishBuildArtifacts@1
    displayName: 'Publish Output Files'
    inputs:
      pathtoPublish: '$(Build.SourcesDirectory)/out/'
      artifactName: 'Lab Files'

_config.yml

Lines changed: 17 additions & 0 deletions
remote_theme: MicrosoftLearning/Jekyll-Theme
exclude:
  - readme.md
  - .github/
header_pages:
  - index.html
author: Microsoft Learning
twitter_username: mslearning
github_username: MicrosoftLearning
plugins:
  - jekyll-sitemap
  - jekyll-mentions
  - jemoji
markdown: kramdown
kramdown:
  syntax_highlighter_opts:
    disable: true

awscliv2.zip

67.3 MB (binary file not shown)

documentation/documentation.md

Lines changed: 126 additions & 0 deletions
---
challenge:
    module: Convert a notebook to production code
    challenge: '0: Convert a notebook to production code'
---

<style>
.button  {
  border: none;
  color: white;
  padding: 12px 28px;
  background-color: #4285F4;
  float: right;
}
</style>

# Challenge 0: Convert a notebook to production code

<button class="button" onclick="window.location.href='https://cloud.google.com/vertex-ai/docs/start/introduction-unified-platform';">Back to overview</button>

## Challenge scenario

The first step in automating machine learning workflows is to convert a Jupyter notebook to production-ready code. When you store your code as scripts, it's easier to automate its execution, and you can parameterize scripts to reuse the code for retraining with Google Cloud Vertex AI.

## Prerequisites

To complete this challenge, you'll need:

- Access to a Google Cloud Platform (GCP) account with appropriate permissions.
- A GitHub account.
- Basic familiarity with Vertex AI and Google Cloud services.
- Google Cloud SDK (gcloud) installed and configured.

## Objectives

By completing this challenge, you'll learn how to:

- Clean nonessential code.
- Convert your code to Python scripts compatible with Vertex AI.
- Use functions in your scripts.
- Use parameters in your scripts.
- Implement experiment tracking with Vertex AI Experiments.

> **Important!**
> Each challenge is designed to let you explore how to implement DevOps principles when working with machine learning models on Google Cloud Platform. Some instructions are intentionally vague, inviting you to think about your own preferred approach. If, for example, the instructions ask you to create a Vertex AI Workbench instance or enable Vertex AI APIs, it's up to you to explore and decide how to do so. To get the best learning experience, make the challenge as simple or as demanding as you want.

## Challenge Duration

- **Estimated Time**: 30 minutes

## Instructions

To work through the challenges, you need **your own public repo** that includes the challenge files. Create a new public repo by navigating to [https://github.com/GoogleCloudPlatform/vertex-ai-samples](https://github.com/GoogleCloudPlatform/vertex-ai-samples) and either forking it or using it as a template.

In the **experimentation** folder, you'll find a Jupyter notebook that trains a classification model. The data used by the notebook is in the **experimentation/data** folder and consists of a CSV file.

In the **src/model** folder, you'll find a `train.py` script that already includes code converted from part of the notebook. It's up to you to complete it for Vertex AI compatibility.

- Go through the notebook to understand what the code does.
- Convert the code under the **Split data** header and include it in the `train.py` script as a `split_data` function. Remember to:
    - Remove nonessential code.
    - Include the necessary code as a function.
    - Include any necessary libraries at the top of the script.
    - Ensure compatibility with the Vertex AI training environment.

<details>
<summary>Hint</summary>
<br/>
The <code>split_data</code> function is already called in the main function. You only need to add the function itself, with the required inputs and outputs, underneath the comment <code>TO DO: add function to split data</code>. Make sure to handle Vertex AI's expected input/output paths using environment variables such as <code>AIP_MODEL_DIR</code> and <code>AIP_TRAINING_DATA_URI</code>.
</details>
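A minimal sketch of what such a function could look like. The column names, the 30% test split, and the random seed are illustrative assumptions, not taken from the challenge notebook:

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_data(df: pd.DataFrame, label_column: str,
               test_size: float = 0.3, random_state: int = 0):
    """Split a DataFrame into train/test features and labels."""
    X = df.drop(columns=[label_column])  # feature matrix
    y = df[label_column]                 # target vector
    return train_test_split(X, y, test_size=test_size,
                            random_state=random_state)


# Example usage with toy data (the real script would load the CSV
# from the experimentation/data folder or a GCS URI instead):
df = pd.DataFrame({"feature_a": range(10),
                   "feature_b": range(10, 20),
                   "label": [0, 1] * 5})
X_train, X_test, y_train, y_test = split_data(df, "label")
```

Keeping the split logic in one function with explicit parameters makes it easy to call from `main` and to vary the split ratio between runs.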
71+
72+
- Add experiment tracking so that every time you run the script, all parameters and metrics are tracked. Use Vertex AI Experiments to track your training runs, or alternatively, integrate with Vertex AI's managed MLflow to ensure the necessary model files are stored with the job run for easy deployment.
73+
74+
<details>
75+
<summary>Hint</summary>
76+
<br/>
77+
Vertex AI provides native experiment tracking capabilities through Vertex AI Experiments. You can also use the managed MLflow service on Vertex AI for experiment tracking. For Vertex AI Experiments, use the Vertex AI SDK to create and track experiments. For MLflow integration, you can use <code>mlflow.autolog()</code> with Vertex AI's managed MLflow tracking server. Enable experiment tracking in the main function under <code>TO DO: enable experiment tracking</code>.
78+
</details>
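One possible shape for the tracking code, sketched with the Vertex AI SDK. The project, region, experiment, and run names are placeholders; the SDK calls need GCP credentials, so they are kept behind a function while a small pure helper assembles the payload that `log_params`/`log_metrics` expect:

```python
def build_tracking_payload(args: dict, metrics: dict):
    """Collect hyperparameters and metrics into flat dicts suitable
    for aiplatform.log_params / aiplatform.log_metrics."""
    params = {"test_size": args["test_size"],
              "random_state": args["random_state"]}
    metrics = {name: round(float(value), 4)
               for name, value in metrics.items()}
    return params, metrics


def track_run(params: dict, metrics: dict) -> None:
    # Requires `pip install google-cloud-aiplatform` and GCP credentials.
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1",
                    experiment="notebook-to-prod")  # placeholder names
    aiplatform.start_run("run-001")
    aiplatform.log_params(params)
    aiplatform.log_metrics(metrics)
    aiplatform.end_run()


params, metrics = build_tracking_payload(
    {"test_size": 0.3, "random_state": 0}, {"accuracy": 0.912345})
```

Separating payload construction from the SDK calls keeps the script testable offline and makes it clear exactly what gets recorded with each run.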
79+
80+
- Ensure your script is compatible with Vertex AI custom training jobs by:
81+
- Using Vertex AI environment variables for input/output paths
82+
- Saving the model to the correct output directory (using `AIP_MODEL_DIR`)
83+
- Adding proper argument parsing for hyperparameters
84+
- Integrating with Google Cloud Storage for data access
85+
86+
<details>
87+
<summary>Hint</summary>
88+
<br/>
89+
Vertex AI provides specific environment variables like <code>AIP_MODEL_DIR</code>, <code>AIP_TRAINING_DATA_URI</code>, and <code>AIP_VALIDATION_DATA_URI</code>. Use these to make your script portable across different Vertex AI environments. Also, implement argument parsing using <code>argparse</code> to handle hyperparameters passed from Vertex AI training jobs. Use the Google Cloud Storage client library to read training data from GCS buckets.
90+
</details>
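The argument parsing plus environment variables could be sketched as follows. `AIP_MODEL_DIR` and `AIP_TRAINING_DATA_URI` are the variables Vertex AI sets inside training jobs; the `--reg_rate` hyperparameter is an illustrative assumption:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters, falling back to the Vertex AI
    environment variables for data and model paths."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--training_data", type=str,
                        default=os.environ.get("AIP_TRAINING_DATA_URI", ""))
    parser.add_argument("--model_dir", type=str,
                        default=os.environ.get("AIP_MODEL_DIR", ""))
    parser.add_argument("--reg_rate", type=float, default=0.01)
    return parser.parse_args(argv)


# Example: a Vertex AI job could pass hyperparameters as CLI args
args = parse_args(["--reg_rate", "0.05"])
```

Defaulting the paths to the `AIP_*` variables means the same script runs unchanged both locally (with explicit flags) and inside a Vertex AI custom training job.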
91+
92+
- Integrate with Google Cloud services:
93+
- Use Google Cloud Storage for data storage and model artifacts
94+
- Implement proper logging with Google Cloud Logging
95+
- Consider using Vertex AI Pipelines for workflow orchestration
96+
97+
<details>
98+
<summary>Hint</summary>
99+
<br/>
100+
Import the necessary Google Cloud libraries: <code>google-cloud-storage</code> for GCS operations, <code>google-cloud-logging</code> for structured logging, and <code>google-cloud-aiplatform</code> for Vertex AI integration. Set up proper authentication using Application Default Credentials (ADC) or service account keys.
101+
</details>
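A small sketch of the GCS side: splitting a `gs://` URI into the bucket and blob names that the `google-cloud-storage` client expects. The download call needs ADC credentials and is therefore only defined, not executed; the bucket and path below are placeholders:

```python
def parse_gcs_uri(uri: str):
    """Split 'gs://bucket/path/to/file.csv' into (bucket, blob_path)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URI: {uri}")
    bucket, _, blob_path = uri[len("gs://"):].partition("/")
    return bucket, blob_path


def download_csv(uri: str, destination: str) -> None:
    # Requires `pip install google-cloud-storage` and ADC credentials.
    from google.cloud import storage

    bucket_name, blob_path = parse_gcs_uri(uri)
    blob = storage.Client().bucket(bucket_name).blob(blob_path)
    blob.download_to_filename(destination)


bucket, blob = parse_gcs_uri("gs://my-bucket/experimentation/data/train.csv")
```

The same helper also works for the model-artifact path, since `AIP_MODEL_DIR` is itself a `gs://` URI inside a Vertex AI job.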
102+
103+
## Success criteria
104+
105+
To complete this challenge successfully, you should be able to show:
106+
107+
- A training script which includes a function to split the data and experiment tracking using Vertex AI Experiments or managed MLflow.
108+
- The script is compatible with Vertex AI custom training jobs (uses appropriate environment variables and paths).
109+
- Proper model serialization and saving to Google Cloud Storage.
110+
- Integration with Google Cloud services for logging and data management.
111+
112+
> **Note:**
113+
> If you've used a Vertex AI Workbench instance or Colab Enterprise for experimentation, remember to stop the instance when you're done to avoid unnecessary charges. Also, clean up any Google Cloud Storage buckets and Vertex AI resources you created during testing.
114+
115+
## Useful resources
116+
117+
- [Vertex AI Custom Training Documentation](https://cloud.google.com/vertex-ai/docs/training/custom-training)
118+
- [Vertex AI Experiments for Experiment Tracking](https://cloud.google.com/vertex-ai/docs/experiments/intro-vertex-ai-experiments)
119+
- [Using MLflow with Vertex AI](https://cloud.google.com/vertex-ai/docs/experiments/vertex-ai-mlflow)
120+
- [Vertex AI Workbench User Guide](https://cloud.google.com/vertex-ai/docs/workbench)
121+
- [Google Cloud Storage Client Libraries](https://cloud.google.com/storage/docs/reference/libraries)
122+
- [Vertex AI Python SDK Documentation](https://cloud.google.com/python/docs/reference/aiplatform/latest)
123+
- [Vertex AI Pipelines](https://cloud.google.com/vertex-ai/docs/pipelines/introduction)
124+
- [Google Cloud ML Engineering Best Practices](https://cloud.google.com/architecture/ml-on-gcp-best-practices)
125+
126+
<button class="button" onclick="window.location.href='01-vertex-ai-job';">Continue with challenge 1</button>
