
Commit e3b3e98

Merge pull request #211634 from sdgilley/sdg-migration

Add migration docs

2 parents 4cf3e6b + ad22466

14 files changed: +2081 −20 lines

articles/machine-learning/how-to-migrate-from-v1.md

Lines changed: 25 additions & 2 deletions
@@ -8,7 +8,7 @@ ms.subservice: core
 ms.topic: how-to
 author: s-polly
 ms.author: scottpolly
-ms.date: 06/01/2022
+ms.date: 09/23/2022
 ms.reviewer: blackmist
 ms.custom: devx-track-azurecli, devplatv2
 ---
@@ -84,11 +84,14 @@ Do consider migrating the code for creating a workspace to v2. Typically Azure r
 > [!IMPORTANT]
 > If your workspace uses a private endpoint, it will automatically have the `v1_legacy_mode` flag enabled, preventing usage of v2 APIs. See [how to configure network isolation with v2](how-to-configure-network-isolation-with-v2.md) for details.
 
+
 ### Connection (workspace connection in v1)
 
 Workspace connections from v1 are persisted on the workspace, and fully available with v2.
 
 We recommend migrating the code for creating connections to v2.
+For a comparison of SDK v1 and v2 code, see [Migrate workspace management from SDK v1 to SDK v2](migrate-to-v2-resource-workspace.md).
+
 
 ### Datastore
 
@@ -114,6 +117,8 @@ You can continue using your existing v1 model deployments. For new model deploym
 |Azure Kubernetes Service (AKS)|ACI, AKS|Manage your own AKS cluster(s) for model deployment, giving flexibility and granular control at the cost of IT overhead.|
 |Azure Arc Kubernetes|N/A|Manage your own Kubernetes cluster(s) in other clouds or on-premises, giving flexibility and granular control at the cost of IT overhead.|
 
+For a comparison of SDK v1 and v2 code, see [Migrate deployment endpoints from SDK v1 to SDK v2](migrate-to-v2-deploy-endpoints.md).
+
 ### Jobs (experiments, runs, pipelines in v1)
 
 In v2, "experiments", "runs", and "pipelines" are consolidated into jobs. A job has a type. Most jobs are `command` jobs that run a command, like `python main.py`. What runs in a job is agnostic to any programming language, so you can run `bash` scripts, invoke `python` interpreters, run a bunch of `curl` commands, or anything else. Another common type of job is `pipeline`, which defines child jobs that may have input/output relationships, forming a directed acyclic graph (DAG).
@@ -126,6 +131,8 @@ What you run *within* the job does not need to be migrated to v2. However, it is
 
 We recommend migrating the code for creating jobs to v2. You can see [how to train models with the CLI (v2)](how-to-train-cli.md) and the [job YAML references](reference-yaml-job-command.md) for authoring jobs in v2 YAMLs.
 
+For a comparison of SDK v1 and v2 code, see [Migrate script run from SDK v1 to SDK v2](migrate-to-v2-command-job.md).
+
 ### Data (datasets in v1)
 
 Datasets are renamed to data assets. Interoperability between v1 datasets and v2 data assets is the most complex of any entity in Azure ML.
@@ -136,18 +143,34 @@ For details on data in v2, see the [data concept article](concept-data.md).
 
 We recommend migrating the code for [creating data assets](how-to-create-data-assets.md) to v2.
 
+For a comparison of SDK v1 and v2 code, see [Migrate data management from SDK v1 to v2](migrate-to-v2-assets-data.md).
+
+
 ### Model
 
 Models created from v1 can be used in v2. In v2, explicit model types are introduced. Similar to data assets, it may be easier to re-create a v1 model as a v2 model, setting the type appropriately.
 
 We recommend migrating the code for creating models with [SDK](how-to-train-sdk.md) or [CLI](how-to-train-cli.md) to v2.
 
+For a comparison of SDK v1 and v2 code, see
+
+* [Migrate model management from SDK v1 to SDK v2](migrate-to-v2-assets-model.md)
+* [Migrate AutoML from SDK v1 to SDK v2](migrate-to-v2-execution-automl.md)
+* [Migrate hyperparameter tuning from SDK v1 to SDK v2](migrate-to-v2-execution-hyperdrive.md)
+* [Migrate parallel run step from SDK v1 to SDK v2](migrate-to-v2-execution-parallel-run-step.md)
+
 ### Environment
 
 Environments created from v1 can be used in v2. In v2, environments have new features like creation from a local Docker context.
 
 We recommend migrating the code for creating environments to v2.

+## Managing secrets
+
+The management of Key Vault secrets differs significantly in v2 compared to v1. The v1 `set_secret` and `get_secret` SDK methods aren't available in v2. Instead, access secrets directly with the Key Vault client libraries.
+
+For details about Key Vault, see [Use authentication credential secrets in Azure Machine Learning training jobs](how-to-use-secrets-in-runs.md).
+
 ## Scenarios across the machine learning lifecycle
 
 There are a few scenarios that are common across the machine learning lifecycle using Azure ML. We'll look at a few and give general recommendations for migrating to v2.
@@ -182,7 +205,7 @@ A MLOps workflow typically involves CI/CD through an external tool. It's recomme
 
 The solution accelerator for MLOps with v2 is being developed at https://github.com/Azure/mlops-v2 and can be used as reference or adopted for setup and automation of the machine learning lifecycle.
 
-#### A note on GitOps with v2
+### A note on GitOps with v2
 
 A key paradigm with v2 is serializing machine learning entities as YAML files for source control with `git`, enabling better GitOps approaches than were possible with v1. For instance, you could enforce policy by which only a service principal used in CI/CD pipelines can create/update/delete some or all entities, ensuring changes go through a governed process like pull requests with required reviewers. Since the files in source control are YAML, they're easy to diff and track changes over time. You and your team may consider shifting to this paradigm as you migrate to v2.
 
Lines changed: 226 additions & 0 deletions
@@ -0,0 +1,226 @@
---
title: 'Migrate data management from SDK v1 to v2'
titleSuffix: Azure Machine Learning
description: Migrate data management from v1 to v2 of Azure Machine Learning SDK
services: machine-learning
ms.service: machine-learning
ms.subservice: mldata
ms.topic: reference
author: SturgeonMi
ms.author: xunwan
ms.date: 09/16/2022
ms.reviewer: sgilley
ms.custom: migration
---

# Migrate data management from SDK v1 to v2

In v1, an Azure ML dataset can be either a `FileDataset` or a `TabularDataset`.
In v2, an Azure ML data asset can be a `uri_folder`, `uri_file`, or `mltable`.
Conceptually, `FileDataset` maps to `uri_folder` and `uri_file`, and `TabularDataset` maps to `mltable`.

* URIs (`uri_folder`, `uri_file`) - a Uniform Resource Identifier, that is, a reference to a storage location on your local computer or in the cloud, which makes it easy to access data in your jobs.
* MLTable - a method to abstract the schema definition for tabular data, so that consumers of the data can more easily materialize the table into a Pandas, Dask, or Spark dataframe.

This article compares common data scenarios in SDK v1 and SDK v2.
## Create a `FileDataset`/URI type of data asset

* SDK v1 - Create a `FileDataset`

```python
from azureml.core import Workspace, Datastore, Dataset

# get the workspace and its default datastore (assumed to exist)
workspace = Workspace.from_config()
datastore = workspace.get_default_datastore()

# create a FileDataset pointing to files in 'animals' folder and its subfolders recursively
datastore_paths = [(datastore, 'animals')]
animal_ds = Dataset.File.from_files(path=datastore_paths)

# create a FileDataset from image and label files behind public web URLs
web_paths = ['https://azureopendatastorage.blob.core.windows.net/mnist/train-images-idx3-ubyte.gz',
             'https://azureopendatastorage.blob.core.windows.net/mnist/train-labels-idx1-ubyte.gz']
mnist_ds = Dataset.File.from_files(path=web_paths)
```

* SDK v2 - Create a `URI_FOLDER` type data asset

```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Supported paths include:
# local: './<path>'
# blob: 'https://<account_name>.blob.core.windows.net/<container_name>/<path>'
# ADLS gen2: 'abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>/'
# Datastore: 'azureml://datastores/<data_store_name>/paths/<path>'

my_path = '<path>'

my_data = Data(
    path=my_path,
    type=AssetTypes.URI_FOLDER,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)
```
* SDK v2 - Create a `URI_FILE` type data asset

```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# Supported paths include:
# local: './<path>/<file>'
# blob: 'https://<account_name>.blob.core.windows.net/<container_name>/<path>/<file>'
# ADLS gen2: 'abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>/<file>'
# Datastore: 'azureml://datastores/<data_store_name>/paths/<path>/<file>'

my_path = '<path>'

my_data = Data(
    path=my_path,
    type=AssetTypes.URI_FILE,
    description="<description>",
    name="<name>",
    version="<version>"
)

ml_client.data.create_or_update(my_data)
```

## Create a tabular dataset/data asset

* SDK v1

```python
from azureml.core import Workspace, Datastore, Dataset

datastore_name = 'your datastore name'

# get existing workspace
workspace = Workspace.from_config()

# retrieve an existing datastore in the workspace by name
datastore = Datastore.get(workspace, datastore_name)

# create a TabularDataset from 3 file paths in datastore
datastore_paths = [(datastore, 'weather/2018/11.csv'),
                   (datastore, 'weather/2018/12.csv'),
                   (datastore, 'weather/2019/*.csv')]

weather_ds = Dataset.Tabular.from_delimited_files(path=datastore_paths)
```

* SDK v2 - Create an `mltable` data asset via YAML definition

```yaml
type: mltable

paths:
  - pattern: ./*.txt

transformations:
  - read_delimited:
      delimiter: ,
      encoding: ascii
      header: all_files_same_headers
```

```python
from azure.ai.ml.entities import Data
from azure.ai.ml.constants import AssetTypes

# my_path must point to a folder containing the MLTable artifact (MLTable file + data)
# Supported paths include:
# local: './<path>'
# blob: 'https://<account_name>.blob.core.windows.net/<container_name>/<path>'
# ADLS gen2: 'abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>/'
# Datastore: 'azureml://datastores/<data_store_name>/paths/<path>'

my_path = '<path>'

my_data = Data(
    path=my_path,
    type=AssetTypes.MLTABLE,
    description="<description>",
    name="<name>",
    version='<version>'
)

ml_client.data.create_or_update(my_data)
```

## Use data in an experiment/job

* SDK v1

```python
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory=script_folder,
                      script='train_titanic.py',
                      # pass dataset as an input with friendly name 'titanic'
                      arguments=['--input-data', titanic_ds.as_named_input('titanic')],
                      compute_target=compute_target,
                      environment=myenv)

# Submit the run configuration for your training run
run = experiment.submit(src)
run.wait_for_completion(show_output=True)
```

* SDK v2

```python
from azure.ai.ml import command
from azure.ai.ml import Input, Output
from azure.ai.ml.constants import AssetTypes

# Possible asset types for Data:
# AssetTypes.URI_FILE
# AssetTypes.URI_FOLDER
# AssetTypes.MLTABLE

# Possible paths for Data:
# Blob: https://<account_name>.blob.core.windows.net/<container_name>/<folder>/<file>
# Datastore: azureml://datastores/<data_store_name>/paths/<folder>/<file>
# Data asset: azureml:<my_data>:<version>

my_job_inputs = {
    "raw_data": Input(type=AssetTypes.URI_FOLDER, path="<path>")
}

my_job_outputs = {
    "prep_data": Output(type=AssetTypes.URI_FOLDER, path="<path>")
}

job = command(
    code="./src",  # local path where the code is stored
    command="python process_data.py --raw_data ${{inputs.raw_data}} --prep_data ${{outputs.prep_data}}",
    inputs=my_job_inputs,
    outputs=my_job_outputs,
    environment="<environment_name>:<version>",
    compute="cpu-cluster",
)

# submit the command
returned_job = ml_client.create_or_update(job)
# get a URL for the status of the job
returned_job.services["Studio"].endpoint
```

## Mapping of key functionality in SDK v1 and SDK v2

|Functionality in SDK v1|Rough mapping in SDK v2|
|-|-|
|[Method/API in SDK v1](/python/api/azureml-core/azureml.data)|[Method/API in SDK v2](/python/api/azure-ai-ml/azure.ai.ml.entities)|
## Next steps

For more information, see the documentation here:

* [Data in Azure Machine Learning](concept-data.md?tabs=uri-file-example%2Ccli-data-create-example)
* [Create data assets](how-to-create-data-assets.md?tabs=CLI)
* [Read and write data in a job](how-to-read-write-data-v2.md)
* [V2 datastore operations](/python/api/azure-ai-ml/azure.ai.ml.operations.datastoreoperations)
