Commit 0be037f

Merge pull request #224762 from fbsolo-ms1/tutorial-for-SK
Sam Kemp requested specific changes to specific, designated files.
2 parents 0510204 + 27efec3 commit 0be037f

File tree

6 files changed: +532 additions, -735 deletions


articles/machine-learning/concept-data.md

Lines changed: 60 additions & 301 deletions
Large diffs are not rendered by default.

articles/machine-learning/how-to-create-data-assets.md

Lines changed: 94 additions & 177 deletions
Large diffs are not rendered by default.

articles/machine-learning/how-to-mltable.md

Lines changed: 146 additions & 87 deletions
Large diffs are not rendered by default.

articles/machine-learning/how-to-read-write-data-v2.md

Lines changed: 43 additions & 44 deletions
@@ -8,8 +8,8 @@ ms.subservice: mldata
 ms.topic: how-to
 ms.author: yogipandey
 author: ynpandey
-ms.reviewer: ssalgado
-ms.date: 05/26/2022
+ms.reviewer: franksolomon
+ms.date: 01/23/2023
 ms.custom: devx-track-python, devplatv2, sdkv2, cliv2, event-tier1-build-2022, ignite-2022
 #Customer intent: As an experienced Python developer, I need to read in my data to make it available to a remote compute to train my machine learning models.
 ---
@@ -18,12 +18,12 @@ ms.custom: devx-track-python, devplatv2, sdkv2, cliv2, event-tier1-build-2022, i
 
 [!INCLUDE [dev v2](../../includes/machine-learning-dev-v2.md)]
 
-> [!div class="op_single_selector" title1="Select the version of Azure Machine Learning CLI extension you are using:"]
+> [!div class="op_single_selector" title1="Select the version of Azure Machine Learning CLI extension you use:"]
 > * [v1](v1/how-to-train-with-datasets.md)
 > * [v2 (current version)](how-to-read-write-data-v2.md)
 
-Learn how to read and write data for your jobs with the Azure Machine Learning Python SDK v2 and the Azure Machine Learning CLI extension v2.
-
+Learn how to read and write data for your jobs with the Azure Machine Learning Python SDK v2 and the Azure Machine Learning CLI extension v2.
+
 ## Prerequisites
 
 - An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/).
@@ -34,48 +34,48 @@ Learn how to read and write data for your jobs with the Azure Machine Learning P
 
 ## Supported paths
 
-When you provide a data input/output to a Job, you'll need to specify a `path` parameter that points to the data location. Below is a table that shows the different data locations supported in Azure Machine Learning and examples for the `path` parameter:
+When you provide a data input/output to a Job, you must specify a `path` parameter that points to the data location. This table shows both the different data locations that Azure Machine Learning supports, and examples for the `path` parameter:
 
 
-|Location | Examples | Notes|
-|---------|---------|---------|
-|A path on your local computer | `./home/username/data/my_data` ||
-|A path on a public http(s) server | `https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv` | https path pointing to a folder is not supported since https is not a filesystem. Please use other formats(wasbs/abfss/adl) instead for folder type of data.|
-|A path on Azure Storage | `wasbs://<containername>@<accountname>.blob.core.windows.net/<path_to_data>/` <br> `abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>` <br> `adl://<accountname>.azuredatalakestore.net/<path_to_data>/` ||
-|A path on a Datastore | `azureml://datastores/<data_store_name>/paths/<path>` ||
-|A path to a Data Asset | `azureml:<my_data>:<version>` ||
+|Location | Examples |
+|---------|---------|
+|A path on your local computer | `./home/username/data/my_data` |
+|A path on a public http(s) server | `https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv` |
+|A path on Azure Storage | `https://<account_name>.blob.core.windows.net/<container_name>/<path>` <br> `abfss://<file_system>@<account_name>.dfs.core.windows.net/<path>` |
+|A path on a Datastore | `azureml://datastores/<data_store_name>/paths/<path>` |
+|A path to a Data Asset | `azureml:<my_data>:<version>` |
 
 ## Supported modes
 
-When you run a job with data inputs/outputs, you can specify the *mode* - for example, whether you would like the data to be read-only mounted or downloaded to the compute target. The table below shows the possible modes for different type/mode/input/output combinations:
+When you run a job with data inputs/outputs, you can specify the *mode* - for example, whether the data should be read-only mounted, or downloaded to the compute target. This table shows the possible modes for different type/mode/input/output combinations:
 
 Type | Input/Output | `upload` | `download` | `ro_mount` | `rw_mount` | `direct` | `eval_download` | `eval_mount`
 ------ | ------ | :---: | :---: | :---: | :---: | :---: | :---: | :---:
 `uri_folder` | Input | | ✓ | ✓ | | ✓ | |
 `uri_file` | Input | | ✓ | ✓ | | ✓ | |
 `mltable` | Input | | ✓ | ✓ | | ✓ | ✓ | ✓
-`uri_folder` | Output | ✓ | | | ✓ | | |
-`uri_file` | Output | ✓ | | | ✓ | | |
+`uri_folder` | Output | ✓ | | | ✓ | | |
+`uri_file` | Output | ✓ | | | ✓ | | |
 `mltable` | Output | ✓ | | | ✓ | ✓ | |
 
 > [!NOTE]
-> `eval_download` and `eval_mount` are unique to `mltable`. Whilst `ro_mount` is the default mode for MLTable, there are scenarios where an MLTable can yield files that are not necessarily co-located with the MLTable file in storage. Alternatively, an `mltable` can subset or shuffle the data that resides in the storage. That view is only visible if the MLTable file is actually evaluated by the engine. These modes will provide that view of the files.
+> `eval_download` and `eval_mount` are unique to `mltable`. The `ro_mount` is the default mode for MLTable. In some scenarios, however, an MLTable can yield files that are not necessarily co-located with the MLTable file in storage. Alternately, an `mltable` can subset or shuffle the data located in the storage resource. That view becomes visible only if the engine actually evaluates the MLTable file. These modes provide that view of the files.
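The type/mode matrix above is compact enough to encode directly. The following is a purely illustrative sketch, not part of the Azure Machine Learning SDK: it simply transcribes the table into a lookup so a type/direction/mode combination can be validated before a job is submitted.

```python
# Illustrative only: encodes the supported type/mode table from the doc.
# Keys are (data type, direction); values are the modes the table allows.
SUPPORTED_MODES = {
    ("uri_folder", "input"): {"download", "ro_mount", "direct"},
    ("uri_file", "input"): {"download", "ro_mount", "direct"},
    ("mltable", "input"): {"download", "ro_mount", "direct",
                           "eval_download", "eval_mount"},
    ("uri_folder", "output"): {"upload", "rw_mount"},
    ("uri_file", "output"): {"upload", "rw_mount"},
    ("mltable", "output"): {"upload", "rw_mount", "direct"},
}

def is_supported(data_type: str, direction: str, mode: str) -> bool:
    """Return True if the mode is valid for this type/direction per the table."""
    return mode in SUPPORTED_MODES.get((data_type, direction.lower()), set())
```

Only job outputs accept `upload` or `rw_mount`, which is why the later "Write data in a job" section restricts itself to those two modes.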
 
 
 ## Read data in a job
 
 # [Azure CLI](#tab/cli)
 
-Create a job specification YAML file (`<file-name>.yml`). Specify in the `inputs` section of the job:
+Create a job specification YAML file (`<file-name>.yml`). In the `inputs` section of the job, specify:
 
-1. The `type`; whether the data is a specific file (`uri_file`) or a folder location (`uri_folder`) or an `mltable`.
-1. The `path` of where your data is located; can be any of the paths outlined in the [Supported Paths](#supported-paths) section.
+1. The `type`; whether the data is a specific file (`uri_file`), a folder location (`uri_folder`), or an `mltable`.
+1. The `path` of your data location; any of the paths outlined in the [Supported Paths](#supported-paths) section will work.
 
 ```yaml
 $schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
 
 # Possible Paths for Data:
-# Blob: wasbs://<containername>@<accountname>.blob.core.windows.net/<folder>/<file>
+# Blob: https://<account_name>.blob.core.windows.net/<container_name>/<folder>/<file>
 # Datastore: azureml://datastores/paths/<folder>/<file>
 # Data Asset: azureml:<my_data>:<version>

@@ -98,10 +98,10 @@ az ml job create -f <file-name>.yml
 
 # [Python SDK](#tab/python)
 
-The `Input` class allows you to define:
+Use the `Input` class to define:
 
-1. The `type`; whether the data is a specific file (`uri_file`) or a folder location (`uri_folder`) or an `mltable`.
-1. The `path` of where your data is located; can be any of the paths outlined in the [Supported Paths](#supported-paths) section.
+1. The `type`; whether the data is a specific file (`uri_file`), a folder location (`uri_folder`), or an `mltable`.
+1. The `path` of your data location; any of the paths outlined in the [Supported Paths](#supported-paths) section will work.
 
 ```python
 from azure.ai.ml import command
@@ -118,7 +118,7 @@ ml_client = MLClient.from_config()
 # AssetTypes.MLTABLE
 
 # Possible Paths for Data:
-# Blob: wasbs://<containername>@<accountname>.blob.core.windows.net/<folder>/<file>
+# Blob: https://<account_name>.blob.core.windows.net/<container_name>/<folder>/<file>
 # Datastore: azureml://datastores/paths/<folder>/<file>
 # Data Asset: azureml:<my_data>:<version>
 
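The hunks above truncate the job specifications they touch. For context, a complete `inputs` section of the kind the CLI tab describes might look like the sketch below. This is not the file from the PR: the `train.py` script name, the `my_data` input name, and the environment and compute names are placeholders, and the sample path is the public Titanic CSV already used in the doc's path table.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
code: src
command: python train.py --input_data ${{inputs.my_data}}
inputs:
  my_data:
    type: uri_file        # a single file, per the Supported paths table
    mode: ro_mount        # read-only mount; a valid input mode for uri_file
    path: https://raw.githubusercontent.com/pandas-dev/pandas/main/doc/data/titanic.csv
environment: azureml:my-environment@latest
compute: azureml:my_cpu_cluster
```

The `type`, `mode`, and `path` keys correspond directly to the two numbered requirements and the Supported modes table above.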
@@ -143,7 +143,7 @@ returned_job.services["Studio"].endpoint
 ---
 
 ### Read V1 data assets
-This section outlines how you can read V1 `FileDataset` and `TabularDataset` data entities in a V2 job.
+This section explains how to read V1 `FileDataset` and `TabularDataset` data entities in a V2 job.
 
 #### Read a `FileDataset`
 
@@ -174,7 +174,7 @@ az ml job create -f <file-name>.yml
 
 # [Python SDK](#tab/python)
 
-In the `Input` object specify the `type` as `AssetTypes.MLTABLE` and `mode` as `InputOutputModes.EVAL_MOUNT`:
+In the `Input` object, specify the `type` as `AssetTypes.MLTABLE` and `mode` as `InputOutputModes.EVAL_MOUNT`:
 
 ```python
 from azure.ai.ml import command
@@ -205,13 +205,12 @@ job = command(
 
 # submit the command
 returned_job = ml_client.jobs.create_or_update(job)
-# get a URL for the status of the job
+# get a URL for the job status
 returned_job.services["Studio"].endpoint
 ```
 
 ---
 
-
 #### Read a `TabularDataset`
 
 # [Azure CLI](#tab/cli)
@@ -241,7 +240,7 @@ az ml job create -f <file-name>.yml
 
 # [Python SDK](#tab/python)
 
-In the `Input` object specify the `type` as `AssetTypes.MLTABLE` and `mode` as `InputOutputModes.DIRECT`:
+In the `Input` object, specify the `type` as `AssetTypes.MLTABLE`, and `mode` as `InputOutputModes.DIRECT`:
 
 ```python
 from azure.ai.ml import command
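The `azureml:<my_data>:<version>` form that recurs throughout these snippets is just a name/version reference to a registered data asset. As a purely illustrative aid, and not SDK code, such a reference can be split into its parts like this:

```python
# Illustrative helper, not part of the Azure ML SDK: split a data-asset
# reference of the form "azureml:<name>:<version>" into (name, version).
def parse_asset_reference(ref: str) -> tuple[str, str]:
    """Return (name, version) from an 'azureml:<name>:<version>' reference."""
    prefix, sep, rest = ref.partition(":")
    if prefix != "azureml" or not sep:
        raise ValueError(f"not an azureml asset reference: {ref!r}")
    # split on the LAST colon so asset names may themselves contain colons
    name, sep, version = rest.rpartition(":")
    if not sep or not name or not version:
        raise ValueError(f"missing name or version in: {ref!r}")
    return name, version
```

Splitting on the last colon keeps the version suffix unambiguous even if the asset name is unusual.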
@@ -280,18 +279,18 @@ returned_job.services["Studio"].endpoint
 
 ## Write data in a job
 
-In your job you can write data to your cloud-based storage using *outputs*. The [Supported modes](#supported-modes) section showed that only job *outputs* can write data because the mode can be either `rw_mount` or `upload`.
+In your job, you can write data to your cloud-based storage with *outputs*. The [Supported modes](#supported-modes) section showed that only job *outputs* can write data, because the mode can be either `rw_mount` or `upload`.
 
 # [Azure CLI](#tab/cli)
 
-Create a job specification YAML file (`<file-name>.yml`), with the `outputs` section populated with the type and path of where you would like to write your data to:
+Create a job specification YAML file (`<file-name>.yml`), with the `outputs` section populated with the type and path where you'd like to write your data:
 
 ```yaml
 $schema: https://azuremlschemas.azureedge.net/latest/CommandJob.schema.json
 
 # Possible Paths for Data:
-# Blob: wasbs://<containername>@<accountname>.blob.core.windows.net/<folder>/<file>
-# Datastore: azureml://datastores/<datastore_name>/paths/<folder>/<file>
+# Blob: https://<account_name>.blob.core.windows.net/<container_name>/<folder>/<file>
+# Datastore: azureml://datastores/paths/<folder>/<file>
 # Data Asset: azureml:<my_data>:<version>
 
 code: src
@@ -311,7 +310,7 @@ environment: azureml:<environment_name>@latest
 compute: azureml:cpu-cluster
 ```
 
-Next create a job using the CLI:
+Next, create a job with the CLI:
 
 ```azurecli
 az ml job create --file <file-name>.yml
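Inside the job itself, an `rw_mount` output is handed to the script as an ordinary writable directory path (for example, the expanded value of `${{outputs.output_data}}` on the command line). The following is a minimal, illustrative sketch of what the script side of that contract can look like; the `results.csv` file name is a placeholder, not something prescribed by Azure ML.

```python
# Illustrative sketch, not Azure ML SDK code: with mode rw_mount the job
# receives its output location as a plain directory path, and the script
# simply writes files into it. Azure ML handles syncing to cloud storage.
import pathlib

def write_results(output_dir: str, rows: list[str]) -> pathlib.Path:
    """Write rows to results.csv under the mounted output directory."""
    out = pathlib.Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)  # mount root normally exists already
    target = out / "results.csv"
    target.write_text("\n".join(rows) + "\n")
    return target
```

Because the output is just a path, the same script body works unchanged under `upload` mode, where the files are copied to storage when the job completes.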
@@ -331,7 +330,7 @@ from azure.ai.ml.constants import AssetTypes
331330
# AssetTypes.MLTABLE
332331

333332
# Possible Paths for Data:
334-
# Blob: wasbs://<containername>@<accountname>.blob.core.windows.net/<folder>/<file>
333+
# Blob: https://<account_name>.blob.core.windows.net/<container_name>/<folder>/<file>
335334
# Datastore: azureml://datastores/paths/<folder>/<file>
336335
# Data Asset: azureml:<my_data>:<version>
337336

@@ -361,29 +360,29 @@ returned_job.services["Studio"].endpoint
 
 ---
 
-## Data in pipelines
+## Data in pipelines
 
-If you're working with Azure Machine Learning pipelines, you can read data into and move data between pipeline components with the Azure Machine Learning CLI v2 extension or the Python SDK v2.
+If you work with Azure Machine Learning pipelines, you can read data into and move data between pipeline components with the Azure Machine Learning CLI v2 extension, or the Python SDK v2.
 
 ### Azure Machine Learning CLI v2
-The following YAML file demonstrates how to use the output data from one component as the input for another component of the pipeline using the Azure Machine Learning CLI v2 extension:
+This YAML file shows how to use the output data from one component as the input for another component of the pipeline, with the Azure Machine Learning CLI v2 extension:
 
 [!INCLUDE [CLI v2](../../includes/machine-learning-CLI-v2.md)]
 
 :::code language="yaml" source="~/azureml-examples-main/CLI/jobs/pipelines-with-components/basics/3b_pipeline_with_data/pipeline.yml":::
 
 ### Python SDK v2
 
-The following example defines a pipeline containing three nodes and moves data between each node.
+This example defines a pipeline that contains three nodes, and moves data between each node.
 
-* `prepare_data_node` that loads the image and labels from Fashion MNIST data set into `mnist_train.csv` and `mnist_test.csv`.
-* `train_node` that trains a CNN model with Keras using the training data, `mnist_train.csv` .
-* `score_node` that scores the model using test data, `mnist_test.csv`.
+* `prepare_data_node` loads the image and labels from Fashion MNIST data set into `mnist_train.csv` and `mnist_test.csv`.
+* `train_node` trains a CNN model with Keras, using the `mnist_train.csv` training data.
+* `score_node` scores the model using `mnist_test.csv` test data.
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/jobs/pipelines/2e_image_classification_keras_minist_convnet/image_classification_keras_minist_convnet.ipynb?name=build-pipeline)]
 
 ## Next steps
 
 * [Train models](how-to-train-model.md)
 * [Tutorial: Create production ML pipelines with Python SDK v2](tutorial-pipeline-python-sdk.md)
-* Learn more about [Data in Azure Machine Learning](concept-data.md)
+* Learn more about [Data in Azure Machine Learning](concept-data.md)
