Commit 3216d3f

address bugs and clarifications

1 parent 761560a commit 3216d3f

File tree

3 files changed: +22 -8 lines changed


articles/machine-learning/how-to-access-data.md

Lines changed: 15 additions & 5 deletions
@@ -75,7 +75,7 @@ When you register an Azure Storage solution as a datastore, you automatically cr
 
 >[!IMPORTANT]
 > As part of the current datastore create and register process, Azure Machine Learning validates that the user provided principal (username, service principal or SAS token) has access to the underlying storage service.
-<br>
+<br><br>
 However, for Azure Data Lake Storage Gen 1 and 2 datastores, this validation happens later when data access methods like [`from_files()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.filedatasetfactory?view=azure-ml-py) or [`from_delimited_files()`](https://docs.microsoft.com/python/api/azureml-core/azureml.data.dataset_factory.tabulardatasetfactory?view=azure-ml-py#from-parquet-files-path--validate-true--include-path-false--set-column-types-none--partition-format-none-) are called.
 
 ### Python SDK
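To make the deferred validation above concrete, a minimal sketch (not part of the commit; the datastore name and path are placeholders) of where an access error for an ADLS Gen 1/Gen 2 datastore would actually surface:

```python
from azureml.core import Dataset, Datastore, Workspace

ws = Workspace.from_config()

# Registering/retrieving the ADLS datastore succeeds even if the supplied
# principal lacks storage access; 'my_adls_datastore' is a placeholder name.
adls_datastore = Datastore.get(ws, 'my_adls_datastore')

# Access is validated only here, when a dataset factory method is called.
weather_ds = Dataset.Tabular.from_delimited_files(
    path=(adls_datastore, 'weather/2020/*.csv'))
```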
@@ -85,10 +85,13 @@ All the register methods are on the [`Datastore`](https://docs.microsoft.com/pyt
 You can find the information that you need to populate the `register()` method by using the [Azure portal](https://portal.azure.com):
 
 1. Select **Storage Accounts** on the left pane, and choose the storage account that you want to register.
-2. For information like the account name, container, and file share name, go to the **Overview** page. For authentication information, like account key or SAS token, go to **Access Keys** on the **Settings** pane.
+2. For information like the account name, container, and file share name, go to the **Overview** page.
+3. For authentication information, like account key or SAS token, go to **Access Keys** on the **Settings** pane.
+
+4. For service principal items, like tenant ID and client ID, go to the **Overview** page of your **App registrations**.
 
 > [!IMPORTANT]
-> If your storage account is in a virtual network, only the creation of an Azure blob datastore is supported. To grant your workspace access to your storage account, set the parameter `grant_workspace_access` to `True`.
+> If your storage account is in a virtual network, only creation of Blob, File share, ADLS Gen 1, and ADLS Gen 2 datastores **via the SDK** is supported. To grant your workspace access to your storage account, set the parameter `grant_workspace_access` to `True`.
 
 The following examples show how to register an Azure blob container, an Azure file share, and Azure Data Lake Storage Generation 2 as a datastore. For other storage services, please see the [reference documentation for the `register_azure_*` methods](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py#methods).
 
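As a hedged sketch of the `grant_workspace_access` flag called out in the note (account, container, and key values are placeholders):

```python
from azureml.core import Datastore, Workspace

ws = Workspace.from_config()

# grant_workspace_access=True lets the workspace reach a storage account
# that sits behind a virtual network. All names and the key are placeholders.
blob_datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='vnet_blob_datastore',
    container_name='mycontainer',
    account_name='mystorageaccount',
    account_key='<my-account-key>',
    grant_workspace_access=True)
```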
@@ -132,7 +135,7 @@ file_datastore = Datastore.register_azure_file_share(workspace=ws,
 
 #### Azure Data Lake Storage Generation 2
 
-For an Azure Data Lake Storage Generation 2 (ADLS Gen 2) datastore, use [register_azure_data_lake_gen2()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py#register-azure-data-lake-gen2-workspace--datastore-name--filesystem--account-name--tenant-id--client-id--client-secret--resource-url-none--authority-url-none--protocol-none--endpoint-none--overwrite-false-) to register a credential datastore connected to an Azure DataLake Gen 2 storage with [service principal permissions](https://docs.microsoft.com/azure/active-directory/develop/howto-create-service-principal-portal). Learn more about [access control set up for ADLS Gen 2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-access-control).
+For an Azure Data Lake Storage Generation 2 (ADLS Gen 2) datastore, use [register_azure_data_lake_gen2()](https://docs.microsoft.com/python/api/azureml-core/azureml.core.datastore.datastore?view=azure-ml-py#register-azure-data-lake-gen2-workspace--datastore-name--filesystem--account-name--tenant-id--client-id--client-secret--resource-url-none--authority-url-none--protocol-none--endpoint-none--overwrite-false-) to register a credential datastore connected to Azure Data Lake Storage Gen 2 with [service principal permissions](https://docs.microsoft.com/azure/active-directory/develop/howto-create-service-principal-portal). To use your service principal, you first need to [register your application](https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals). Learn more about [access control set up for ADLS Gen 2](https://docs.microsoft.com/azure/storage/blobs/data-lake-storage-access-control).
 
 The following code creates and registers the `adlsgen2_datastore_name` datastore to the `ws` workspace. This datastore accesses the file system `test` on the `account_name` storage account, by using the provided service principal credentials.
 
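The hunk cuts off before the code it refers to; as a reconstruction sketch only, based on the `register_azure_data_lake_gen2()` signature linked above (all credential values are placeholders), the call would look roughly like:

```python
from azureml.core import Datastore, Workspace

ws = Workspace.from_config()

adlsgen2_datastore_name = 'adlsgen2datastore'

# Tenant ID and client ID come from the app registration's Overview page;
# every value below is a placeholder.
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    filesystem='test',                  # the ADLS Gen 2 file system to access
    account_name='<my-account-name>',
    tenant_id='<my-tenant-id>',
    client_id='<my-client-id>',
    client_secret='<my-client-secret>')
```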
@@ -160,12 +163,19 @@ adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(workspace=ws,
 
 Create a new datastore in a few steps in Azure Machine Learning studio:
 
+> [!IMPORTANT]
+> If your storage account is in a virtual network, only creation of datastores [via the SDK](#python-sdk) is supported.
+
 1. Sign in to [Azure Machine Learning studio](https://ml.azure.com/).
 1. Select **Datastores** on the left pane under **Manage**.
 1. Select **+ New datastore**.
 1. Complete the form for a new datastore. The form intelligently updates itself based on your selections for Azure Storage type and authentication type.
 
-You can find the information that you need to populate the form on the [Azure portal](https://portal.azure.com). Select **Storage Accounts** on the left pane, and choose the storage account that you want to register. The **Overview** page provides information such as the account name, container, and file share name. For authentication items, like account key or SAS token, go to **Account Keys** on the **Settings** pane.
+You can find the information that you need to populate the form on the [Azure portal](https://portal.azure.com). Select **Storage Accounts** on the left pane, and choose the storage account that you want to register. The **Overview** page provides information such as the account name, container, and file share name.
+
+* For authentication items, like account key or SAS token, go to **Access Keys** on the **Settings** pane.
+
+* For service principal items, like tenant ID and client ID, go to the **Overview** page of your **App registrations**.
 
 The following example demonstrates what the form looks like when you create an Azure blob datastore:
 
articles/machine-learning/how-to-create-your-first-pipeline.md

Lines changed: 3 additions & 0 deletions
@@ -357,6 +357,9 @@ iris_dataset = run_context.input_datasets['iris_data']
 dataframe = iris_dataset.to_pandas_dataframe()
 ```
 
+>[!NOTE]
+> All types of datasets (Blob, File Share, ADLS Gen 2, and so on) can be used as input to any pipeline step, and output can be used with the DataTransferStep. However, writing output data (PipelineData) to ADLS Gen 2 is not supported.
+
 For more information, see the [azure-pipeline-steps package](https://docs.microsoft.com/python/api/azureml-pipeline-steps/?view=azure-ml-py) and [Pipeline class](https://docs.microsoft.com/python/api/azureml-pipeline-core/azureml.pipeline.core.pipeline%28class%29?view=azure-ml-py) reference.
 
 ## Submit the pipeline
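To illustrate the note's "input to any pipeline step" claim, a minimal sketch (script, dataset, and compute names are placeholders) of wiring a registered dataset into a `PythonScriptStep`:

```python
from azureml.core import Dataset, Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()
iris_dataset = Dataset.get_by_name(ws, name='iris_data')  # placeholder dataset

# Inside train.py, the input is read back through
# run_context.input_datasets['iris_data'], as in the snippet above.
train_step = PythonScriptStep(
    name='train',
    script_name='train.py',
    source_directory='.',
    inputs=[iris_dataset.as_named_input('iris_data')],
    compute_target='cpu-cluster')  # placeholder compute target

pipeline = Pipeline(workspace=ws, steps=[train_step])
```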

articles/machine-learning/how-to-train-with-datasets.md

Lines changed: 4 additions & 3 deletions
@@ -25,7 +25,7 @@ In this article, you learn the two ways to consume [Azure Machine Learning datas
 
 - Option 2: If you have unstructured data, create a FileDataset and mount or download files to a remote compute for training.
 
-Azure Machine Learning datasets provide a seamless integration with Azure Machine Learning training products like [ScriptRun](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrun?view=azure-ml-py), [Estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator?view=azure-ml-py) and [HyperDrive](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py).
+Azure Machine Learning datasets provide seamless integration with Azure Machine Learning training products like [ScriptRun](https://docs.microsoft.com/python/api/azureml-core/azureml.core.scriptrun?view=azure-ml-py), [Estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator?view=azure-ml-py), [HyperDrive](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py), and [Azure Machine Learning pipelines](how-to-create-your-first-pipeline.md).
 
 ## Prerequisites
 
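As a hedged sketch of that integration (dataset name, script, and compute target are placeholders), a dataset can be handed to an `Estimator` as a named input:

```python
from azureml.core import Dataset, Experiment, Workspace
from azureml.train.estimator import Estimator

ws = Workspace.from_config()
titanic_ds = Dataset.get_by_name(ws, name='titanic')  # placeholder dataset

# train.py can read the input back via run.input_datasets['titanic'].
est = Estimator(
    source_directory='.',
    entry_script='train.py',
    compute_target='cpu-cluster',  # placeholder compute target
    inputs=[titanic_ds.as_named_input('titanic')])

experiment_run = Experiment(ws, 'train-with-dataset').submit(est)
```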
@@ -98,11 +98,12 @@ experiment_run = experiment.submit(est)
 experiment_run.wait_for_completion(show_output=True)
 ```
 
+
 ## Option 2: Mount files to a remote compute target
 
 If you want to make your data files available on the compute target for training, use [FileDataset](https://docs.microsoft.com/python/api/azureml-core/azureml.data.file_dataset.filedataset?view=azure-ml-py) to mount or download the files it refers to.
 
-### Mount v.s. Download
+### Mount vs. Download
 When you mount a dataset, you attach the files referenced by the dataset to a directory (mount point) and make them available on the compute target. Mounting is supported for Linux-based computes, including Azure Machine Learning Compute, virtual machines, and HDInsight. If your data size exceeds the compute disk size, or you only load part of the dataset in your script, mounting is recommended: downloading a dataset bigger than the disk size will fail, while mounting loads only the part of the data your script uses at the time of processing.
 
 When you download a dataset, all the files referenced by the dataset are downloaded to the compute target. Downloading is supported for all compute types. If your script processes all files referenced by the dataset and your full dataset fits on your compute disk, downloading is recommended to avoid the overhead of streaming data from storage services.
@@ -197,4 +198,4 @@ The [dataset notebooks](https://aka.ms/dataset-tutorial) demonstrate and expand
 
 * [Train image classification models](https://aka.ms/filedataset-samplenotebook) with FileDatasets
 
-* [Create and manage environments for training and deployment](how-to-use-environments.md)
+* [Train with datasets using pipelines](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/work-with-data/datasets-tutorial/pipeline-with-datasets/pipeline-for-image-classification.ipynb)
