
Commit 8409532

Use existing Blob storage as datastore
Provide a description of how to use existing Azure Blob storage as a datastore in an AML workspace
1 parent 15a9f56 commit 8409532


articles/machine-learning/how-to-prepare-datasets-for-automl-images.md

Lines changed: 10 additions & 4 deletions
@@ -47,12 +47,12 @@ It helps to create, manage, and monitor data labeling tasks for

If you already have a data labeling project and you want to use that data, you can [export your labeled data as an Azure ML Dataset](how-to-create-image-labeling-projects.md#export-the-labels). You can then access the exported dataset under the 'Datasets' tab in Azure ML Studio, and download the underlying JSONL file from the Dataset details page under Data sources. The downloaded JSONL file can then be used to create an `MLTable` that can be used by automated ML for training computer vision models.

-### Using pre-labeled training data
+### Using pre-labeled training data from local machine
If you have previously labeled data that you would like to use to train your model, you will first need to upload the images to the default Azure Blob Storage of your Azure ML workspace and register them as a data asset.

The following script uploads the image data on your local machine at path "./data/odFridgeObjects" to a datastore in Azure Blob Storage. It then creates a new data asset named "fridge-items-images-object-detection" in your Azure ML workspace.

-If there already exists a data asset with name "fridge-items-images-object-detection" in your Azure ML Workspace, then it'll update its version number of data asset and make it point to new datastore in Azure Blob Storage where we uploaded the image data.
+If a data asset named "fridge-items-images-object-detection" already exists in your Azure ML workspace, the script creates a new version of it that points to the location in the Azure Blob Storage datastore where the image data was uploaded.

# [Azure CLI](#tab/cli)
[!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
@@ -80,7 +80,7 @@ az ml data create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE]
[!Notebook-python[] (~/azureml-examples-main/sdk/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items/automl-image-object-detection-task-fridge-items.ipynb?name=upload-data)]
---

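For orientation, here is a minimal Python SDK v2 sketch of the upload-and-register step performed in the tabs above. It assumes the `azure-ai-ml` and `azure-identity` packages are installed; the subscription, resource group, and workspace values are placeholders, and the linked notebook remains the authoritative example.

```python
# Minimal sketch: upload a local image folder and register it as a data asset.
# Placeholder values below must be replaced with your own workspace details.
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<my-subscription-id>",
    resource_group_name="<my-resource-group>",
    workspace_name="<my-workspace>",
)

# Uploads the local folder to the workspace's default blob datastore and
# registers (or versions) the data asset pointing at the uploaded location.
my_data = Data(
    path="./data/odFridgeObjects",
    type=AssetTypes.URI_FOLDER,
    name="fridge-items-images-object-detection",
    description="Fridge-items images Object detection",
)
uri_folder_data_asset = ml_client.data.create_or_update(my_data)
print(uri_folder_data_asset.path)
```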
-If you already have your data present in Azure Blob Storage and want to create data asset out of it, you can do so by providing path to the location in Azure Blob Storage as shown below.
+If your data is already present in an existing datastore and you want to create a data asset from it, you can do so by providing the path to the data in the datastore, as shown below, instead of a path on your local machine.

# [Azure CLI](#tab/cli)
[!INCLUDE [cli v2](../../includes/machine-learning-cli-v2.md)]
@@ -91,7 +91,7 @@ Create a .yml file with the following configuration.
$schema: https://azuremlschemas.azureedge.net/latest/data.schema.json
name: fridge-items-images-object-detection
description: Fridge-items images Object detection
path: azureml://subscriptions/<my-subscription-id>/resourcegroups/<my-resource-group>/workspaces/<my-workspace>/datastores/<my-datastore>/paths/<path_to_image_data_folder>
type: uri_folder
```
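For the Python SDK, a comparable sketch (placeholder URI segments, reusing `ml_client` from the earlier sketch) registers the data asset from a path in the existing datastore instead of uploading a local folder.

```python
# Sketch only: the azureml:// URI segments are placeholders to fill in.
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data

datastore_path = (
    "azureml://subscriptions/<my-subscription-id>/resourcegroups/<my-resource-group>"
    "/workspaces/<my-workspace>/datastores/<my-datastore>/paths/<path_to_image_data_folder>"
)

my_data = Data(
    path=datastore_path,  # points at data already in the datastore; nothing is uploaded
    type=AssetTypes.URI_FOLDER,
    name="fridge-items-images-object-detection",
    description="Fridge-items images Object detection",
)
ml_client.data.create_or_update(my_data)
```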
@@ -112,6 +112,12 @@ Next, you will need to get the label annotations in JSONL format. The schema of

If your training data is in a different format (like Pascal VOC or COCO), [helper scripts](https://github.com/Azure/azureml-examples/blob/main/python-sdk/tutorials/automl-with-azureml/image-object-detection/coco2jsonl.py) to convert the data to JSONL are available in [notebook examples](https://github.com/Azure/azureml-examples/blob/sdk-preview/sdk/jobs/automl-standalone-jobs).

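To illustrate the target format, the hypothetical helper below (not the linked `coco2jsonl.py` script) sketches how a COCO-style pixel bounding box maps onto an object-detection JSONL record with normalized `topX`/`topY`/`bottomX`/`bottomY` coordinates; the URL and values are illustrative only.

```python
# Illustrative sketch of one object-detection JSONL record; use the linked
# helper scripts for real conversions.
import json

def coco_box_to_jsonl_record(image_url, img_w, img_h, annotations, category_names):
    """annotations: COCO-style dicts with 'bbox' = [x, y, width, height] in pixels."""
    labels = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        labels.append({
            "label": category_names[ann["category_id"]],
            "topX": x / img_w,           # normalized left edge
            "topY": y / img_h,           # normalized top edge
            "bottomX": (x + w) / img_w,  # normalized right edge
            "bottomY": (y + h) / img_h,  # normalized bottom edge
            "isCrowd": ann.get("iscrowd", 0),
        })
    return {
        "image_url": image_url,
        "image_details": {"format": "jpg", "width": f"{img_w}px", "height": f"{img_h}px"},
        "label": labels,
    }

record = coco_box_to_jsonl_record(
    image_url="azureml://datastores/workspaceblobstore/paths/odFridgeObjects/images/1.jpg",
    img_w=499,
    img_h=666,
    annotations=[{"bbox": [128, 137, 86, 149], "category_id": 1, "iscrowd": 0}],
    category_names={1: "can"},
)
print(json.dumps(record))  # one such line per image in the .jsonl file
```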
+
+### Using pre-labeled training data from Azure Blob storage
+If your labeled training data is already present in a container in Azure Blob storage, you can access it directly from there by [creating a datastore referring to that container](how-to-prepare-datasets-for-automl-images.md#create-an-azure-blob-datastore). Once you have created a datastore in your AML workspace linked to an existing blob container, you'll have to update the authentication details for that datastore: select the subscription ID and resource group, and provide either an account key or a SAS token.
+
+![Update Authentication for Datastore.](media/how-to-prepare-datasets-for-automl-images/update-datastore-authentication.png)
+
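As a programmatic alternative to the studio authentication dialog shown above, a minimal Python SDK v2 sketch of registering an existing blob container as a datastore follows; the datastore name, storage account, container, and key are placeholders, and `SasTokenConfiguration` can be used in place of an account key.

```python
# Sketch only: registers an existing blob container as a workspace datastore.
# Account, container, and key values are placeholders; ml_client as in the earlier sketch.
from azure.ai.ml.entities import AccountKeyConfiguration, AzureBlobDatastore

blob_datastore = AzureBlobDatastore(
    name="fridge_items_datastore",  # hypothetical datastore name
    description="Datastore pointing to an existing blob container with labeled images",
    account_name="<my-storage-account>",
    container_name="<my-container>",
    credentials=AccountKeyConfiguration(account_key="<my-account-key>"),
)
ml_client.create_or_update(blob_datastore)
```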
## Create MLTable

Once you have your labeled data in JSONL format, you can use it to create an `MLTable` as shown below. MLTable packages your data into a consumable object for training.
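As a quick sanity check, the resulting MLTable can also be loaded locally with the `mltable` package; this is a sketch, and the folder path is a placeholder for the directory containing the MLTable file.

```python
# Sketch: load the MLTable definition and preview the JSONL annotations it yields.
import mltable

tbl = mltable.load("<folder_containing_MLTable_file>")  # placeholder path
df = tbl.to_pandas_dataframe()
print(df.head())
```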
