Commit 2a497db

Merge pull request #212638 from MadhuM02/madhumaddi/dataset_ui_creation
registering dataset through UI.
2 parents b2b38bb + 5944a2b commit 2a497db

5 files changed, +22 -3 lines changed
articles/machine-learning/how-to-auto-train-image-models.md

Lines changed: 1 addition & 1 deletion
@@ -110,7 +110,7 @@ If your training data is in a different format (like, pascal VOC or COCO), you c
 > The training data needs to have at least 10 images in order to be able to submit an AutoML run.

 > [!Warning]
-> Creation of `MLTable` from data in JSONL format is supported using the SDK and CLI only, for this capability. Creating the `MLTable` via UI is not supported at this time. As of now, the UI doesn't recognize the StreamInfo datatype, which is the datatype used for image URLs in JSONL format.
+> Creation of `MLTable` from data in JSONL format is supported using the SDK and CLI only, for this capability. Creating the `MLTable` via UI is not supported at this time.


 ### JSONL schema samples

articles/machine-learning/how-to-prepare-datasets-for-automl-images.md

Lines changed: 21 additions & 2 deletions
@@ -36,7 +36,7 @@ If your labeled training data is in a different format (like, pascal VOC or COCO
 ## Get labeled data
 In order to train computer vision models using AutoML, you need to first get labeled training data. The images need to be uploaded to the cloud and label annotations need to be in JSONL format. You can either use the Azure ML Data Labeling tool to label your data or you could start with pre-labeled image data.

-## Using Azure ML Data Labeling tool to label your training data
+### Using Azure ML Data Labeling tool to label your training data
 If you don't have pre-labeled data, you can use Azure Machine Learning's [data labeling tool](how-to-create-image-labeling-projects.md) to manually label images. This tool automatically generates the data required for training in the accepted format.

 It helps to create, manage, and monitor data labeling tasks for
@@ -72,6 +72,11 @@ my_training_data_input = Input(
 mode=InputOutputModes.DIRECT
 )
 ```
+
+# [Studio](#tab/Studio)
+
+Please refer to Cli/Sdk tabs for reference.
+
 ---

 ### Using pre-labeled training data from local machine
@@ -104,7 +109,12 @@ az ml data create -f [PATH_TO_YML_FILE] --workspace-name [YOUR_AZURE_WORKSPACE]

 [!INCLUDE [sdk v2](../../includes/machine-learning-sdk-v2.md)]

-[!Notebook-python[] (~/azureml-examples-v2samplesreorg/sdk/python/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items/automl-image-object-detection-task-fridge-items.ipynb?name=upload-data)]
+[!Notebook-python[] (~/azureml-examples-main/sdk/python/jobs/automl-standalone-jobs/automl-image-object-detection-task-fridge-items/automl-image-object-detection-task-fridge-items.ipynb?name=upload-data)]
+
+# [Studio](#tab/Studio)
+
+![Animation showing how to register a dataset from local files](media\how-to-prepare-datasets-for-automl-images\ui-dataset-local.gif)
+
 ---

 If you already have your data present in an existing datastore and want to create a data asset out of it, you can do so by providing the path to the data in the datastore, instead of providing the path of your local machine. Update the code [above](how-to-prepare-datasets-for-automl-images.md#using-pre-labeled-training-data-from-local-machine) with the following snippet.
@@ -133,12 +143,21 @@ my_data = Data(
 name="fridge-items-images-object-detection",
 )
 ```
+
+# [Studio](#tab/Studio)
+
+![Animation showing how to register a dataset from data already present in datastore](media\how-to-prepare-datasets-for-automl-images\ui-dataset-datastore.gif)
+
 ---

 Next, you will need to get the label annotations in JSONL format. The schema of labeled data depends on the computer vision task at hand. Refer to [schemas for JSONL files for AutoML computer vision experiments](reference-automl-images-schema.md) to learn more about the required JSONL schema for each task type.

 If your training data is in a different format (like, pascal VOC or COCO), [helper scripts](https://github.com/Azure/azureml-examples/blob/v2samplesreorg/v1/python-sdk/tutorials/automl-with-azureml/image-object-detection/coco2jsonl.py) to convert the data to JSONL are available in [notebook examples](https://github.com/Azure/azureml-examples/blob/v2samplesreorg/sdk/python/jobs/automl-standalone-jobs).

+Once you have created jsonl file following the above steps, you can register it as a data asset using UI. Make sure you select `stream` type in schema section as shown below.
+
+![Animation showing how to register a data asset from the jsonl files](media\how-to-prepare-datasets-for-automl-images\ui-dataset-jsnol.gif)
+
 ### Using pre-labeled training data from Azure Blob storage
 If you have your labeled training data present in a container in Azure Blob storage, then you can access it directly from there by [creating a datastore referring to that container](how-to-datastore.md#create-an-azure-blob-datastore).

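Note on the JSONL step in the last hunk: the added text assumes a JSONL annotation file already exists before it is registered through the UI. The following is a minimal sketch of producing such a file with only the Python standard library. The field names (`image_url`, `image_details`, `label`, `topX`, `topY`, `bottomX`, `bottomY`) are assumed from the object-detection entry of the linked schema reference and should be checked against it. The helper names `to_jsonl_record` and `write_jsonl`, the `jpg` default, and any image URL passed in are hypothetical placeholders, not part of this commit.

```python
import json


def to_jsonl_record(image_url, width, height, boxes, image_format="jpg"):
    """Build one annotation record for a single image.

    boxes: iterable of (label, xmin, ymin, xmax, ymax) in pixel coordinates.
    Coordinates are normalized to [0, 1] by the image width and height
    (assumed from the object-detection schema reference).
    """
    return {
        "image_url": image_url,
        "image_details": {
            "format": image_format,
            "width": width,
            "height": height,
        },
        "label": [
            {
                "label": name,
                "topX": xmin / width,
                "topY": ymin / height,
                "bottomX": xmax / width,
                "bottomY": ymax / height,
            }
            for name, xmin, ymin, xmax, ymax in boxes
        ],
    }


def write_jsonl(records, path):
    """Write one JSON object per line, which is what makes the file JSONL."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")
```

Each output line is an independent JSON document, so the file can be validated line by line with `json.loads` before it is uploaded and registered.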
Three binary files (2.5 MB, 2.58 MB, 2.59 MB); contents not shown.
