Commit 49db647

Merge pull request #188641 from nibaccam/label-dataset
Labeled datasets| PM edits
2 parents f1139fd + 4881e14 commit 49db647

1 file changed: +16 -49 lines changed

articles/machine-learning/how-to-use-labeled-dataset.md

Lines changed: 16 additions & 49 deletions
@@ -8,24 +8,23 @@ ms.service: machine-learning
 ms.subservice: mldata
 ms.topic: how-to
 ms.custom: data4ml
-ms.date: 10/21/2021
+ms.date: 02/15/2022
 
 # Customer intent: As an experienced Python developer, I need to export my data labels and use them for machine learning tasks.
 ---
 
 # Create and explore Azure Machine Learning dataset with labels
 
-In this article, you'll learn how to export the data labels from an Azure Machine Learning data labeling project and load them into popular formats such as, a pandas dataframe for data exploration or a Torchvision dataset for image transformation.
+In this article, you'll learn how to export the data labels from an Azure Machine Learning data labeling project and load them into popular formats such as, a pandas dataframe for data exploration.
 
 ## What are datasets with labels
 
-We refer to Azure Machine Learning datasets with labels as labeled datasets. These specific dataset types of labeled datasets are only created as an output of Azure Machine Learning data labeling projects. Create a data labeling project [for image labeling](how-to-create-image-labeling-projects.md) or [text labeling](how-to-create-text-labeling-projects.md). Machine Learning supports data labeling projects for image classification, either multi-label or multi-class, and object identification together with bounded boxes.
+Azure Machine Learning datasets with labels are referred to as labeled datasets. These specific datasets are [TabularDatasets](/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset) with a dedicated label column and are only created as an output of Azure Machine Learning data labeling projects. Create a data labeling project [for image labeling](how-to-create-image-labeling-projects.md) or [text labeling](how-to-create-text-labeling-projects.md). Machine Learning supports data labeling projects for image classification, either multi-label or multi-class, and object identification together with bounded boxes.
 
 ## Prerequisites
 
 * An Azure subscription. If you don’t have an Azure subscription, create a [free account](https://azure.microsoft.com/free/) before you begin.
 * The [Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/intro), or access to [Azure Machine Learning studio](https://ml.azure.com/).
-* Install the [azure-contrib-dataset](/python/api/azureml-contrib-dataset/) package
 * A Machine Learning workspace. See [Create an Azure Machine Learning workspace](how-to-manage-workspace.md).
 * Access to an Azure Machine Learning data labeling project. If you don't have a labeling project, first create one for [image labeling](how-to-create-image-labeling-projects.md) or [text labeling](how-to-create-text-labeling-projects.md).
 
@@ -48,36 +47,30 @@ You can access the exported Azure Machine Learning dataset in the **Datasets** s
 
 Once you have exported your labeled data to an Azure Machine Learning dataset, you can use AutoML to build computer vision models trained on your labeled data. Learn more at [Set up AutoML to train computer vision models with Python (preview)](how-to-auto-train-image-models.md)
 
-## Explore labeled datasets
+## Explore labeled datasets via pandas dataframe
 
-Load your labeled datasets into a pandas dataframe or Torchvision dataset to leverage popular open-source libraries for data exploration, as well as PyTorch provided libraries for image transformation and training.
+Load your labeled datasets into a pandas dataframe to leverage popular open-source libraries for data exploration with the [`to_pandas_dataframe()`](/python/api/azureml-core/azureml.data.tabulardataset#to-pandas-dataframe-on-error--null---out-of-range-datetime--null--) method from the `azureml-dataprep` class.
 
-### Pandas dataframe
-
-You can load labeled datasets into a pandas dataframe with the [`to_pandas_dataframe()`](/python/api/azureml-core/azureml.data.tabulardataset#to-pandas-dataframe-on-error--null---out-of-range-datetime--null--) method from the `azureml-contrib-dataset` class. Install the class with the following shell command:
+Install the class with the following shell command:
 
 ```shell
-pip install azureml-contrib-dataset
+pip install azureml-dataprep
 ```
 
->[!NOTE]
->The azureml.contrib namespace changes frequently, as we work to improve the service. As such, anything in this namespace should be considered as a preview, and not fully supported by Microsoft.
-
-Azure Machine Learning offers the following file handling options for file streams when converting to a pandas dataframe.
-* Download: Download your data files to a local path.
-* Mount: Mount your data files to a mount point. Mount only works for Linux-based compute, including Azure Machine Learning notebook VM and Azure Machine Learning Compute.
-
 In the following code, the `animal_labels` dataset is the output from a labeling project previously saved to the workspace.
+The exported dataset is a [TabularDataset](/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset). If you plan to use [download()] or [mount()] methods, be sure to set the parameter `stream column ='image_url'`
 
 ```Python
 import azureml.core
-import azureml.contrib.dataset
 from azureml.core import Dataset, Workspace
-from azureml.contrib.dataset import FileHandlingOption
+
 
 # get animal_labels dataset from the workspace
 animal_labels = Dataset.get_by_name(workspace, 'animal_labels')
-animal_pd = animal_labels.to_pandas_dataframe(file_handling_option=FileHandlingOption.DOWNLOAD, target_path='./download/', overwrite_download=True)
+animal_pd = animal_labels.to_pandas_dataframe()
+
+# download the images to local
+animal_labels.download(stream_column='image_url')
 
 import matplotlib.pyplot as plt
 import matplotlib.image as mpimg
@@ -87,33 +80,7 @@ img = mpimg.imread(animal_pd.loc[0,'image_url'])
 imgplot = plt.imshow(img)
 ```
 
-### Torchvision datasets
-
-You can load labeled datasets into Torchvision dataset with the [to_torchvision()](/python/api/azureml-contrib-dataset/azureml.contrib.dataset.tabulardataset#to-torchvision--) method also from the `azureml-contrib-dataset` class. To use this method, you need to have [PyTorch](https://pytorch.org/) installed.
-
-In the following code, the `animal_labels` dataset is the output from a labeling project previously saved to the workspace.
-
-```python
-import azureml.core
-import azureml.contrib.dataset
-from azureml.core import Dataset, Workspace
-from azureml.contrib.dataset import FileHandlingOption
-
-from torchvision.transforms import functional as F
-
-# get animal_labels dataset from the workspace
-animal_labels = Dataset.get_by_name(workspace, 'animal_labels')
-
-# load animal_labels dataset into torchvision dataset
-pytorch_dataset = animal_labels.to_torchvision()
-img = pytorch_dataset[0][0]
-print(type(img))
-
-# use methods from torchvision to transform the img into grayscale
-pil_image = F.to_pil_image(img)
-gray_image = F.to_grayscale(pil_image, num_output_channels=3)
-
-imgplot = plt.imshow(gray_image)
-```
-
 ## Next steps
+
+* Learn to [train image classification models in Azure](./tutorial-train-deploy-notebook.md)
+* [Set up AutoML to train computer vision models with Python (preview)](how-to-auto-train-image-models.md)
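
Note on the exploration step this change describes: once the labeled dataset is loaded with `to_pandas_dataframe()`, exploring it is ordinary pandas work. The sketch below is only an illustration under stated assumptions. It does not call Azure Machine Learning at all; it builds a small stand-in dataframe with hypothetical file names, and it assumes the label column is named `label` (the article only says there is "a dedicated label column") next to the `image_url` column used in the changed code.

```python
import pandas as pd

# Hypothetical stand-in for the result of animal_labels.to_pandas_dataframe().
# The 'label' column name and the file names are assumptions for illustration;
# 'image_url' is the column name used in the article's code.
animal_pd = pd.DataFrame(
    {
        "image_url": [
            "cat_001.jpg",
            "dog_001.jpg",
            "cat_002.jpg",
            "dog_002.jpg",
            "cat_003.jpg",
        ],
        "label": ["cat", "dog", "cat", "dog", "cat"],
    }
)

# Count how many images carry each label, which helps spot class imbalance.
label_counts = animal_pd["label"].value_counts()

# Keep only the rows for a single class.
cat_rows = animal_pd[animal_pd["label"] == "cat"]

print(label_counts)
print(cat_rows["image_url"].tolist())
```

With a real exported dataset, the same two operations would run against the dataframe returned by `to_pandas_dataframe()`, after checking the actual column names with `animal_pd.columns`.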
