articles/machine-learning/how-to-use-labeled-dataset.md
16 additions & 49 deletions
@@ -8,24 +8,23 @@ ms.service: machine-learning
  8   8   ms.subservice: mldata
  9   9   ms.topic: how-to
 10  10   ms.custom: data4ml
 11     - ms.date: 10/21/2021
     11 + ms.date: 02/15/2022
 12  12
 13  13   # Customer intent: As an experienced Python developer, I need to export my data labels and use them for machine learning tasks.
 14  14   ---
 15  15
 16  16   # Create and explore Azure Machine Learning dataset with labels
 17  17
 18     - In this article, you'll learn how to export the data labels from an Azure Machine Learning data labeling project and load them into popular formats such as, a pandas dataframe for data exploration or a Torchvision dataset for image transformation.
     18 + In this article, you'll learn how to export the data labels from an Azure Machine Learning data labeling project and load them into popular formats such as, a pandas dataframe for data exploration.
 19  19
 20  20   ## What are datasets with labels
 21  21
 22     - We refer to Azure Machine Learning datasets with labels as labeled datasets. These specific dataset types of labeled datasets are only created as an output of Azure Machine Learning data labeling projects. Create a data labeling project [for image labeling](how-to-create-image-labeling-projects.md) or [text labeling](how-to-create-text-labeling-projects.md). Machine Learning supports data labeling projects for image classification, either multi-label or multi-class, and object identification together with bounded boxes.
     22 + Azure Machine Learning datasets with labels are referred to as labeled datasets. These specific datasets are [TabularDatasets](/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset) with a dedicated label column and are only created as an output of Azure Machine Learning data labeling projects. Create a data labeling project [for image labeling](how-to-create-image-labeling-projects.md) or [text labeling](how-to-create-text-labeling-projects.md). Machine Learning supports data labeling projects for image classification, either multi-label or multi-class, and object identification together with bounded boxes.
 23  23
 24  24   ## Prerequisites
 25  25
 26  26   * An Azure subscription. If you don’t have an Azure subscription, create a [free account](https://azure.microsoft.com/free/) before you begin.
 27  27   * The [Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/intro), or access to [Azure Machine Learning studio](https://ml.azure.com/).
 28     - * Install the [azure-contrib-dataset](/python/api/azureml-contrib-dataset/) package
 29  28   * A Machine Learning workspace. See [Create an Azure Machine Learning workspace](how-to-manage-workspace.md).
 30  29   * Access to an Azure Machine Learning data labeling project. If you don't have a labeling project, first create one for [image labeling](how-to-create-image-labeling-projects.md) or [text labeling](how-to-create-text-labeling-projects.md).
 31  30
@@ -48,36 +47,30 @@ You can access the exported Azure Machine Learning dataset in the **Datasets** s
 48  47
 49  48   Once you have exported your labeled data to an Azure Machine Learning dataset, you can use AutoML to build computer vision models trained on your labeled data. Learn more at [Set up AutoML to train computer vision models with Python (preview)](how-to-auto-train-image-models.md)
 50  49
 51     - ## Explore labeled datasets
     50 + ## Explore labeled datasets via pandas dataframe
 52  51
 53     - Load your labeled datasets into a pandas dataframe or Torchvision dataset to leverage popular open-source libraries for data exploration, as well as PyTorch provided libraries for image transformation and training.
     52 + Load your labeled datasets into a pandas dataframe to leverage popular open-source libraries for data exploration with the [`to_pandas_dataframe()`](/python/api/azureml-core/azureml.data.tabulardataset#to-pandas-dataframe-on-error--null---out-of-range-datetime--null--) method from the `azureml-dataprep` class.
 54  53
 55     - ### Pandas dataframe
 56     -
 57     - You can load labeled datasets into a pandas dataframe with the [`to_pandas_dataframe()`](/python/api/azureml-core/azureml.data.tabulardataset#to-pandas-dataframe-on-error--null---out-of-range-datetime--null--) method from the `azureml-contrib-dataset` class. Install the class with the following shell command:
     54 + Install the class with the following shell command:
 58  55
 59  56   ```shell
 60     - pip install azureml-contrib-dataset
     57 + pip install azureml-dataprep
 61  58   ```
 62  59
 63     - >[!NOTE]
 64     - >The azureml.contrib namespace changes frequently, as we work to improve the service. As such, anything in this namespace should be considered as a preview, and not fully supported by Microsoft.
 65     -
 66     - Azure Machine Learning offers the following file handling options for file streams when converting to a pandas dataframe.
 67     - * Download: Download your data files to a local path.
 68     - * Mount: Mount your data files to a mount point. Mount only works for Linux-based compute, including Azure Machine Learning notebook VM and Azure Machine Learning Compute.
 69     -
 70  60   In the following code, the `animal_labels` dataset is the output from a labeling project previously saved to the workspace.
     61 + The exported dataset is a [TabularDataset](/python/api/azureml-core/azureml.data.tabular_dataset.tabulardataset). If you plan to use [download()] or [mount()] methods, be sure to set the parameter `stream column ='image_url'`
 71  62
 72  63   ```Python
 73  64   import azureml.core
 74     - import azureml.contrib.dataset
 75  65   from azureml.core import Dataset, Workspace
 76     - from azureml.contrib.dataset import FileHandlingOption
You can load labeled datasets into Torchvision dataset with the [to_torchvision()](/python/api/azureml-contrib-dataset/azureml.contrib.dataset.tabulardataset#to-torchvision--) method also from the `azureml-contrib-dataset` class. To use this method, you need to have [PyTorch](https://pytorch.org/) installed.
 93     -
 94     - In the following code, the `animal_labels` dataset is the output from a labeling project previously saved to the workspace.
 95     -
 96     - ```python
 97     - import azureml.core
 98     - import azureml.contrib.dataset
 99     - from azureml.core import Dataset, Workspace
100     - from azureml.contrib.dataset import FileHandlingOption
101     -
102     - from torchvision.transforms import functional as F
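As a rough illustration of the pandas workflow this change documents, the sketch below loads an exported labeled dataset and tallies its labels. It is a minimal sketch under stated assumptions, not the article's own sample: the helper names `load_labeled_dataframe` and `count_labels` are hypothetical, the `label` column name is assumed (the article only says there is a dedicated label column), and `animal_labels` is the dataset name the article uses. The `azureml.core` import is deferred so that the counting helper runs without the SDK installed.

```python
from collections import Counter


def load_labeled_dataframe(dataset_name="animal_labels"):
    """Load an exported labeled dataset into a pandas dataframe.

    Assumes azureml-core is installed and a workspace config.json is
    available locally; the default dataset name is the one used in the article.
    """
    # Imported lazily so the rest of this sketch runs without the SDK.
    from azureml.core import Dataset, Workspace

    workspace = Workspace.from_config()
    dataset = Dataset.get_by_name(workspace, name=dataset_name)
    return dataset.to_pandas_dataframe()


def count_labels(labels):
    """Return a {label: count} dict for an iterable of single-label values."""
    return dict(Counter(labels))


# In-memory values standing in for df["label"]
# (the "label" column name is an assumption, not confirmed by the article).
sample_labels = ["cat", "dog", "cat"]
print(count_labels(sample_labels))
```

With the SDK configured, `count_labels(load_labeled_dataframe()["label"])` would give per-label counts for the exported project. This assumes single-label classification; multi-label rows that hold lists would need to be flattened first.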