Commit c1f0c74

Merge pull request #105635 from sdgilley/sdg-ml-assist
ML assisted labels
2 parents ecd2d1c + be22f81 commit c1f0c74

2 files changed

+60
-10
lines changed

articles/machine-learning/how-to-create-labeling-projects.md

Lines changed: 49 additions & 10 deletions
@@ -1,24 +1,26 @@
11
---
22
title: Create a data labeling project
33
titleSuffix: Azure Machine Learning
4-
description: Learn ho to create and run labeling projects to tag data for machine learning.
5-
author: lobrien
6-
ms.author: laobri
4+
description: Learn how to create and run labeling projects to tag data for machine learning. The tools include ml assisted labeling, or human in the loop labeling to aid with the task.
5+
author: sdgilley
6+
ms.author: sgilley
77
ms.service: machine-learning
88
ms.topic: tutorial
9-
ms.date: 11/04/2019
9+
ms.date: 03/01/2020
1010

1111
---
1212

1313
# Create a data labeling project and export labels
1414

15+
[!INCLUDE [aml-applies-to-basic-enterprise-sku](../../includes/aml-applies-to-basic-enterprise-sku.md)]
16+
1517
Labeling voluminous data in machine learning projects is often a headache. Projects that have a computer-vision component, such as image classification or object detection, generally require labels for thousands of images.
1618

17-
[Azure Machine Learning](https://ml.azure.com/) gives you a central place to create, manage, and monitor labeling projects. Use it to coordinate data, labels, and team members to efficiently manage labeling tasks. Machine Learning supports image classification, either multi-label or multi-class, and object identification together with bounded boxes.
19+
[Azure Machine Learning](https://ml.azure.com/) gives you a central place to create, manage, and monitor labeling projects. Use it to coordinate data, labels, and team members to efficiently manage labeling tasks. Machine Learning supports image classification, either multi-label or multi-class, and object identification with bounding boxes.
1820

1921
Machine Learning tracks progress and maintains the queue of incomplete labeling tasks. Labelers don't need an Azure account to participate. After they are authenticated with your Microsoft account or [Azure Active Directory](https://docs.microsoft.com/azure/active-directory/active-directory-whatis), they can do as much labeling as their time allows.
2022

21-
In Machine Learning, you start and stop the project, add and remove people and teams, and monitor progress. You can export labeled data in COCO format or as an Azure Machine Learning dataset.
23+
You start and stop the project, add and remove labelers and teams, and monitor the labeling progress. You can export labeled data in COCO format or as an Azure Machine Learning dataset.
2224

2325
> [!Important]
2426
> Only image classification and object identification labeling projects are currently supported. Additionally, the data images must be available in an Azure blob datastore. (If you do not have an existing datastore, you may upload images during project creation.)
@@ -35,7 +37,7 @@ In this article, you'll learn how to:
3537

3638
## Prerequisites
3739

38-
* The data that you want to label, either in local files or in Azure storage.
40+
* The data that you want to label, either in local files or in Azure blob storage.
3941
* The set of labels that you want to apply.
4042
* The instructions for labeling.
4143
* An Azure subscription. If you don’t have an Azure subscription, create a [free account](https://aka.ms/AMLFree) before you begin.
@@ -51,8 +53,8 @@ To create a project, select **Add project**. Give the project an appropriate nam
5153

5254
![Labeling project creation wizard](./media/how-to-create-labeling-projects/labeling-creation-wizard.png)
5355

54-
* Choose **Image Classification Multi-label** for projects when you want to apply *one or more* labels from a set of classes to an image. For instance, a photo of a dog might be labeled with both *dog* and *daytime*.
5556
* Choose **Image Classification Multi-class** for projects when you want to apply only a *single class* from a set of classes to an image.
57+
* Choose **Image Classification Multi-label** for projects when you want to apply *one or more* labels from a set of classes to an image. For instance, a photo of a dog might be labeled with both *dog* and *daytime*.
5658
* Choose **Object Identification (Bounding Box)** for projects when you want to assign a class and a bounding box to each object within an image.
5759

5860
Select **Next** when you're ready to continue.
@@ -61,6 +63,7 @@ Select **Next** when you're ready to continue.
6163

6264
If you already created a dataset that contains your data, select it from the **Select an existing dataset** drop-down list. Or, select **Create a dataset** to use an existing Azure datastore or to upload local files.
6365

66+
6467
### Create a dataset from an Azure datastore
6568

6669
In many cases, it's fine to just upload local files. But [Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/) provides a faster and more robust way to transfer a large amount of data. We recommend Storage Explorer as the default way to move files.
@@ -78,6 +81,9 @@ To create a dataset from data that you've already stored in Azure Blob storage:
7881
1. Select **Next**.
7982
1. Confirm the details. Select **Back** to modify the settings or **Create** to create the dataset.
8083

84+
> [!NOTE]
85+
> The data you choose is loaded into your project. Data that you add to the datastore after the project is created will not appear in this project.
86+
8187
### Create a dataset from uploaded data
8288

8389
To directly upload your data:
@@ -101,7 +107,7 @@ Enter one label per row. Use the **+** button to add a new row. If you have more
101107

102108
## Describe the labeling task
103109

104-
It's important to clearly explain the labeling task. On the **Labeling instructions** page, you can add a link to an external site for labeling instructions. Keep the instructions task-oriented and appropriate to the audience. Consider these questions:
110+
It's important to clearly explain the labeling task. On the **Labeling instructions** page, you can add a link to an external site for labeling instructions, or provide instructions in the edit box on the page. Keep the instructions task-oriented and appropriate to the audience. Consider these questions:
105111

106112
* What are the labels they'll see, and how will they choose among them? Is there a reference text to refer to?
107113
* What should they do if no label seems appropriate?
@@ -115,10 +121,43 @@ For bounding boxes, important questions include:
115121

116122
* How is the bounding box defined for this task? Should it be entirely on the interior of the object, or should it be on the exterior? Should it be cropped as closely as possible, or is some clearance acceptable?
117123
* What level of care and consistency do you expect the labelers to apply in defining bounding boxes?
124+
* How should labelers label an object that is only partially shown in the image?
125+
* How should labelers label an object that is partially covered by another object?
118126

119127
>[!NOTE]
120128
> Be sure to note that the labelers will be able to select the first 9 labels by using number keys 1-9.
121129
130+
## Use ML assisted labeling
131+
132+
[!INCLUDE [applies-to-skus](../../includes/aml-applies-to-enterprise-sku.md)]
133+
134+
The **ML assisted labeling** page lets you trigger automatic machine learning models to accelerate the labeling task. At the beginning of your labeling project, the images are shuffled into a random order to reduce potential bias. However, any biases that are present in the dataset will be reflected in the trained model. For example, if 80% of your images are of a single class, then approximately 80% of the data used to train the model will be of that class. This training does not include active learning.
135+
136+
This feature is available for image classification (multi-class or multi-label) tasks.
137+
138+
Select *Enable ML assisted labeling* and specify a GPU to enable assisted labeling, which consists of two phases:
139+
* Clustering
140+
* Prelabeling
141+
142+
The number of labeled images needed to start assisted labeling is not fixed and can vary significantly from one labeling project to another. For some projects, it is sometimes possible to see prelabel or cluster tasks after 300 images have been manually labeled. ML assisted labeling uses a technique called *transfer learning*, which uses a pre-trained model to jump-start the training process. If your dataset's classes are similar to those in the pre-trained model, prelabels may be available after only a few hundred manually labeled images. If your dataset is significantly different from the data used to pre-train the model, it may take much longer.
143+
144+
Since the final labels still rely on input from the labeler, this technology is sometimes called *human in the loop* labeling.
145+
146+
### Clustering
147+
148+
After a certain number of labels are submitted, the machine learning model starts to group together similar images. These similar images are presented to the labelers on the same screen to speed up manual tagging. Clustering is especially useful when the labeler is viewing a grid of 4, 6, or 9 images.
149+
150+
Once a machine learning model has been trained on your manually labeled data, the model is truncated to its last fully-connected layer. Unlabeled images are then passed through the truncated model in a process commonly known as "embedding" or "featurization." This embeds each image in a high-dimensional space defined by this model layer. Images which are nearest neighbors in the space are used for clustering tasks.
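
The nearest-neighbor grouping described above can be sketched as follows. This is a minimal illustration, not the service's implementation: it assumes the embeddings have already been computed, uses Euclidean distance as an assumed metric, and the names `nearest_neighbors` and `group_page` are hypothetical.

```python
import math


def nearest_neighbors(embeddings, query_index, k):
    """Return the indices of the k embeddings closest to embeddings[query_index]."""
    query = embeddings[query_index]
    distances = []
    for i, vec in enumerate(embeddings):
        if i == query_index:
            continue  # an image is not its own neighbor
        distances.append((math.dist(query, vec), i))
    distances.sort()
    return [i for _, i in distances[:k]]


def group_page(embeddings, seed_index, page_size):
    """Build one page of images: a seed image plus its nearest neighbors."""
    return [seed_index] + nearest_neighbors(embeddings, seed_index, page_size - 1)


# Toy 2-D embeddings (hypothetical); real embeddings are high-dimensional.
embeddings = [
    (0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.2), (5.1, 4.9), (0.1, 0.1),
]
page = group_page(embeddings, seed_index=0, page_size=4)
print(page)
```

With a page size of 4, the seed image is shown alongside the three images that lie closest to it in the embedding space, which is why the multiple-image views benefit most from clustering.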
151+
152+
### Prelabeling
153+
154+
After more image labels are submitted, a classification model is used to predict image tags. The labeler now sees pages that contain predicted labels already present on each image. The task is then to review these labels and correct any mis-labeled images before submitting the page.
155+
156+
Once a machine learning model has been trained on your manually labeled data, the model is evaluated on a test set of manually labeled images to determine its accuracy at a variety of different confidence thresholds. This evaluation process is used to determine a confidence threshold above which the model is accurate enough to show pre-labels. The model is then evaluated against unlabeled data. Images with predictions more confident than this threshold are used for pre-labeling.
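
The threshold selection described above can be sketched as follows. This is an illustration under stated assumptions, not the service's algorithm: the target accuracy, the candidate thresholds, the sample data, and the name `pick_threshold` are all hypothetical.

```python
def pick_threshold(test_predictions, candidates, target_accuracy):
    """Return the lowest candidate threshold at which predictions more
    confident than the threshold meet the target accuracy, or None."""
    for threshold in sorted(candidates):
        kept = [correct for conf, correct in test_predictions if conf > threshold]
        if kept and sum(kept) / len(kept) >= target_accuracy:
            return threshold
    return None


# (confidence, prediction was correct) on manually labeled test images.
test_predictions = [
    (0.95, True), (0.90, True), (0.85, True),
    (0.70, False), (0.65, True), (0.55, False),
]
threshold = pick_threshold(
    test_predictions, candidates=[0.5, 0.6, 0.7, 0.8, 0.9], target_accuracy=0.95
)

# Model output on unlabeled images: file name -> (predicted label, confidence).
unlabeled = {
    "img_a.jpg": ("dog", 0.92),
    "img_b.jpg": ("cat", 0.61),
    "img_c.jpg": ("dog", 0.80),
}
# Only predictions more confident than the threshold become prelabels.
prelabels = {
    name: label
    for name, (label, conf) in unlabeled.items()
    if threshold is not None and conf > threshold
}
print(threshold, prelabels)
```

Images that fall below the threshold, such as `img_b.jpg` here, stay in the manual labeling queue.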
157+
158+
> [!NOTE]
159+
> ML assisted labeling is available **only** in Enterprise edition workspaces.
160+
122161
## Initialize the labeling project
123162

124163
After the labeling project is initialized, some aspects of the project are immutable. You can't change the task type or dataset. You *can* modify labels and the URL for the task description. Carefully review the settings before you create the project. After you submit the project, you're returned to the **Data Labeling** homepage, which will show the project as **Initializing**. This page doesn't automatically refresh. So, after a pause, manually refresh the page to see the project's status as **Created**.
@@ -145,7 +184,7 @@ To pause or restart the project, select the **Pause**/**Start** button. You can
145184

146185
You can label data directly from the **Project details** page by selecting **Label data**.
147186

148-
## Add labels to a project
187+
## Add new label class to a project
149188

150189
During the labeling process, you may find that additional labels are needed to classify your images. For example, you may want to add an "Unknown" or "Other" label to indicate confusing images.
151190

articles/machine-learning/how-to-label-images.md

Lines changed: 11 additions & 0 deletions
@@ -54,6 +54,16 @@ Azure enables the **Submit** button when you've tagged all the images on the pag
5454

5555
After you submit tags for the data at hand, Azure refreshes the page with a new set of images from the work queue.
5656

57+
### Assisted machine learning
58+
59+
Machine learning algorithms may be triggered during a multi-class or multi-label classification task. If these algorithms are enabled in your project, you may see the following:
60+
61+
* After a number of images have been labeled, you may see **Tasks clustered** at the top of your screen next to the project name. This means that similar images are grouped together and presented on the same page. If so, switch to one of the multiple image views to take advantage of the grouping.
62+
63+
* At a later point, you may see **Tasks prelabeled** next to the project name. Images will then appear with a suggested label that comes from a machine learning classification model. No machine learning model has 100% accuracy. While we only use images for which the model is confident, these images might still be incorrectly prelabeled. When you see these labels, correct any wrong labels before submitting the page.
64+
65+
Especially early in a labeling project, the machine learning model may only be accurate enough to prelabel a small subset of images. Once these images are labeled, the labeling project will return to manual labeling to gather more data for the next round of model training. Over time, the model will become more confident about a higher proportion of images, resulting in more prelabel tasks later in the project.
66+
5767
## Tag images for multi-class classification
5868

5969
If your project is of type "Image Classification Multi-Class," you'll assign a single tag to the entire image. To review the directions at any time, go to the **Instructions** page and select **View detailed instructions**.
@@ -78,6 +88,7 @@ To correct a mistake, click the "**X**" to clear an individual tag or select the
7888

7989
Azure will only enable the **Submit** button after you've applied at least one tag to each image. Select **Submit** to save your work.
8090

91+
8192
## Tag images and specify bounding boxes for object detection
8293

8394
If your project is of type "Object Identification (Bounding Boxes)," you'll specify one or more bounding boxes in the image and apply a tag to each box. Images can have multiple bounding boxes, each with a single tag. Use **View detailed instructions** to determine if multiple bounding boxes are used in your project.
