Commit be22f81

committed
update with FAQ info
1 parent 9e67886 commit be22f81

2 files changed

+20
-6
lines changed

articles/machine-learning/how-to-create-labeling-projects.md

Lines changed: 17 additions & 5 deletions
@@ -131,17 +131,29 @@ For bounding boxes, important questions include:
131131

132132
[!INCLUDE [applies-to-skus](../../includes/aml-applies-to-enterprise-sku.md)]
133133

134-
The **ML assisted labeling** page lets you trigger automatic machine learning models to accelerate the labeling task. This feature is available for image classification (multi-class or multi-label) tasks.
134+
The **ML assisted labeling** page lets you trigger automatic machine learning models to accelerate the labeling task. At the beginning of your labeling project, the images are shuffled into a random order to reduce potential bias. However, any biases that are present in the dataset will be reflected in the trained model. For example, if 80% of your images are of a single class, then approximately 80% of the data used to train the model will be of that class. This training does not include active learning.
135+
136+
This feature is available for image classification (multi-class or multi-label) tasks.
135137

136138
Select *Enable ML assisted labeling* and specify a GPU to enable assisted labeling, which consists of two phases:
139+
* Clustering
140+
* Prelabeling
141+
142+
The number of labeled images needed to start assisted labeling is not fixed and can vary significantly from one labeling project to another. For some projects, it is sometimes possible to see prelabel or cluster tasks after 300 images have been manually labeled. ML assisted labeling uses a technique called *Transfer Learning*, which uses a pre-trained model to jump-start the training process. If your dataset's classes are similar to those in the pre-trained model, pre-labels may be available after only a few hundred manually labeled images. If your dataset is significantly different from the data used to pre-train the model, it may take much longer.
137143

138-
* **Clustering** - after a certain number of labels are submitted, the machine learning model starts to group together similar images. These similar images are presented to the labelers on the same screen to speed up manual tagging. Clustering is especially useful when the labeler is viewing a grid of 4, 6, or 9 images.
144+
Since the final labels still rely on input from the labeler, this technology is sometimes called *human in the loop* labeling.
139145

140-
* **Prelabeling** - after more image labels are submitted, a classification model is used to predict image tags. The labeler now sees pages that contain predicted labels already present on each image. The task is then to review these labels and correct any mis-labeled images before submitting the page.
146+
### Clustering
141147

142-
The exact number of labeled images necessary to start assisted labeling is not a fixed number. The actual value depends on the number of label classes defined in your project. Labeling service will start to train a model when there are enough labels and use the model to produce either a clustered or prelabeled task.
148+
After a certain number of labels are submitted, the machine learning model starts to group together similar images. These similar images are presented to the labelers on the same screen to speed up manual tagging. Clustering is especially useful when the labeler is viewing a grid of 4, 6, or 9 images.
143149

144-
Since the final labels still rely on input from the labeler, this technology is sometimes called *human in the loop* labeling.
150+
Once a machine learning model has been trained on your manually labeled data, the model is truncated to its last fully-connected layer. Unlabeled images are then passed through the truncated model in a process commonly known as "embedding" or "featurization." This embeds each image in a high-dimensional space defined by this model layer. Images which are nearest neighbors in the space are used for clustering tasks.
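
The nearest-neighbor grouping described above can be illustrated with a minimal sketch. It assumes the embeddings have already been computed and are passed in as plain tuples; the function names `euclidean` and `group_nearest`, the greedy seeding strategy, and the use of Euclidean distance are all illustrative assumptions, not the labeling service's actual implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_nearest(embeddings, page_size):
    """Greedily group image ids into pages of nearby embeddings.

    embeddings: dict mapping image id -> embedding vector (hypothetical input).
    page_size:  number of images per page, e.g. 4, 6, or 9.
    """
    remaining = dict(embeddings)
    pages = []
    while remaining:
        # Seed each page with the first image still unassigned.
        seed_id = next(iter(remaining))
        seed_vec = remaining.pop(seed_id)
        # Fill the rest of the page with the seed's nearest neighbors.
        neighbors = sorted(
            remaining, key=lambda i: euclidean(seed_vec, remaining[i])
        )[: page_size - 1]
        for image_id in neighbors:
            remaining.pop(image_id)
        pages.append([seed_id] + neighbors)
    return pages


# Toy 2-D embeddings; real embeddings would be high-dimensional.
toy = {"a": (0.0, 0.0), "b": (10.0, 10.0), "c": (0.1, 0.0), "d": (10.0, 10.2)}
pages = group_nearest(toy, page_size=2)
```

With the toy data, images `a` and `c` land on one page and `b` and `d` on another, which is the effect a labeler would see in a multi-image grid view.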
151+
152+
### Prelabeling
153+
154+
After more image labels are submitted, a classification model is used to predict image tags. The labeler now sees pages that contain predicted labels already present on each image. The task is then to review these labels and correct any mis-labeled images before submitting the page.
155+
156+
Once a machine learning model has been trained on your manually labeled data, the model is evaluated on a test set of manually labeled images to determine its accuracy at a variety of different confidence thresholds. This evaluation process is used to determine a confidence threshold above which the model is accurate enough to show pre-labels. The model is then run on unlabeled data. Images with predictions more confident than this threshold are used for pre-labeling.
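
The threshold selection described above can be sketched as follows. This is only an illustration under stated assumptions: the target accuracy, the list of candidate thresholds, and the function names `pick_threshold` and `select_prelabels` are hypothetical and are not taken from the labeling service.

```python
def pick_threshold(test_results, target_accuracy, candidates):
    """Return the lowest candidate threshold whose accuracy meets the target.

    test_results: list of (confidence, was_correct) pairs from the
                  manually labeled test set (hypothetical input format).
    Returns None if no candidate threshold is accurate enough.
    """
    for threshold in sorted(candidates):
        # Keep only predictions more confident than the threshold.
        kept = [correct for conf, correct in test_results if conf > threshold]
        if kept and sum(kept) / len(kept) >= target_accuracy:
            return threshold
    return None


def select_prelabels(predictions, threshold):
    """Keep predictions on unlabeled images that exceed the threshold.

    predictions: dict mapping image id -> (predicted_label, confidence).
    """
    return {
        image_id: label
        for image_id, (label, conf) in predictions.items()
        if conf > threshold
    }


test_results = [(0.95, True), (0.9, True), (0.8, False), (0.7, True), (0.6, False)]
threshold = pick_threshold(test_results, target_accuracy=0.9,
                           candidates=[0.5, 0.75, 0.85])
prelabels = select_prelabels(
    {"img1": ("cat", 0.97), "img2": ("dog", 0.6)}, threshold
)
```

In this toy run only `img1` is confident enough to be prelabeled; `img2` would stay in the manual labeling queue.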
145157

146158
> [!NOTE]
147159
> ML assisted labeling is available **only** in Enterprise edition workspaces.

articles/machine-learning/how-to-label-images.md

Lines changed: 3 additions & 1 deletion
@@ -60,7 +60,9 @@ Machine learning algorithms may be triggered during a multi-class or multi-label
6060

6161
* After some number of images have been labeled, you may see **Tasks clustered** at the top of your screen next to the project name. This means that images are grouped together to present similar images on the same page. If so, switch to one of the multiple image views to take advantage of the grouping.
6262

63-
* At a later point, you may see **Tasks prelabeled** next to the project name. Images will then appear with a suggested label that comes from a machine learning classification model. When you see these labels, correct any wrong labels before submitting the page.
63+
* At a later point, you may see **Tasks prelabeled** next to the project name. Images will then appear with a suggested label that comes from a machine learning classification model. No machine learning model has 100% accuracy. While we only use images for which the model is confident, these images might still be incorrectly prelabeled. When you see these labels, correct any wrong labels before submitting the page.
64+
65+
Especially early in a labeling project, the machine learning model may only be accurate enough to prelabel a small subset of images. Once these images are labeled, the labeling project will return to manual labeling to gather more data for the next round of model training. Over time, the model will become more confident about a higher proportion of images, resulting in more prelabel tasks later in the project.
6466

6567
## Tag images for multi-class classification
6668
