Commit fc7f6c7

Merge pull request #225419 from sdgilley/sdg-updates
data labeling updates
2 parents cca65ab + 56a131a commit fc7f6c7

4 files changed: +47 -17 lines changed

articles/machine-learning/how-to-create-image-labeling-projects.md

Lines changed: 19 additions & 14 deletions
Original file line number | Diff line number | Diff line change
@@ -42,7 +42,7 @@ Image data can be files with any of these types: ".jpg", ".jpeg", ".png", ".jpe"
4242

4343
[!INCLUDE [start](../../includes/machine-learning-data-labeling-start.md)]
4444

45-
1. To create a project, select **Add project**. Give the project an appropriate name. The project name cannot be reused, even if the project is deleted in future.
45+
1. To create a project, select **Add project**. Give the project an appropriate name. The project name can't be reused, even if the project is deleted in the future.
4646

4747
1. Select **Image** to create an image labeling project.
4848

@@ -117,8 +117,8 @@ For bounding boxes, important questions include:
117117
* What should the labelers do if the object is tiny? Should it be labeled as an object or should it be ignored as background?
118118
* How to label the object that is partially shown in the image?
119119
* How to label the object that is partially covered by another object?
120-
* How to label the object if there is no clear boundary of the object?
121-
* How to label the object which is not object class of interest but visually similar to an interested object type?
120+
* How to label the object if there's no clear boundary of the object?
121+
* How to label the object that isn't the object class of interest but is visually similar to an object type of interest?
122122

123123
> [!NOTE]
124124
> Be sure to note that the labelers will be able to select the first 9 labels by using number keys 1-9.
@@ -132,7 +132,7 @@ For bounding boxes, important questions include:
132132
133133
## Use ML-assisted data labeling
134134

135-
The **ML-assisted labeling** page lets you trigger automatic machine learning models to accelerate labeling tasks. Medical images (".dcm") are not included in assisted labeling.
135+
The **ML-assisted labeling** page lets you trigger automatic machine learning models to accelerate labeling tasks. Medical images (".dcm") aren't included in assisted labeling.
136136

137137
At the beginning of your labeling project, the items are shuffled into a random order to reduce potential bias. However, any biases that are present in the dataset will be reflected in the trained model. For example, if 80% of your items are of a single class, then approximately 80% of the data used to train the model will be of that class.
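The effect described above can be illustrated with a small sketch. It assumes a hypothetical dataset of 1,000 items split 80/20 between two made-up classes, and it treats the first 300 shuffled items as the manually labeled batch; none of these numbers or names come from the labeling service itself.

```python
import random
from collections import Counter


def class_shares(labels):
    """Return the fraction of items that belong to each class."""
    counts = Counter(labels)
    total = len(labels)
    return {name: count / total for name, count in counts.items()}


# Hypothetical imbalanced dataset: 80% "cat", 20% "dog".
items = ["cat"] * 800 + ["dog"] * 200

# Shuffling removes ordering bias but not class imbalance.
rng = random.Random(0)
rng.shuffle(items)

# Assume the first 300 shuffled items are the ones labeled by hand.
first_batch = items[:300]

print(class_shares(items))
print(class_shares(first_batch))
```

The share of each class in the labeled batch stays close to its share in the full dataset, which is why an imbalanced dataset produces imbalanced training data even after shuffling.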
138138

@@ -143,7 +143,7 @@ ML-assisted labeling consists of two phases:
143143
* Clustering
144144
* Prelabeling
145145

146-
The exact number of labeled data necessary to start assisted labeling is not a fixed number. This can vary significantly from one labeling project to another. For some projects, is sometimes possible to see prelabel or cluster tasks after 300 items have been manually labeled. ML Assisted Labeling uses a technique called *Transfer Learning*, which uses a pre-trained model to jump-start the training process. If your dataset's classes are similar to those in the pre-trained model, pre-labels may be available after only a few hundred manually labeled items. If your dataset is significantly different from the data used to pre-train the model, it may take much longer.
146+
The exact number of labeled items necessary to start assisted labeling isn't a fixed number. This number can vary significantly from one labeling project to another. For some projects, it's sometimes possible to see prelabel or cluster tasks after 300 items have been manually labeled. ML Assisted Labeling uses a technique called *Transfer Learning*, which uses a pre-trained model to jump-start the training process. If your dataset's classes are similar to those in the pre-trained model, pre-labels may be available after only a few hundred manually labeled items. If your dataset is significantly different from the data used to pre-train the model, it may take much longer.
147147

148148
When you're using consensus labeling, the consensus label is used for training.
149149

@@ -154,11 +154,11 @@ Since the final labels still rely on input from the labeler, this technology is
154154
155155
### Clustering
156156

157-
After a certain number of labels are submitted, the machine learning model for classification starts to group together similar items. These similar images are presented to the labelers on the same screen to speed up manual tagging. Clustering is especially useful when the labeler is viewing a grid of 4, 6, or 9 images.
157+
After some labels are submitted, the machine learning model for classification starts to group together similar items. These similar images are presented to the labelers on the same screen to speed up manual tagging. Clustering is especially useful when the labeler is viewing a grid of 4, 6, or 9 images.
158158

159-
Once a machine learning model has been trained on your manually labeled data, the model is truncated to its last fully-connected layer. Unlabeled images are then passed through the truncated model in a process commonly known as "embedding" or "featurization." This embeds each image in a high-dimensional space defined by this model layer. Images that are nearest neighbors in the space are used for clustering tasks.
159+
Once a machine learning model has been trained on your manually labeled data, the model is truncated to its last fully connected layer. Unlabeled images are then passed through the truncated model in a process commonly known as "embedding" or "featurization." This process embeds each image in a high-dimensional space defined by this model layer. Images that are nearest neighbors in the space are used for clustering tasks.
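As a rough illustration of the nearest-neighbor idea, the sketch below groups images by Euclidean distance between their embedding vectors. The three-dimensional vectors, the image IDs, the `nearest_neighbors` helper, and the choice of Euclidean distance are all assumptions made for the example; the article doesn't state which model, dimensionality, or distance metric the service uses.

```python
import math


def nearest_neighbors(embeddings, query_id, k):
    """Return the k item IDs whose embeddings are closest to the query item."""
    query = embeddings[query_id]
    distances = [
        (math.dist(query, vector), item_id)
        for item_id, vector in embeddings.items()
        if item_id != query_id
    ]
    distances.sort()
    return [item_id for _, item_id in distances[:k]]


# Hypothetical embeddings; real ones would come from the truncated model
# and would have many more dimensions.
embeddings = {
    "img_a": (0.9, 0.1, 0.0),
    "img_b": (0.8, 0.2, 0.1),
    "img_c": (0.0, 0.9, 0.8),
    "img_d": (0.1, 1.0, 0.7),
    "img_e": (0.85, 0.15, 0.05),
}

# Build one task screen: a seed image plus its closest neighbors.
task = ["img_a"] + nearest_neighbors(embeddings, "img_a", k=2)
print(task)
```

Images whose embeddings sit close together end up in the same task, so a labeler viewing a grid is more likely to apply one tag to most of the screen.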
160160

161-
The clustering phase does not appear for object detection models, or for text classification.
161+
The clustering phase doesn't appear for object detection models, or for text classification.
162162

163163
### Prelabeling
164164

@@ -180,9 +180,9 @@ The **Dashboard** tab shows the progress of the labeling task.
180180

181181
:::image type="content" source="./media/how-to-create-labeling-projects/labeling-dashboard.png" alt-text="Data labeling dashboard":::
182182

183-
The progress chart shows how many items have been labeled, skipped, in need of review, or not yet done. Hover over the chart to see the number of item in each section.
183+
The progress chart shows how many items have been labeled, skipped, in need of review, or not yet done. Hover over the chart to see the number of items in each section.
184184

185-
The middle section shows the queue of tasks yet to be assigned. When ML assisted labeling is off, this section shows the number of manual tasks to be assigned. When ML assisted labeling is on, this will also show:
185+
The middle section shows the queue of tasks yet to be assigned. When ML assisted labeling is off, this section shows the number of manual tasks to be assigned. When ML assisted labeling is on, this section will also show:
186186

187187
* Tasks containing clustered items in the queue
188188
* Tasks containing prelabeled items in the queue
@@ -208,7 +208,7 @@ If your project uses consensus labeling, you'll also want to review those images
208208

209209
:::image type="content" source="media/how-to-create-labeling-projects/select-filters.png" alt-text="Screenshot: select filters to review consensus label problems." lightbox="media/how-to-create-labeling-projects/select-filters.png":::
210210

211-
1. Under **Labeled datapoints**, select **Consensus labels in need of review**. This shows only those images where a consensus was not achieved among the labelers.
211+
1. Under **Labeled datapoints**, select **Consensus labels in need of review**. This shows only those images where a consensus wasn't achieved among the labelers.
212212

213213
:::image type="content" source="media/how-to-create-labeling-projects/select-need-review.png" alt-text="Screenshot: Select labels in need of review.":::
214214

@@ -227,7 +227,8 @@ View and change details of your project. In this tab you can:
227227
* View details of the storage container used to store labeled outputs in your project
228228
* Add labels to your project
229229
* Edit instructions you give to your labelers
230-
* Edit details of ML assisted labeling, including enable/disable
230+
* Change settings for ML assisted labeling, and kick off a labeling task
231+
231232

232233
### Access for labelers
233234

@@ -237,19 +238,23 @@ View and change details of your project. In this tab you can:
237238

238239
[!INCLUDE [add-label](../../includes/machine-learning-data-labeling-add-label.md)]
239240

241+
## Start an ML assisted labeling task
242+
243+
[!INCLUDE [start-ml-assist](../../includes/machine-learning-data-labeling-start-ml-assist.md)]
244+
240245
## Export the labels
241246

242247
Use the **Export** button on the **Project details** page of your labeling project. You can export the label data for Machine Learning experimentation at any time.
243248

244249
* Image labels can be exported as:
245-
* [COCO format](http://cocodataset.org/#format-data).The COCO file is created in the default blob store of the Azure Machine Learning workspace in a folder within *Labeling/export/coco*.
250+
* [COCO format](http://cocodataset.org/#format-data). The COCO file is created in the default blob store of the Azure Machine Learning workspace in a folder within *Labeling/export/coco*.
246251
* An [Azure Machine Learning dataset with labels](v1/how-to-use-labeled-dataset.md).
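For the COCO option, a minimal sketch of reading an exported file might look like the following. It assumes the export follows the standard COCO layout with top-level `images`, `annotations`, and `categories` lists; the exact fields in a file exported from your project may differ, and the helper names and sample data here are made up for illustration.

```python
import json
from collections import defaultdict


def load_coco(path):
    """Load a COCO JSON file that was downloaded from the blob store."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def labels_by_image(coco):
    """Map each image file name to the list of label names annotated on it."""
    category_names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    file_names = {img["id"]: img["file_name"] for img in coco["images"]}
    result = defaultdict(list)
    for annotation in coco["annotations"]:
        file_name = file_names[annotation["image_id"]]
        result[file_name].append(category_names[annotation["category_id"]])
    return dict(result)


# Hypothetical in-memory sample standing in for an exported file.
sample = {
    "images": [
        {"id": 1, "file_name": "a.jpg"},
        {"id": 2, "file_name": "b.jpg"},
    ],
    "categories": [
        {"id": 1, "name": "cat"},
        {"id": 2, "name": "dog"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1},
        {"id": 11, "image_id": 1, "category_id": 2},
        {"id": 12, "image_id": 2, "category_id": 2},
    ],
}

print(labels_by_image(sample))
```

With a real export, pass the result of `load_coco` on the downloaded file to `labels_by_image` instead of the in-memory sample.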
247252

248253
Access exported Azure Machine Learning datasets in the **Datasets** section of Machine Learning. The dataset details page also provides sample code to access your labels from Python.
249254

250255
![Exported dataset](./media/how-to-create-labeling-projects/exported-dataset.png)
251256

252-
Once you have exported your labeled data to an Azure Machine Learning dataset, you can use AutoML to build computer vision models trained on your labeled data. Learn more at [Set up AutoML to train computer vision models with Python](how-to-auto-train-image-models.md)
257+
Once you've exported your labeled data to an Azure Machine Learning dataset, you can use AutoML to build computer vision models trained on your labeled data. Learn more at [Set up AutoML to train computer vision models with Python](how-to-auto-train-image-models.md).
253258

254259
## Troubleshooting
255260

articles/machine-learning/how-to-create-text-labeling-projects.md

Lines changed: 8 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -114,6 +114,7 @@ To directly upload your data:
114114
> [!NOTE]
115115
> Incremental refresh is available for projects that use tabular (.csv or .tsv) dataset input. However, only new tabular files are added. Changes to existing tabular files will not be recognized from the refresh.
116116
117+
117118
## Specify label categories
118119

119120
[!INCLUDE [classes](../../includes/machine-learning-data-labeling-classes.md)]
@@ -142,7 +143,7 @@ To use **ML-assisted labeling**:
142143

143144
At the beginning of your labeling project, the items are shuffled into a random order to reduce potential bias. However, any biases that are present in the dataset will be reflected in the trained model. For example, if 80% of your items are of a single class, then approximately 80% of the data used to train the model will be of that class.
144145

145-
For training the text DNN model used by ML-assist, the input text per training example will be limited to approximately the first 128 words in the document. For tabular input, all text columns are first concatenated before applying this limit. This is a practical limit imposed to allow for the model training to complete in a timely manner. The actual text in a document (for file input) or set of text columns (for tabular input) can exceed 128 words. The limit only pertains to what is internally leveraged by the model during the training process.
146+
For training the text DNN model used by ML-assist, the input text per training example will be limited to approximately the first 128 words in the document. For tabular input, all text columns are first concatenated before applying this limit. This is a practical limit imposed to allow for the model training to complete in a timely manner. The actual text in a document (for file input) or set of text columns (for tabular input) can exceed 128 words. The limit only pertains to what is internally used by the model during the training process.
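The truncation rule can be sketched as follows. The sketch assumes that "words" means whitespace-separated tokens and that columns are joined with a single space; the article only says "approximately the first 128 words," so the service's actual tokenization may differ. The `training_text` helper and the sample row are hypothetical.

```python
MAX_WORDS = 128  # limit stated in the article; exact tokenization is an assumption


def training_text(columns, max_words=MAX_WORDS):
    """Concatenate text columns, then keep roughly the first max_words words."""
    combined = " ".join(columns)
    words = combined.split()
    return " ".join(words[:max_words])


# Hypothetical tabular row with two text columns of 100 words each.
row = ["title " * 100, "body " * 100]
truncated = training_text(row)

print(len(truncated.split()))
```

Because the columns are concatenated before the limit is applied, text in later columns contributes to training only if the earlier columns leave room for it.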
146147

147148
The exact number of labeled items necessary to start assisted labeling isn't a fixed number. This can vary significantly from one labeling project to another, depending on many factors, including the number of label classes and the label distribution.
148149

@@ -193,7 +194,7 @@ If your project uses consensus labeling, you'll also want to review those images
193194

194195
:::image type="content" source="media/how-to-create-text-labeling-projects/text-labeling-select-filter.png" alt-text="Screenshot: select filters to review consensus label problems." lightbox="media/how-to-create-text-labeling-projects/text-labeling-select-filter.png":::
195196

196-
1. Under **Labeled datapoints**, select **Consensus labels in need of review**. This shows only those images where a consensus was not achieved among the labelers.
197+
1. Under **Labeled datapoints**, select **Consensus labels in need of review**. This shows only those images where a consensus wasn't achieved among the labelers.
197198

198199
:::image type="content" source="media/how-to-create-labeling-projects/select-need-review.png" alt-text="Screenshot: Select labels in need of review.":::
199200

@@ -212,6 +213,7 @@ View and change details of your project. In this tab you can:
212213
* View details of the storage container used to store labeled outputs in your project
213214
* Add labels to your project
214215
* Edit instructions you give to your labelers
216+
* Change settings for ML assisted labeling, and kick off a labeling task
215217

216218
### Access for labelers
217219

@@ -221,6 +223,10 @@ View and change details of your project. In this tab you can:
221223

222224
[!INCLUDE [add-label](../../includes/machine-learning-data-labeling-add-label.md)]
223225

226+
## Start an ML assisted labeling task
227+
228+
[!INCLUDE [start-ml-assist](../../includes/machine-learning-data-labeling-start-ml-assist.md)]
229+
224230
## Export the labels
225231

226232
Use the **Export** button on the **Project details** page of your labeling project. You can export the label data for Machine Learning experimentation at any time.

includes/machine-learning-data-labeling-refresh.md

Lines changed: 2 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -15,6 +15,7 @@ Select **Enable incremental refresh at regular intervals** when you want your pr
1515
Unselect if you don't want new files in the datastore to automatically be added to your project.
1616

1717
> [!IMPORTANT]
18-
> Don't create a new version for the dataset you want to update. If you do, the updates will not be seen, as the data labeling project is pinned to the initial version. Instead, use [Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/) to modify your data in the appropriate folder in the blob storage.
18+
> Don't create a new version for the dataset you want to update. If you do, the updates will not be seen, as the data labeling project is pinned to the initial version. Instead, use [Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/) to modify your data in the appropriate folder in the blob storage.
19+
> Also, don't remove data. Removing data from the dataset your project uses will cause an error in the project.
1920
2021
After the project is created, use the [**Details**](#details-tab) tab to change **incremental refresh**, view the timestamp for the last refresh, and request an immediate refresh of data.
Lines changed: 18 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,18 @@
1+
---
2+
author: sgilley
3+
ms.service: machine-learning
4+
ms.topic: include
5+
ms.date: 01/27/2023
6+
ms.author: sdgilley
7+
---
8+
9+
ML assisted labeling starts automatically after some items have been labeled. This automatic threshold varies by project. However, you can manually start an ML assisted training run, as long as your project contains at least some labeled data.
10+
11+
> [!NOTE]
12+
> On-demand training is not available for projects created before December 2022. Create a new project to use this feature.
13+
14+
Use the **Details** section to start a new ML assisted training run.
15+
16+
1. At the top of your project, select **Details**.
17+
1. On the side navigation for **Details**, select **ML assisted labeling**.
18+
1. Scroll to the bottom if necessary and select **Start** for **On-demand training**.
