|
| 1 | +--- |
| 2 | +title: How to use autolabeling in custom named entity recognition |
| 3 | +titleSuffix: Azure Cognitive Services |
| 4 | +description: Learn how to use autolabeling in custom named entity recognition. |
| 5 | +services: cognitive-services |
| 6 | +author: aahill |
| 7 | +manager: nitinme |
| 8 | +ms.service: cognitive-services |
| 9 | +ms.subservice: language-service |
| 10 | +ms.custom: event-tier1-build-2022 |
| 11 | +ms.topic: how-to |
| 12 | +ms.date: 03/20/2023 |
| 13 | +ms.author: aahi |
| 14 | +--- |
| 15 | + |
| 16 | +# How to use autolabeling for Custom Named Entity Recognition |
| 17 | + |
| 18 | +The [labeling process](tag-data.md) is an important part of preparing your dataset. Because this process takes both time and effort, you can use the autolabeling feature to label your entities automatically. You can start autolabeling jobs based on a model you've previously trained or by using GPT models. With autolabeling based on a model you've previously trained, you label a few of your documents, train a model, then create an autolabeling job that produces entity labels for other documents based on that model. With autolabeling with GPT, you can trigger an autolabeling job immediately, without any prior model training. This feature can save you the time and effort of manually labeling your entities. |
| 19 | + |
| 20 | +## Prerequisites |
| 21 | + |
| 22 | +### [Autolabel based on a model you've trained](#tab/autolabel-model) |
| 23 | + |
| 24 | +Before you can use autolabeling based on a model you've trained, you need: |
| 25 | +* A successfully [created project](create-project.md) with a configured Azure blob storage account. |
| 26 | +* Text data that [has been uploaded](design-schema.md#data-preparation) to your storage account. |
| 27 | +* [Labeled data](tag-data.md) |
| 28 | +* A [successfully trained model](train-model.md) |
| 29 | + |
| 30 | + |
| 31 | +### [Autolabel with GPT](#tab/autolabel-gpt) |
| 32 | +Before you can use autolabeling with GPT, you need: |
| 33 | +* A successfully [created project](create-project.md) with a configured Azure blob storage account. |
| 34 | +* Text data that [has been uploaded](design-schema.md#data-preparation) to your storage account. |
| 35 | +* Entity names that are meaningful. The GPT models label entities in your documents based on the name of the entity you've provided. |
| 36 | +* [Labeled data](tag-data.md) isn't required. |
| 37 | +* An Azure OpenAI [resource and deployment](../../../openai/how-to/create-resource.md). |
| 38 | + |
| 39 | +--- |
| 40 | + |
| 41 | +## Trigger an autolabeling job |
| 42 | + |
| 43 | +### [Autolabel based on a model you've trained](#tab/autolabel-model) |
| 44 | + |
| 45 | +When you trigger an autolabeling job based on a model you've trained, there's a limit of 5,000 text records per month, per resource. This means the limit is shared by all projects within the same resource. |
| 46 | + |
| 47 | +> [!TIP] |
| 48 | +> The number of text records in a document is calculated as the ceiling of (number of characters in the document / 1,000). For example, if a document has 8,921 characters, the number of text records is: |
| 49 | +> |
| 50 | +> `ceil(8921/1000) = ceil(8.921)`, which is 9 text records. |
| 51 | +
|
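The text record calculation from the tip above can be sketched in a few lines of Python. This is a minimal illustration, not part of the service API: the function names `text_records` and `fits_monthly_limit`, and the `records_used` parameter, are hypothetical helpers introduced here. The sketch assumes you track your own usage, and that an empty document counts as zero text records.

```python
from typing import Iterable

CHARS_PER_TEXT_RECORD = 1_000  # divisor from the tip above
MONTHLY_RECORD_LIMIT = 5_000   # per resource, per month (from this article)


def text_records(char_count: int) -> int:
    """Return ceil(char_count / 1,000) using integer arithmetic."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return (char_count + CHARS_PER_TEXT_RECORD - 1) // CHARS_PER_TEXT_RECORD


def fits_monthly_limit(char_counts: Iterable[int], records_used: int = 0) -> bool:
    """Check whether a batch of documents stays within the monthly limit.

    records_used is a hypothetical counter of the text records this
    resource has already consumed in the current month.
    """
    batch_total = sum(text_records(count) for count in char_counts)
    return records_used + batch_total <= MONTHLY_RECORD_LIMIT


if __name__ == "__main__":
    print(text_records(8921))                # 9, matching the example above
    print(fits_monthly_limit([8921, 1200]))  # True: 9 + 2 = 11 records
```

Because the limit applies per resource, `records_used` would need to include text records consumed by every project in that resource.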
| 52 | +1. From the left navigation menu, select **Data labeling**. |
| 53 | +2. Select the **Autolabel** button under the Activity pane to the right of the page. |
| 54 | + |
| 55 | + |
| 56 | + :::image type="content" source="../media/trigger-autotag.png" alt-text="A screenshot showing how to trigger an autotag job." lightbox="../media/trigger-autotag.png"::: |
| 57 | + |
| 58 | +3. Choose **Autolabel based on a model you've trained** and select **Next**. |
| 59 | + |
| 60 | + :::image type="content" source="../media/choose-models.png" alt-text="A screenshot showing model choice for auto labeling." lightbox="../media/choose-models.png"::: |
| 61 | + |
| 62 | +4. Choose a trained model. It's recommended to check the model performance before using it for autolabeling. |
| 63 | + |
| 64 | + :::image type="content" source="../media/choose-model-trained.png" alt-text="A screenshot showing how to choose trained model for autotagging." lightbox="../media/choose-model-trained.png"::: |
| 65 | + |
| 66 | +5. Choose the entities you want to be included in the autolabeling job. By default, all entities are selected. You can see the total labels, precision and recall of each entity. It's recommended to include entities that perform well to ensure the quality of the automatically labeled entities. |
| 67 | + |
| 68 | + :::image type="content" source="../media/choose-entities.png" alt-text="A screenshot showing which entities to be included in autotag job." lightbox="../media/choose-entities.png"::: |
| 69 | + |
| 70 | +6. Choose the documents you want to be automatically labeled. The number of text records of each document is displayed. When you select one or more documents, you should see the number of text records selected. It's recommended to choose the unlabeled documents from the filter. |
| 71 | + |
| 72 | + > [!NOTE] |
| 73 | + > * If an entity was automatically labeled, but has a user defined label, only the user defined label is used and visible. |
| 74 | + > * You can view the documents by clicking on the document name. |
| 75 | + |
| 76 | + :::image type="content" source="../media/choose-files.png" alt-text="A screenshot showing which documents to be included in the autotag job." lightbox="../media/choose-files.png"::: |
| 77 | + |
| 78 | +7. Select **Autolabel** to trigger the autolabeling job. |
| 79 | +You should see the model used, number of documents included in the autolabeling job, number of text records and entities to be automatically labeled. Autolabeling jobs can take anywhere from a few seconds to a few minutes, depending on the number of documents you included. |
| 80 | + |
| 81 | + :::image type="content" source="../media/review-autotag.png" alt-text="A screenshot showing the review screen for an autotag job." lightbox="../media/review-autotag.png"::: |
| 82 | + |
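The recommendation in step 5, to include only entities that perform well, can be expressed as a simple filter over per-entity precision and recall. The following is a rough sketch under stated assumptions: the `EntityMetrics` type, the `select_entities` function, the 0.8 thresholds, and the sample entity names and scores are all hypothetical and chosen for illustration. Language Studio shows these metrics in the UI; this code does not call any service API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EntityMetrics:
    """Per-entity evaluation scores, as displayed when choosing entities."""
    name: str
    precision: float
    recall: float


def select_entities(
    metrics: Iterable[EntityMetrics],
    min_precision: float = 0.8,  # hypothetical threshold, tune for your data
    min_recall: float = 0.8,     # hypothetical threshold, tune for your data
) -> List[str]:
    """Return the names of entities whose scores meet both thresholds."""
    return [
        m.name
        for m in metrics
        if m.precision >= min_precision and m.recall >= min_recall
    ]


if __name__ == "__main__":
    # Made-up scores for illustration only.
    scores = [
        EntityMetrics("BorrowerName", precision=0.95, recall=0.91),
        EntityMetrics("LoanAmount", precision=0.88, recall=0.62),
        EntityMetrics("Date", precision=0.90, recall=0.85),
    ]
    print(select_entities(scores))  # ['BorrowerName', 'Date']
```

Requiring both metrics to pass is one possible design choice: low precision means more wrong labels to reject during review, and low recall means more entities left for you to label manually.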
| 83 | +### [Autolabel with GPT](#tab/autolabel-gpt) |
| 84 | + |
| 85 | +When you trigger an autolabeling job with GPT, you're charged to your Azure OpenAI resource as per your consumption. You're charged an estimate of the number of tokens in each document being autolabeled. Refer to the [Azure OpenAI pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) for a detailed breakdown of pricing per token of different models. |
| 86 | + |
| 87 | +1. From the left navigation menu, select **Data labeling**. |
| 88 | +2. Select the **Autolabel** button under the Activity pane to the right of the page. |
| 89 | + |
| 90 | + :::image type="content" source="../media/trigger-autotag.png" alt-text="A screenshot showing how to trigger an autotag job from the activity pane." lightbox="../media/trigger-autotag.png"::: |
| 91 | + |
| 92 | +3. Choose **Autolabel with GPT** and select **Next**. |
| 93 | + |
| 94 | + :::image type="content" source="../media/choose-models.png" alt-text="A screenshot showing model choice for auto labeling." lightbox="../media/choose-models.png"::: |
| 95 | + |
| 96 | +4. Choose your Azure OpenAI resource and deployment. You must [create an Azure OpenAI resource and deploy a model](../../../openai/how-to/create-resource.md) in order to proceed. |
| 97 | + |
| 98 | + :::image type="content" source="../media/autotag-choose-open-ai.png" alt-text="A screenshot showing how to choose OpenAI resource and deployments." lightbox="../media/autotag-choose-open-ai.png"::: |
| 99 | + |
| 100 | +5. Choose the entities you want to be included in the autolabeling job. By default, all entities are selected. To achieve good-quality labeling with GPT, give your labels descriptive names and include examples for each label. |
| 101 | + |
| 102 | + :::image type="content" source="../media/choose-entities.png" alt-text="A screenshot showing which entities to be included in autotag job." lightbox="../media/choose-entities.png"::: |
| 103 | + |
| 104 | +6. Choose the documents you want to be automatically labeled. It's recommended to choose the unlabeled documents from the filter. |
| 105 | + |
| 106 | + > [!NOTE] |
| 107 | + > * If an entity was automatically labeled, but has a user defined label, only the user defined label is used and visible. |
| 108 | + > * You can view the documents by clicking on the document name. |
| 109 | + |
| 110 | + :::image type="content" source="../media/choose-files.png" alt-text="A screenshot showing which documents to be included in the autotag job." lightbox="../media/choose-files.png"::: |
| 111 | + |
| 112 | +7. Select **Start job** to trigger the autolabeling job. |
| 113 | +You should be directed to the autolabeling page displaying the autolabeling jobs initiated. Autolabeling jobs can take anywhere from a few seconds to a few minutes, depending on the number of documents you included. |
| 114 | + |
| 115 | + :::image type="content" source="../media/review-autotag.png" alt-text="A screenshot showing the review screen for an autotag job." lightbox="../media/review-autotag.png"::: |
| 116 | + |
| 117 | + |
| 118 | +--- |
| 119 | + |
| 120 | +## Review the auto labeled documents |
| 121 | + |
| 122 | +When the autolabeling job is complete, you can see the output documents in the **Data labeling** page of Language Studio. Select **Review documents with autolabels** to view the documents with the **Auto labeled** filter applied. |
| 123 | + |
| 124 | +:::image type="content" source="../media/open-autotag-files.png" alt-text="A screenshot showing the autolabeled documents" lightbox="../media/open-autotag-files.png"::: |
| 125 | + |
| 126 | +Entities that have been automatically labeled appear with a dotted line. These entities have two selectors (a checkmark and an "X") that allow you to accept or reject the automatic label. |
| 127 | + |
| 128 | +Once an entity is accepted, the dotted line changes to a solid one, and the label becomes a user defined label that is included in any further model training. |
| 129 | + |
| 130 | +Alternatively, you can accept or reject all automatically labeled entities within the document, using **Accept all** or **Reject all** in the top right corner of the screen. |
| 131 | + |
| 132 | +After you accept or reject the labeled entities, select **Save labels** to apply the changes. |
| 133 | + |
| 134 | +> [!NOTE] |
| 135 | +> * We recommend validating automatically labeled entities before accepting them. |
| 136 | +> * All labels that were not accepted are deleted when you train your model. |
| 137 | +
|
| 138 | +:::image type="content" source="../media/accept-reject-entities.png" alt-text="A screenshot showing how to accept and reject autolabeled entities." lightbox="../media/accept-reject-entities.png"::: |
| 139 | + |
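The review rules in this section can be summarized as a small state model: an automatically labeled entity stays pending until you accept it, accepting it turns it into a user defined label, and any label that was not accepted is dropped when you train. The sketch below only illustrates that logic. The `LabelState`, `Label`, `accept`, and `labels_used_for_training` names are hypothetical and don't correspond to any Language Studio or SDK type.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Iterable, List


class LabelState(Enum):
    AUTO = "auto"                  # shown with a dotted line, pending review
    USER_DEFINED = "user_defined"  # shown with a solid line


@dataclass(frozen=True)
class Label:
    entity: str
    text: str
    state: LabelState


def accept(label: Label) -> Label:
    """Accepting an automatic label turns it into a user defined label."""
    return replace(label, state=LabelState.USER_DEFINED)


def labels_used_for_training(labels: Iterable[Label]) -> List[Label]:
    """Keep only user defined labels; unaccepted automatic labels are dropped."""
    return [label for label in labels if label.state is LabelState.USER_DEFINED]


if __name__ == "__main__":
    # Made-up labels for illustration only.
    reviewed = [
        accept(Label("Date", "March 20, 2023", LabelState.AUTO)),
        Label("LoanAmount", "5,000", LabelState.AUTO),  # never accepted
    ]
    print([label.entity for label in labels_used_for_training(reviewed)])  # ['Date']
```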
| 140 | +## Next steps |
| 141 | + |
| 142 | +* Learn more about [labeling your data](tag-data.md). |