Commit 49a06ad

Merge pull request #199134 from aahill/fast-follow-updates-3
text classification wording and JSON edits
2 parents 94db294 + dd1bfc9 commit 49a06ad

11 files changed

+116
-75
lines changed

articles/cognitive-services/language-service/custom-named-entity-recognition/overview.md

Lines changed: 7 additions & 7 deletions
Original file line number | Diff line number | Diff line change
@@ -13,11 +13,11 @@ ms.author: aahi
1313
ms.custom: language-service-custom-ner, ignite-fall-2021, event-tier1-build-2022
1414
---
1515

16-
# What is custom named entity recognition (preview)?
16+
# What is custom named entity recognition?
1717

1818
Custom NER is one of the custom features offered by [Azure Cognitive Service for Language](../overview.md). It is a cloud-based API service that applies machine-learning intelligence to enable you to build custom models for custom named entity recognition tasks.
1919

20-
Custom NER enables users to build custom AI models to extract domain-specific entities from unstructured text, such as contracts or financial documents. By creating a Custom NER project, developers can iteratively tag data, train, evaluate, and improve model performance before making it available for consumption. The quality of the tagged data greatly impacts model performance. To simplify building and customizing your model, the service offers a custom web portal that can be accessed through the [Language studio](https://aka.ms/languageStudio). You can easily get started with the service by following the steps in this [quickstart](quickstart.md).
20+
Custom NER enables users to build custom AI models to extract domain-specific entities from unstructured text, such as contracts or financial documents. By creating a Custom NER project, developers can iteratively label data, train, evaluate, and improve model performance before making it available for consumption. The quality of the labeled data greatly impacts model performance. To simplify building and customizing your model, the service offers a custom web portal that can be accessed through the [Language studio](https://aka.ms/languageStudio). You can easily get started with the service by following the steps in this [quickstart](quickstart.md).
2121

2222
This documentation contains the following article types:
2323

@@ -49,12 +49,12 @@ Using custom NER typically involves several different steps.
4949

5050
1. **Define your schema**: Know your data and identify the [entities](glossary.md#entity) you want extracted. Avoid ambiguity.
5151

52-
2. **Tag your data**: Tagging data is a key factor in determining model performance. Tag precisely, consistently and completely.
53-
1. **Tag precisely**: Tag each entity to its right type always. Only include what you want extracted, avoid unnecessary data in your tag.
54-
2. **Tag consistently**: The same entity should have the same tag across all the files.
55-
3. **Tag completely**: Tag all the instances of the entity in all your files.
52+
2. **Label your data**: Labeling data is a key factor in determining model performance. Label precisely, consistently and completely.
53+
1. **Label precisely**: Always label each entity with its correct type. Include only what you want extracted, and avoid unnecessary data in your labels.
54+
2. **Label consistently**: The same entity should have the same label across all the files.
55+
3. **Label completely**: Label all the instances of the entity in all your files.
5656

57-
3. **Train model**: Your model starts learning from your tagged data.
57+
3. **Train model**: Your model starts learning from your labeled data.
5858

5959
4. **View the model evaluation details**: After training is completed, view the model's evaluation details and its performance.
6060

articles/cognitive-services/language-service/custom-named-entity-recognition/tutorials/cognitive-search.md

Lines changed: 2 additions & 2 deletions
@@ -143,9 +143,9 @@ Generally after training a model you would review its [evaluation details](../ho
143143

144144
6. Get your custom NER project secrets
145145

146-
1. You’ll need your **project-name**, project names are case-sensitive.
146+
1. You will need your **project-name**. Project names are case-sensitive and can be found on the **project settings** page.
147147

148-
2. You’ll also need the **deployment-name**.
148+
2. You will also need the **deployment-name**. Deployment names can be found on the **Deploying a model** page.
149149

150150
### Run the indexer command
151151

articles/cognitive-services/language-service/custom-text-classification/concepts/data-formats.md

Lines changed: 80 additions & 48 deletions
@@ -8,7 +8,7 @@ manager: nitinme
88
ms.service: cognitive-services
99
ms.subservice: language-service
1010
ms.topic: conceptual
11-
ms.date: 05/04/2022
11+
ms.date: 05/24/2022
1212
ms.author: aahi
1313
ms.custom: language-service-custom-classification, ignite-fall-2021, event-tier1-build-2022
1414
---
@@ -25,34 +25,49 @@ Your Labels file should be in the `json` format below. This will enable you to [
2525

2626
```json
2727
{
28-
"classes": [
28+
"projectFileVersion": "2022-05-01",
29+
"stringIndexType": "Utf16CodeUnit",
30+
"metadata": {
31+
"projectKind": "CustomMultiLabelClassification",
32+
"storageInputContainerName": "{CONTAINER-NAME}",
33+
"projectName": "{PROJECT-NAME}",
34+
"multilingual": false,
35+
"description": "Project-description",
36+
"language": "en-us"
37+
},
38+
"assets": {
39+
"projectKind": "CustomMultiLabelClassification",
40+
"classes": [
2941
{
30-
"category": "Class1"
42+
"category": "Class1"
3143
},
3244
{
33-
"category": "Class2"
45+
"category": "Class2"
3446
}
35-
],
36-
"documents": [
37-
{
38-
"location": "{DOCUMENT-NAME}",
39-
"language": "{LANGUAGE-CODE}",
40-
"dataset": "{DATASET}",
41-
"classes": [
42-
{
43-
"category": "Class1"
44-
},
45-
{
46-
"category": "Class2"
47-
}
48-
]
49-
}
50-
]
51-
}
47+
],
48+
"documents": [
49+
{
50+
"location": "{DOCUMENT-NAME}",
51+
"language": "{LANGUAGE-CODE}",
52+
"dataset": "{DATASET}",
53+
"classes": [
54+
{
55+
"category": "Class1"
56+
},
57+
{
58+
"category": "Class2"
59+
}
60+
]
61+
}
62+
]
63+
}
}
5264
```
5365

5466
|Key |Placeholder |Value | Example |
5567
|---------|---------|----------|--|
68+
| multilingual | `true`| A boolean value that enables documents in multiple languages in your dataset. When your model is deployed, you can query it in any supported language, including languages not present in your training documents. See [language support](../language-support.md#multi-lingual-option) to learn more about multilingual support. | `true`|
69+
|projectName|`{PROJECT-NAME}`|Project name|`myproject`|
70+
| storageInputContainerName|`{CONTAINER-NAME}`|Container name|`mycontainer`|
5671
| classes | [] | Array containing all the classes you have in the project. These are the classes you want to classify your documents into.| [] |
5772
| documents | [] | Array containing all the documents in your project and the classes labeled for this document. | [] |
5873
| location | `{DOCUMENT-NAME}` | The location of the documents in the storage container. Since all the documents are in the root of the container, this value should be the document name.|`doc1.txt`|
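
The multi-label format above can be assembled programmatically. The following is a minimal Python sketch, not an official tool: the function name `build_multilabel_labels_file`, its parameters, and the `"Train"` dataset value are assumptions made for illustration, while the key names are taken from the JSON sample.

```python
import json


def build_multilabel_labels_file(project_name, container_name, classes, documents,
                                 language="en-us", multilingual=False):
    """Assemble a dict shaped like the multi-label labels file sample.

    `documents` is a list of dicts with 'location', 'language', 'dataset'
    and 'classes' (a list of class names). This helper is hypothetical.
    """
    kind = "CustomMultiLabelClassification"
    known = set(classes)
    # Reject documents labeled with a class that the project does not define.
    for doc in documents:
        unknown = set(doc["classes"]) - known
        if unknown:
            raise ValueError(f"{doc['location']}: unknown classes {sorted(unknown)}")
    return {
        "projectFileVersion": "2022-05-01",
        "stringIndexType": "Utf16CodeUnit",
        "metadata": {
            "projectKind": kind,
            "storageInputContainerName": container_name,
            "projectName": project_name,
            "multilingual": multilingual,
            "description": "Project-description",
            "language": language,
        },
        "assets": {
            "projectKind": kind,
            "classes": [{"category": name} for name in classes],
            "documents": [
                {
                    "location": doc["location"],
                    "language": doc["language"],
                    "dataset": doc["dataset"],
                    "classes": [{"category": name} for name in doc["classes"]],
                }
                for doc in documents
            ],
        },
    }


# Example values are placeholders; "Train" is an assumed dataset name.
labels = build_multilabel_labels_file(
    "myproject",
    "mycontainer",
    ["Class1", "Class2"],
    [{"location": "doc1.txt", "language": "en-us", "dataset": "Train",
      "classes": ["Class1", "Class2"]}],
)
print(json.dumps(labels, indent=2))
```

Building the file from a plain list of class names keeps the `classes` array and the per-document labels in sync, which is the main thing the sketch is meant to show.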
@@ -63,36 +78,53 @@ Your Labels file should be in the `json` format below. This will enable you to [
6378

6479
```json
6580
{
66-
"classes": [
67-
{
68-
"category": "Class1"
69-
},
70-
{
71-
"category": "Class2"
72-
}
73-
],
74-
"documents": [
75-
{
76-
"location": "{DOCUMENT-NAME}",
77-
"language": "{LANGUAGE-CODE}",
78-
"dataset": "{DATASET}",
79-
"class": {
80-
"category": "Class2"
81-
}
82-
},
83-
{
84-
"location": "{DOCUMENT-NAME}",
85-
"language": "{LANGUAGE-CODE}",
86-
"dataset": "{DATASET}",
87-
"class": {
88-
"category": "Class1"
89-
}
90-
}
91-
]
92-
}
81+
82+
"projectFileVersion": "2022-05-01",
83+
"stringIndexType": "Utf16CodeUnit",
84+
"metadata": {
85+
"projectKind": "CustomSingleLabelClassification",
86+
"storageInputContainerName": "{CONTAINER-NAME}",
87+
"settings": {},
88+
"projectName": "{PROJECT-NAME}",
89+
"multilingual": false,
90+
"description": "Project-description",
91+
"language": "en-us"
92+
},
93+
"assets": {
94+
"projectKind": "CustomSingleLabelClassification",
95+
"classes": [
96+
{
97+
"category": "Class1"
98+
},
99+
{
100+
"category": "Class2"
101+
}
102+
],
103+
"documents": [
104+
{
105+
"location": "{DOCUMENT-NAME}",
106+
"language": "{LANGUAGE-CODE}",
107+
"dataset": "{DATASET}",
108+
"class": {
109+
"category": "Class2"
110+
}
111+
},
112+
{
113+
"location": "{DOCUMENT-NAME}",
114+
"language": "{LANGUAGE-CODE}",
115+
"dataset": "{DATASET}",
116+
"class": {
117+
"category": "Class1"
118+
}
119+
}
120+
]
121+
}
}
93122
```
94123
|Key |Placeholder |Value | Example |
95124
|---------|---------|----------|--|
125+
|projectName|`{PROJECT-NAME}`|Project name|`myproject`|
126+
| storageInputContainerName|`{CONTAINER-NAME}`|Container name|`mycontainer`|
127+
| multilingual | `true`| A boolean value that enables documents in multiple languages in your dataset. When your model is deployed, you can query it in any supported language, including languages not present in your training documents. See [language support](../language-support.md#multi-lingual-option) to learn more about multilingual support. | `true`|
96128
| classes | [] | Array containing all the classes you have in the project. These are the classes you want to classify your documents into.| [] |
97129
| documents | [] | Array containing all the documents in your project and which class this document belongs to. | [] |
98130
| location | `{DOCUMENT-NAME}` | The location of the documents in the storage container. Since all the documents are in the root of the container, this value should be the document name.|`doc1.txt`|
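
The two formats differ mainly in how each document records its label: a `classes` array for multi-label projects and a single `class` object for single-label projects. A rough Python sketch of a check for that difference follows; `validate_labels_file` is a hypothetical helper, and it only inspects the keys that appear in the samples.

```python
def validate_labels_file(labels):
    """Return a list of problems found in a parsed labels file (empty if none).

    Hypothetical helper: it checks only that each document uses the key
    matching the project kind and refers to classes the project defines.
    """
    problems = []
    assets = labels.get("assets", {})
    kind = assets.get("projectKind")
    known = {entry["category"] for entry in assets.get("classes", [])}
    for doc in assets.get("documents", []):
        name = doc.get("location", "<unnamed>")
        if kind == "CustomSingleLabelClassification":
            # Single-label documents carry exactly one "class" object.
            if "class" not in doc:
                problems.append(f"{name}: missing 'class'")
                continue
            used = [doc["class"]["category"]]
        elif kind == "CustomMultiLabelClassification":
            # Multi-label documents carry a "classes" array.
            used = [entry["category"] for entry in doc.get("classes", [])]
        else:
            problems.append(f"unsupported projectKind: {kind!r}")
            break
        for category in used:
            if category not in known:
                problems.append(f"{name}: unknown class {category!r}")
    return problems


# Placeholder data; "Train" is an assumed dataset name.
sample = {
    "assets": {
        "projectKind": "CustomSingleLabelClassification",
        "classes": [{"category": "Class1"}, {"category": "Class2"}],
        "documents": [
            {"location": "doc1.txt", "language": "en-us", "dataset": "Train",
             "class": {"category": "Class2"}},
            {"location": "doc2.txt", "language": "en-us", "dataset": "Train",
             "class": {"category": "Class3"}},
        ],
    }
}
print(validate_labels_file(sample))
```

A file loaded with `json.load` can be passed to the function directly; an empty list means none of these particular checks failed, not that the service will accept the file.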

articles/cognitive-services/language-service/custom-text-classification/concepts/evaluation-metrics.md

Lines changed: 1 addition & 1 deletion
@@ -139,5 +139,5 @@ Similarly,
139139

140140
## Next steps
141141

142-
* [View a model's evaluation in Language Studio](../how-to/view-model-evaluation.md)
142+
* [View a model's performance in Language Studio](../how-to/view-model-evaluation.md)
143143
* [Train a model](../how-to/train-model.md)

articles/cognitive-services/language-service/custom-text-classification/how-to/call-api.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ ms.custom: language-service-clu, ignite-fall-2021, event-tier1-build-2022
1717
# Query deployment to classify text
1818

1919
After the deployment is added successfully, you can query the deployment to classify text based on the model you assigned to the deployment.
20-
You can query the deployment programmatically [Prediction API](https://aka.ms/ct-runtime-swagger) or through the [client libraries (Azure SDK)](#get-task-results).
20+
You can query the deployment programmatically through the [Prediction API](https://aka.ms/ct-runtime-api) or through the [client libraries (Azure SDK)](#get-task-results).
2121

2222
## Test deployed model
2323

articles/cognitive-services/language-service/custom-text-classification/how-to/create-project.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ Before you start using custom text classification, you will need:
2727

2828
Before you start using custom text classification, you will need an Azure Language resource. It is recommended to create your Language resource and connect a storage account to it in the Azure portal. Creating a resource in the Azure portal lets you create an Azure storage account at the same time, with all of the required permissions pre-configured. You can also read further in the article to learn how to use a pre-existing resource, and configure it to work with custom text classification.
2929

30-
You also will need an Azure storage account where you will upload your `.txt` files that will be used to train a model to classify text.
30+
You will also need an Azure storage account where you will upload the `.txt` documents used to train a model to classify text.
3131

3232
> [!NOTE]
3333
> * You need to have an **owner** role assigned on the resource group to create a Language resource.

articles/cognitive-services/language-service/custom-text-classification/how-to/design-schema.md

Lines changed: 7 additions & 4 deletions
@@ -25,11 +25,11 @@ The schema defines the classes that you need your model to classify your text in
2525

2626
For example, if you are classifying support tickets, you might need the following classes: *login issue*, *hardware issue*, *connectivity issue*, and *new equipment request*.
2727

28-
* **Avoid ambiguity in classes**: Ambiguity arises when the classes you specify share similar meaning to one another. The more ambiguous your schema is, the more tagged data you may need to differentiate between different classes.
28+
* **Avoid ambiguity in classes**: Ambiguity arises when the classes you specify share similar meaning to one another. The more ambiguous your schema is, the more labeled data you may need to differentiate between different classes.
2929

30-
For example, if you are classifying food recipes, they may be similar to an extent. To differentiate between *dessert recipe* and *main dish recipe*, you may need to tag more examples to help your model distinguish between the two classes. Avoiding ambiguity saves time and yields better results.
30+
For example, if you are classifying food recipes, they may be similar to an extent. To differentiate between *dessert recipe* and *main dish recipe*, you may need to label more examples to help your model distinguish between the two classes. Avoiding ambiguity saves time and yields better results.
3131

32-
* **Out of scope data**: When using your model in production, consider adding an *out of scope* class to your schema if you expect documents that don't belong to any of your classes. Then add a few documents to your dataset to be tagged as *out of scope*. The model can learn to recognize irrelevant documents, and predict their tags accordingly.
32+
* **Out of scope data**: When using your model in production, consider adding an *out of scope* class to your schema if you expect documents that don't belong to any of your classes. Then add a few documents to your dataset to be labeled as *out of scope*. The model can learn to recognize irrelevant documents, and predict their labels accordingly.
3333

3434

3535
## Data selection
@@ -58,8 +58,11 @@ As a prerequisite for creating a custom text classification project, your traini
5858

5959
You can only use `.txt` documents for custom text. If your data is in another format, you can use the [CLUtils parse command](https://github.com/microsoft/CognitiveServicesLanguageUtilities/blob/main/CustomTextAnalytics.CLUtils/Solution/CogSLanguageUtilities.ViewLayer.CliCommands/Commands/ParseCommand/README.md) to change your file format.
6060

61-
You can upload an annotated dataset, or you can upload an unannotated one and [tag your data](../how-to/tag-data.md) in Language studio.
61+
You can upload an annotated dataset, or you can upload an unannotated one and [label your data](../how-to/tag-data.md) in Language studio.
6262

63+
## Test set
64+
65+
When defining the testing set, make sure to include example documents that are not present in the training set. Defining the testing set is an important step in calculating the [model performance](view-model-evaluation.md#model-details). Also, make sure that the testing set includes documents that represent all classes used in your project.
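
Both conditions can be checked before training. The sketch below is one possible approach in Python, under stated assumptions: `check_test_set` is a hypothetical helper, the dataset values `"Train"` and `"Test"` are assumed, and overlap is detected by document name only, so two differently named files with identical text would not be caught.

```python
def check_test_set(documents, classes):
    """Return (overlap, missing_classes) for a list of labeled documents.

    Each document is a dict with 'location', 'dataset' and 'classes'
    (a list of class names). Hypothetical helper for illustration.
    """
    train_names = {d["location"] for d in documents if d["dataset"] == "Train"}
    test_docs = [d for d in documents if d["dataset"] == "Test"]
    # Document names that appear in both the training and testing sets.
    overlap = sorted(train_names & {d["location"] for d in test_docs})
    # Project classes that no testing document is labeled with.
    covered = {name for d in test_docs for name in d["classes"]}
    missing_classes = sorted(set(classes) - covered)
    return overlap, missing_classes


docs = [
    {"location": "doc1.txt", "dataset": "Train", "classes": ["Class1"]},
    {"location": "doc2.txt", "dataset": "Train", "classes": ["Class2"]},
    {"location": "doc3.txt", "dataset": "Test", "classes": ["Class1"]},
]
overlap, missing_classes = check_test_set(docs, ["Class1", "Class2"])
print(overlap, missing_classes)
```

In this example the testing set has no document for `Class2`, so the second returned list flags it.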
6366

6467
## Next steps
6568

articles/cognitive-services/language-service/custom-text-classification/how-to/improve-model.md

Lines changed: 8 additions & 4 deletions
@@ -15,7 +15,7 @@ ms.custom: language-service-custom-classification, ignite-fall-2021, event-tier1
1515

1616
# Improve custom text classification model performance
1717

18-
In some cases, the model is expected to make predictions that are inconsistent with your tagged classes. Use this article to learn how to observe these inconsistencies and decide on the needed changes needed to improve your model performance.
18+
In some cases, the model may make predictions that are inconsistent with your labeled classes. Use this article to learn how to observe these inconsistencies and decide on the changes needed to improve your model's performance.
1919

2020

2121
## Prerequisites
@@ -24,15 +24,15 @@ To optionally improve a model, you'll need to have:
2424

2525
* [A custom text classification project](create-project.md) with a configured Azure blob storage account,
2626
* Text data that has [been uploaded](design-schema.md#data-preparation) to your storage account.
27-
* [Tagged data](tag-data.md) to successfully [train a model](train-model.md).
27+
* [Labeled data](tag-data.md) to successfully [train a model](train-model.md).
2828
* Reviewed the [model evaluation details](view-model-evaluation.md) to determine how your model is performing.
2929
* Familiarized yourself with the [evaluation metrics](../concepts/evaluation-metrics.md).
3030

3131
See the [project development lifecycle](../overview.md#project-development-lifecycle) for more information.
3232

3333
## Review test set predictions
3434

35-
After you have viewed your [model's evaluation](view-model-evaluation.md), you'll have formed an idea on your model performance. In this page, you can view how your model performs vs how it's expected to perform. You can view predicted and tagged classes side by side for each document in your test set. You can review documents that were predicted differently than they were originally tagged.
35+
After you have viewed your [model's evaluation](view-model-evaluation.md), you'll have formed an idea of your model's performance. On this page, you can compare how your model performs with how it's expected to perform. You can view predicted and labeled classes side by side for each document in your test set, and review documents that were predicted differently than they were originally labeled.
3636

3737

3838
To review inconsistent predictions in the [test set](train-model.md#data-splitting) from within the [Language Studio](https://aka.ms/LanguageStudio):
@@ -45,7 +45,11 @@ To review inconsistent predictions in the [test set](train-model.md#data-splitti
4545

4646
Use the following information to help guide model improvements.
4747

48-
* If a file that should belong to class `X` is constantly classified as class `Y`, it means that there is ambiguity between these classes and you need to reconsider your schema. Learn more about [data selection and schema design](design-schema.md#schema-design). Another solution is to consider adding more data to these classes, to help the model improve and differentiate between them.
48+
* If a file that should belong to class `X` is constantly classified as class `Y`, it means that there is ambiguity between these classes and you need to reconsider your schema. Learn more about [data selection and schema design](design-schema.md#schema-design).
49+
50+
* Another solution is to consider adding more data to these classes, to help the model improve and differentiate between them.
51+
52+
* Consider adding more data, to help the model differentiate between different classes.
4953

5054
:::image type="content" source="../media/review-validation-set.png" alt-text="A screenshot showing model predictions in Language Studio." lightbox="../media/review-validation-set.png":::
5155

articles/cognitive-services/language-service/custom-text-classification/how-to/tag-data.md

Lines changed: 4 additions & 2 deletions
@@ -58,7 +58,7 @@ Use the following steps to label your data:
5858

5959
4. In the right side pane, select **Add class** to add classes to your project so you can start labeling your data with them.
6060

61-
:::image type="content" source="../media/tag-1.png" alt-text="A screenshot showing the data tagging screen" lightbox="../media/tag-1.png":::
61+
:::image type="content" source="../media/tag-1.png" alt-text="A screenshot showing the data labeling screen" lightbox="../media/tag-1.png":::
6262

6363
5. Start labeling your files.
6464

@@ -83,7 +83,9 @@ Use the following steps to label your data:
8383
> [!TIP]
8484
> If you are planning on using **Automatic** data splitting, use the default option of assigning all the documents into your training set.
8585
86-
8. Under the **Distribution** pivot you can view the distribution of your labeled documents across training and testing sets. You can learn more about the training testing sets and how they are used [here](train-model.md#data-splitting).
86+
8. Under the **Distribution** pivot, you can view the distribution across training and testing sets. You have two options for viewing:
87+
* *Total instances*, where you can view the count of all labeled instances of a specific class.
88+
* *Documents with at least one label*, where each document is counted if it contains at least one labeled instance of this class.
8789

8890
9. While you're labeling, your changes will be synced periodically. If they have not been saved yet, you will find a warning at the top of your page. If you want to save manually, select the **Save labels** button at the bottom of the page.
8991
