Commit 1788e58

Merge pull request #199120 from aahill/fast-follow-updates
terminology, json, toc updates
2 parents: 41d2058 + e7bf8de

15 files changed: +162 −173 lines changed
articles/cognitive-services/language-service/concepts/model-lifecycle.md

Lines changed: 2 additions & 0 deletions
@@ -44,6 +44,8 @@ For asynchronous endpoints, use the `model-version` property in the request body
 
 The model-version used in your API request will be included in the response object.
 
+> [!NOTE]
+> If you are using a model version that is not listed in the table, then it was subjected to the expiration policy.
 
 Use the table below to find which model versions are supported by each feature:
 
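For context, a minimal sketch of pinning the model version in an asynchronous request, assuming the 2022-05-01 `analyze-text` jobs endpoint, where the version is passed per task as `modelVersion` (older API versions spell the property differently); the endpoint, key, and version values are placeholders:

```python
import requests

# Placeholder resource values -- substitute your own endpoint and key.
endpoint = "https://<your-resource>.cognitiveservices.azure.com"
key = "<your-key>"

# The model version is pinned per task instead of defaulting to "latest",
# so requests keep behaving the same way as newer model versions roll out.
body = {
    "analysisInput": {
        "documents": [{"id": "1", "language": "en", "text": "Example input text."}]
    },
    "tasks": [
        {"kind": "EntityRecognition", "parameters": {"modelVersion": "2022-05-01"}}
    ],
}

response = requests.post(
    f"{endpoint}/language/analyze-text/jobs?api-version=2022-05-01",
    headers={"Ocp-Apim-Subscription-Key": key},
    json=body,
)
# Poll the URL in the operation-location header for results; the result
# object includes the model version that actually served the request.
print(response.headers.get("operation-location"))
```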

articles/cognitive-services/language-service/custom-named-entity-recognition/concepts/data-formats.md

Lines changed: 63 additions & 46 deletions
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: language-service
 ms.topic: conceptual
-ms.date: 05/06/2022
+ms.date: 05/24/2022
 ms.author: aahi
 ms.custom: language-service-custom-ner, ignite-fall-2021, event-tier1-build-2022
 ---
@@ -23,62 +23,79 @@ Your Labels file should be in the `json` format below to be used in [importing](
 
 ```json
 {
+    "projectFileVersion": "2022-05-01",
+    "stringIndexType": "Utf16CodeUnit",
+    "metadata": {
+        "projectKind": "CustomEntityRecognition",
+        "storageInputContainerName": "{CONTAINER-NAME}",
+        "projectName": "{PROJECT-NAME}",
+        "multilingual": false,
+        "description": "Project-description",
+        "language": "en-us"
+    },
+    "assets": {
+        "projectKind": "CustomEntityRecognition",
         "entities": [
-        {
-            "category": "Entity1"
-        },
-        {
-            "category": "Entity2"
-        }
+            {
+                "category": "Entity1"
+            },
+            {
+                "category": "Entity2"
+            }
         ],
         "documents": [
-        {
-            "location": "{DOCUMENT-NAME}",
-            "language": "{LANGUAGE-CODE}",
-            "dataset": "{DATASET}",
-            "entities": [
-                {
-                    "regionOffset": 0,
-                    "regionLength": 500,
-                    "labels": [
-                        {
-                            "category": "Entity1",
-                            "offset": 25,
-                            "length": 10
-                        },
-                        {
-                            "category": "Entity2",
-                            "offset": 120,
-                            "length": 8
-                        }
-                    ]
-                }
+            {
+                "location": "{DOCUMENT-NAME}",
+                "language": "{LANGUAGE-CODE}",
+                "dataset": "{DATASET}",
+                "entities": [
+                    {
+                        "regionOffset": 0,
+                        "regionLength": 500,
+                        "labels": [
+                            {
+                                "category": "Entity1",
+                                "offset": 25,
+                                "length": 10
+                            },
+                            {
+                                "category": "Entity2",
+                                "offset": 120,
+                                "length": 8
+                            }
                         ]
-            },
-            {
-                "location": "{DOCUMENT-NAME}",
-                "language": "{LANGUAGE-CODE}",
-                "dataset": "{DATASET}",
-                "entities": [
-                    {
-                        "regionOffset": 0,
-                        "regionLength": 100,
-                        "labels": [
-                            {
-                                "category": "Entity2",
-                                "offset": 20,
-                                "length": 5
-                            }
-                        ]
-                    }
+                    }
+                ]
+            },
+            {
+                "location": "{DOCUMENT-NAME}",
+                "language": "{LANGUAGE-CODE}",
+                "dataset": "{DATASET}",
+                "entities": [
+                    {
+                        "regionOffset": 0,
+                        "regionLength": 100,
+                        "labels": [
+                            {
+                                "category": "Entity2",
+                                "offset": 20,
+                                "length": 5
+                            }
                         ]
-            }
+                    }
+                ]
+            }
         ]
+    }
 }
+
 ```
 
 |Key |Placeholder |Value | Example |
 |---------|---------|----------|--|
+| `multilingual` | `true`| A boolean value that enables you to have documents in multiple languages in your dataset. When your model is deployed, you can query it in any supported language, not necessarily one included in your training documents. See [language support](../language-support.md#multi-lingual-option) to learn more about multilingual support. | `true`|
+|`projectName`|`{PROJECT-NAME}`|Project name|`myproject`|
+| `storageInputContainerName`|`{CONTAINER-NAME}`|Container name|`mycontainer`|
 | `entities` | | Array containing all the entity types you have in the project. These are the entity types that will be extracted from your documents. | |
 | `documents` | | Array containing all the documents in your project and a list of the entities labeled within each document. | [] |
 | `location` | `{DOCUMENT-NAME}` | The location of the documents in the storage container. Since all the documents are in the root of the container, this should be the document name.|`doc1.txt`|
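Because `stringIndexType` is `Utf16CodeUnit`, the `offset` and `length` values in the labels file count UTF-16 code units rather than Unicode code points. A minimal sketch of computing them from Python, whose string indices count code points; the `utf16_len` helper is a hypothetical name used for illustration:

```python
def utf16_len(s: str) -> int:
    # Each UTF-16 code unit is 2 bytes in UTF-16-LE; characters outside the
    # Basic Multilingual Plane (such as most emoji) count as 2 units.
    return len(s.encode("utf-16-le")) // 2

document = "Contact Jane Doe \U0001F600 at headquarters."
entity = "headquarters"

# Convert the code-point index of the entity into a UTF-16 code-unit offset.
start = document.index(entity)
offset = utf16_len(document[:start])
length = utf16_len(entity)

# The emoji occupies one code point but two UTF-16 code units,
# so the offset is one greater than the raw Python index.
print(start, offset, length)
```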

articles/cognitive-services/language-service/custom-named-entity-recognition/concepts/evaluation-metrics.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: language-service
 ms.topic: conceptual
-ms.date: 05/06/2022
+ms.date: 05/24/2022
 ms.author: aahi
 ms.custom: language-service-custom-ner, ignite-fall-2021, event-tier1-build-2022
 ---
@@ -133,5 +133,5 @@ Similarly,
 
 ## Next steps
 
-* [View a model's evaluation in Language Studio](../how-to/view-model-evaluation.md)
+* [View a model's performance in Language Studio](../how-to/view-model-evaluation.md)
 * [Train a model](../how-to/train-model.md)

articles/cognitive-services/language-service/custom-named-entity-recognition/glossary.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ For example, in the sentence "*John borrowed 25,000 USD from Fred.*" the entitie
 | Loan Amount | *25,000 USD* |
 
 ## F1 score
-The F1 score is a function of Precision and Recall. It's needed when you seek a balance between [precision](#precision) and [recall](#recall].
+The F1 score is a function of Precision and Recall. It's needed when you seek a balance between [precision](#precision) and [recall](#recall).
 
 ## Model
 
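For reference, the F1 score is the harmonic mean of precision and recall:

F1 = 2 × (Precision × Recall) / (Precision + Recall)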

articles/cognitive-services/language-service/custom-named-entity-recognition/how-to/call-api.md

Lines changed: 6 additions & 3 deletions
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: language-service
 ms.topic: how-to
-ms.date: 05/09/2022
+ms.date: 05/24/2022
 ms.author: aahi
 ms.devlang: csharp, python
 ms.custom: language-service-custom-ner, event-tier1-build-2022
@@ -17,7 +17,7 @@ ms.custom: language-service-custom-ner, event-tier1-build-2022
 # Query deployment to extract entities
 
 After the deployment is added successfully, you can query the deployment to extract entities from your text based on the model you assigned to the deployment.
-You can query the deployment programmatically using the [Prediction API](https://aka.ms/ct-runtime-swagger) or through the [Client libraries (Azure SDK)](#get-task-results).
+You can query the deployment programmatically using the [Prediction API](https://aka.ms/ct-runtime-api) or through the [Client libraries (Azure SDK)](#get-task-results).
 
 ## Test deployed model
 
@@ -80,4 +80,7 @@ First you will need to get your resource key and endpoint:
 
 ## Next steps
 
-* [Custom NER overview](../overview.md)
+* [Enrich a Cognitive Search index tutorial](../tutorials/cognitive-search.md)
+
+
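For context, a minimal sketch of querying a deployed custom NER model through the Prediction API, assuming the 2022-05-01 `analyze-text` jobs endpoint; the endpoint, key, project name, and deployment name are placeholders, so check the Prediction API reference for the exact payload:

```python
import time
import requests

# Placeholder values -- substitute your own resource, project, and deployment.
endpoint = "https://<your-resource>.cognitiveservices.azure.com"
key = "<your-key>"
headers = {"Ocp-Apim-Subscription-Key": key}

body = {
    "analysisInput": {
        "documents": [
            {"id": "1", "language": "en", "text": "Example text to extract entities from."}
        ]
    },
    "tasks": [
        {
            "kind": "CustomEntityRecognition",
            "parameters": {
                "projectName": "<project-name>",
                "deploymentName": "<deployment-name>",
            },
        }
    ],
}

# Submit the job; the URL for polling comes back in the operation-location header.
submit = requests.post(
    f"{endpoint}/language/analyze-text/jobs?api-version=2022-05-01",
    headers=headers,
    json=body,
)
job_url = submit.headers["operation-location"]

# Poll until the job finishes, then read the extracted entities.
while True:
    result = requests.get(job_url, headers=headers).json()
    if result["status"] in ("succeeded", "failed"):
        break
    time.sleep(1)
print(result)
```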

articles/cognitive-services/language-service/custom-named-entity-recognition/how-to/create-project.md

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: language-service
 ms.topic: how-to
-ms.date: 05/06/2022
+ms.date: 05/24/2022
 ms.author: aahi
 ms.custom: language-service-custom-ner, references_regions, ignite-fall-2021, event-tier1-build-2022
 ---
@@ -112,4 +112,4 @@ If you have already labeled data, you can use it to get started with the service
 
 * You should have an idea of the [project schema](design-schema.md) you will use to label your data.
 
-* After your project is created, you can start [tagging your data](tag-data.md), which will inform your entity extraction model how to interpret text, and is used for training and evaluation.
+* After your project is created, you can start [labeling your data](tag-data.md), which teaches your entity extraction model how to interpret text, and is used for training and evaluation.

articles/cognitive-services/language-service/custom-named-entity-recognition/how-to/design-schema.md

Lines changed: 8 additions & 4 deletions
@@ -29,13 +29,13 @@ The schema defines the entity types/categories that you need your model to extra
 
 * Avoid ambiguity in entity types.
 
-    **Ambiguity** happens when entity types you select are similar to each other. The more ambiguous your schema the more tagged data you will need to differentiate between different entity types.
+    **Ambiguity** happens when the entity types you select are similar to each other. The more ambiguous your schema, the more labeled data you will need to differentiate between entity types.
 
     For example, if you are extracting data from a legal contract, to extract "Name of first party" and "Name of second party" you will need to add more examples to overcome ambiguity, since the names of both parties look similar. Avoiding ambiguity saves time and effort, and yields better results.
 
 * Avoid complex entities. Complex entities can be difficult to pick out precisely from text; consider breaking them down into multiple entities.
 
-    For example, extracting "Address" would be challenging if it's not broken down to smaller entities. There are so many variations of how addresses appear, it would take large number of tagged entities to teach the model to extract an address, as a whole, without breaking it down. However, if you replace "Address" with "Street Name", "PO Box", "City", "State" and "Zip", the model will require fewer tags per entity.
+    For example, extracting "Address" would be challenging if it's not broken down into smaller entities. There are so many variations in how addresses appear that it would take a large number of labeled entities to teach the model to extract an address as a whole, without breaking it down. However, if you replace "Address" with "Street Name", "PO Box", "City", "State" and "Zip", the model will require fewer labels per entity.
 
 ## Data selection
 
@@ -61,10 +61,14 @@ As a prerequisite for creating a project, your training data needs to be uploade
 * [Create and upload documents from Azure](../../../../storage/blobs/storage-quickstart-blobs-portal.md#create-a-container)
 * [Create and upload documents using Azure Storage Explorer](../../../../vs-azure-tools-storage-explorer-blobs.md)
 
-You can only use `.txt` documents. If your data is in other format, you can use [CLUtils parse command](https://github.com/microsoft/CognitiveServicesLanguageUtilities/blob/main/CustomTextAnalytics.CLUtils/Solution/CogSLanguageUtilities.ViewLayer.CliCommands/Commands/ParseCommand/README.md) to change your file format.
+You can only use `.txt` documents. If your data is in another format, you can use the [CLUtils parse command](https://github.com/microsoft/CognitiveServicesLanguageUtilities/blob/main/CustomTextAnalytics.CLUtils/Solution/CogSLanguageUtilities.ViewLayer.CliCommands/Commands/ParseCommand/README.md) to change your document format.
 
-You can upload an annotated dataset, or you can upload an unannotated one and [tag your data](../how-to/tag-data.md) in Language studio.
+You can upload an annotated dataset, or you can upload an unannotated one and [label your data](../how-to/tag-data.md) in Language Studio.
 
+## Test set
+
+When defining the testing set, make sure to include example documents that are not present in the training set. Defining the testing set is an important step in calculating [model performance](view-model-evaluation.md#model-details). Also, make sure that the testing set includes documents that represent all the entities used in your project.
+
 ## Next steps
 
 If you haven't already, create a custom NER project. If it's your first time using custom NER, consider following the [quickstart](../quickstart.md) to create an example project. You can also see the [how-to article](../how-to/create-project.md) for more details on what you need to create a project.
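As a sketch of how a test set could be carved out of the labels file shown in data-formats.md, assuming the `dataset` field accepts `Train` and `Test` values; the file names, split ratio, and seed are arbitrary illustrations:

```python
import json
import random

# Load a labels file in the format shown in data-formats.md.
with open("labels.json", encoding="utf-8") as f:
    project = json.load(f)

documents = project["assets"]["documents"]
random.seed(0)  # arbitrary seed, for a reproducible split
random.shuffle(documents)

# Hold out roughly 20% of documents for testing; the rest train the model.
split = int(len(documents) * 0.8)
for i, doc in enumerate(documents):
    doc["dataset"] = "Train" if i < split else "Test"

# Sanity check: every entity type in the project should also appear
# in the test set, so each type's performance can be measured.
test_categories = {
    label["category"]
    for doc in documents
    if doc["dataset"] == "Test"
    for region in doc["entities"]
    for label in region["labels"]
}
print("entity types covered in test set:", sorted(test_categories))

with open("labels-split.json", "w", encoding="utf-8") as f:
    json.dump(project, f, indent=4)
```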

articles/cognitive-services/language-service/custom-named-entity-recognition/how-to/improve-model.md

Lines changed: 6 additions & 6 deletions
@@ -15,13 +15,13 @@ ms.custom: language-service-custom-ner, ignite-fall-2021, event-tier1-build-2022
 
 # Improve model performance
 
-In some cases, the model is expected to extract entities that are inconsistent with your tagged ones. In this page you can observe these inconsistencies and decide on the needed changes needed to improve your model performance.
+In some cases, the model may extract entities that are inconsistent with the ones you labeled. On this page, you can review these inconsistencies and decide on the changes needed to improve your model's performance.
 
 ## Prerequisites
 
 * A successfully [created project](create-project.md) with a configured Azure blob storage account
 * Text data that [has been uploaded](design-schema.md#data-preparation) to your storage account.
-* [Tagged data](tag-data.md)
+* [Labeled data](tag-data.md)
 * A [successfully trained model](train-model.md)
 * Reviewed the [model evaluation details](view-model-evaluation.md) to determine how your model is performing.
 * Familiarized yourself with the [evaluation metrics](../concepts/evaluation-metrics.md).
@@ -31,7 +31,7 @@ See the [project development lifecycle](../overview.md#project-development-lifec
 
 ## Review test set predictions
 
-After you have viewed your [model's evaluation](view-model-evaluation.md), you'll have formed an idea on your model performance. In this page, you can view how your model performs vs how it's expected to perform. You can view predicted and tagged entities side by side for each document in your test set. You can review entities that were extracted differently than they were originally tagged.
+After you have viewed your [model's evaluation](view-model-evaluation.md), you'll have formed an idea of your model's performance. On this page, you can compare how your model performs with how it's expected to perform. You can view predicted and labeled entities side by side for each document in your test set, and review entities that were extracted differently than they were originally labeled.
 
 
 To review inconsistent predictions in the [test set](train-model.md) from within the [Language Studio](https://aka.ms/LanguageStudio):
@@ -42,15 +42,15 @@ To review inconsistent predictions in the [test set](train-model.md) from within
 
 3. For easier analysis, you can toggle **Show incorrect predictions only** to view only the entities that were incorrectly predicted. You should see all documents that include incorrectly predicted entities.
 
-5. You can expand each document to see more details about predicted and tagged entities.
+4. You can expand each document to see more details about predicted and labeled entities.
 
 Use the following information to help guide model improvements.
 
-* If entity `X` is constantly identified as entity `Y`, it means that there is ambiguity between these entity types and you need to reconsider your schema. Learn more about [data selection and schema design](design-schema.md#schema-design). Another solution is to consider tagging more instances of these entities, to help the model improve and differentiate between them.
+* If entity `X` is consistently identified as entity `Y`, there is ambiguity between these entity types and you need to reconsider your schema. Learn more about [data selection and schema design](design-schema.md#schema-design). Another solution is to label more instances of these entities, to help the model differentiate between them.
 
 * If a complex entity is repeatedly not predicted, consider [breaking it down into simpler entities](design-schema.md#schema-design) for easier extraction.
 
-* If an entity is predicted while it was not tagged in your data, this means to you need to review your tags. Be sure that all instances of an entity are properly tagged in all documents.
+* If an entity is predicted that was not labeled in your data, you need to review your labels. Be sure that all instances of an entity are properly labeled in all documents.
 
 
 :::image type="content" source="../media/review-predictions.png" alt-text="A screenshot showing model predictions in Language Studio." lightbox="../media/review-predictions.png":::
