Merge pull request #199134 from aahill/fast-follow-updates-3

Court72 · web-flow · commit 49a06ad6dc48 · 2022-05-24T14:57:52.000-06:00
text classification wording and JSON edits
diff --git a/articles/cognitive-services/language-service/custom-named-entity-recognition/overview.md b/articles/cognitive-services/language-service/custom-named-entity-recognition/overview.md
@@ -13,11 +13,11 @@ ms.author: aahi
 ms.custom: language-service-custom-ner, ignite-fall-2021, event-tier1-build-2022
 ---
 
-# What is custom named entity recognition (preview)?
+# What is custom named entity recognition?
 
 Custom NER is one of the custom features offered by [Azure Cognitive Service for Language](../overview.md). It is a cloud-based API service that applies machine-learning intelligence to enable you to build custom models for custom named entity recognition tasks.
 
-Custom NER enables users to build custom AI models to extract domain-specific entities from unstructured text, such as contracts or financial documents. By creating a Custom NER project, developers can iteratively tag data, train, evaluate, and improve model performance before making it available for consumption. The quality of the tagged data greatly impacts model performance. To simplify building and customizing your model, the service offers a custom web portal that can be accessed through the [Language studio](https://aka.ms/languageStudio). You can easily get started with the service by following the steps in this [quickstart](quickstart.md). 
+Custom NER enables users to build custom AI models to extract domain-specific entities from unstructured text, such as contracts or financial documents. By creating a Custom NER project, developers can iteratively label data, train, evaluate, and improve model performance before making it available for consumption. The quality of the labeled data greatly impacts model performance. To simplify building and customizing your model, the service offers a custom web portal that can be accessed through the [Language studio](https://aka.ms/languageStudio). You can easily get started with the service by following the steps in this [quickstart](quickstart.md). 
  
 This documentation contains the following article types:
 
@@ -49,12 +49,12 @@ Using custom NER typically involves several different steps.
 
 1. **Define your schema**: Know your data and identify the [entities](glossary.md#entity) you want extracted. Avoid ambiguity.
 
-2. **Tag your data**: Tagging data is a key factor in determining model performance. Tag precisely, consistently and completely.
-    1. **Tag precisely**: Tag each entity to its right type always. Only include what you want extracted, avoid unnecessary data in your tag.
-    2. **Tag consistently**:  The same entity should have the same tag across all the files.
-    3. **Tag completely**: Tag all the instances of the entity in all your files.
+2. **Label your data**: Labeling data is a key factor in determining model performance. Label precisely, consistently and completely.
+    1. **Label precisely**: Always label each entity with its correct type. Include only what you want extracted, and avoid unnecessary data in your labels.
+    2. **Label consistently**:  The same entity should have the same label across all the files.
+    3. **Label completely**: Label all the instances of the entity in all your files.
 
-3. **Train model**: Your model starts learning from your tagged data.
+3. **Train model**: Your model starts learning from your labeled data.
 
 4. **View the model evaluation details**: After training is completed, view the model's evaluation details and its performance.
 
diff --git a/articles/cognitive-services/language-service/custom-named-entity-recognition/tutorials/cognitive-search.md b/articles/cognitive-services/language-service/custom-named-entity-recognition/tutorials/cognitive-search.md
@@ -143,9 +143,9 @@ Generally after training a model you would review its [evaluation details](../ho
 
 6. Get your custom NER project secrets
 
-    1. You’ll need your **project-name**, project names are case-sensitive.
+    1. You will need your **project-name**. Project names are case-sensitive and can be found on the **project settings** page.
 
-    2. You’ll also need the **deployment-name**. 
+    2. You will also need the **deployment-name**. Deployment names can be found on the **Deploying a model** page.
 
 ### Run the indexer command
 
diff --git a/articles/cognitive-services/language-service/custom-text-classification/concepts/data-formats.md b/articles/cognitive-services/language-service/custom-text-classification/concepts/data-formats.md
@@ -8,7 +8,7 @@ manager: nitinme
 ms.service: cognitive-services
 ms.subservice: language-service
 ms.topic: conceptual
-ms.date: 05/04/2022
+ms.date: 05/24/2022
 ms.author: aahi
 ms.custom: language-service-custom-classification, ignite-fall-2021, event-tier1-build-2022
 ---
@@ -25,34 +25,49 @@ Your Labels file should be in the `json` format below. This will enable you to [
 
 ```json
 {
-    "classes": [
+    "projectFileVersion": "2022-05-01",
+    "stringIndexType": "Utf16CodeUnit",
+    "metadata": {
+      "projectKind": "CustomMultiLabelClassification",
+      "storageInputContainerName": "{CONTAINER-NAME}",
+      "projectName": "{PROJECT-NAME}",
+      "multilingual": false,
+      "description": "Project-description",
+      "language": "en-us"
+    },
+    "assets": {
+      "projectKind": "CustomMultiLabelClassification",
+      "classes": [
         {
-            "category": "Class1"
+          "category": "Class1"
         },
         {
-            "category": "Class2"
+          "category": "Class2"
         }
-    ],
-    "documents": [
-        {
-            "location": "{DOCUMENT-NAME}",
-            "language": "{LANGUAGE-CODE}",
-            "dataset": "{DATASET}",
-            "classes": [
-                {
-                    "category": "Class1"
-                },
-                {
-                    "category": "Class2"
-                }
-            ]
-        }
-    ]
-}
+      ],
+      "documents": [
+          {
+              "location": "{DOCUMENT-NAME}",
+              "language": "{LANGUAGE-CODE}",
+              "dataset": "{DATASET}",
+              "classes": [
+                  {
+                      "category": "Class1"
+                  },
+                  {
+                      "category": "Class2"
+                  }
+              ]
+          }
+      ]
+  }
 ```
 
 |Key  |Placeholder  |Value  | Example |
 |---------|---------|----------|--|
+| multilingual | `true`| A boolean value that lets you include documents in multiple languages in your dataset. When your model is deployed, you can query it in any supported language (not necessarily one included in your training documents). See [language support](../language-support.md#multi-lingual-option) to learn more about multilingual support. | `true`|
+| projectName | `{PROJECT-NAME}` | Project name | `myproject` |
+| storageInputContainerName | `{CONTAINER-NAME}` | Container name | `mycontainer` |
 | classes | [] | Array containing all the classes you have in the project. These are the classes you want to classify your documents into.| [] |
 | documents | [] | Array containing all the documents in your project and the classes labeled for this document. | [] |
 | location | `{DOCUMENT-NAME}` |  The location of the documents in the storage container. Since all the documents are in the root of the container, this value should be the document name.|`doc1.txt`|
@@ -63,36 +78,53 @@ Your Labels file should be in the `json` format below. This will enable you to [
 
 ```json
 {
-    "classes": [
-        {
-            "category": "Class1"
-        },
-        {
-            "category": "Class2"
-        }
-    ],
-    "documents": [
-        {
-            "location": "{DOCUMENT-NAME}",
-            "language": "{LANGUAGE-CODE}",
-            "dataset": "{DATASET}",
-            "class": {
-                "category": "Class2"
-            }
-        },
-        {
-            "location": "{DOCUMENT-NAME}",
-            "language": "{LANGUAGE-CODE}",
-            "dataset": "{DATASET}",
-            "class": {
-                "category": "Class1"
-            }
-        }
-    ]
-}
+    
+    "projectFileVersion": "2022-05-01",
+    "stringIndexType": "Utf16CodeUnit",
+    "metadata": {
+      "projectKind": "CustomSingleLabelClassification",
+      "storageInputContainerName": "{CONTAINER-NAME}",
+      "settings": {},
+      "projectName": "{PROJECT-NAME}",
+      "multilingual": false,
+      "description": "Project-description",
+      "language": "en-us"
+    },
+    "assets": {
+      "projectKind": "CustomSingleLabelClassification",
+      "classes": [
+          {
+              "category": "Class1"
+          },
+          {
+              "category": "Class2"
+          }
+      ],
+      "documents": [
+          {
+              "location": "{DOCUMENT-NAME}",
+              "language": "{LANGUAGE-CODE}",
+              "dataset": "{DATASET}",
+              "class": {
+                  "category": "Class2"
+              }
+          },
+          {
+              "location": "{DOCUMENT-NAME}",
+              "language": "{LANGUAGE-CODE}",
+              "dataset": "{DATASET}",
+              "class": {
+                  "category": "Class1"
+              }
+          }
+      ]
+  }
 ```
 |Key  |Placeholder  |Value  | Example |
 |---------|---------|----------|--|
+| projectName | `{PROJECT-NAME}` | Project name | `myproject` |
+| storageInputContainerName | `{CONTAINER-NAME}` | Container name | `mycontainer` |
+| multilingual | `true`| A boolean value that lets you include documents in multiple languages in your dataset. When your model is deployed, you can query it in any supported language (not necessarily one included in your training documents). See [language support](../language-support.md#multi-lingual-option) to learn more about multilingual support. | `true`|
 | classes | [] | Array containing all the classes you have in the project. These are the classes you want to classify your documents into.| [] |
 | documents | [] | Array containing all the documents in your project and which class this document belongs to. | [] |
 | location | `{DOCUMENT-NAME}` |  The location of the documents in the storage container. Since all the documents are in the root of the container this should be the document name.|`doc1.txt`|
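Review note on the labels-file samples above (not part of the patch): a minimal Python sketch that assembles a labels file in the new layout and runs a few sanity checks before upload. The key names are taken from the added sample lines. The helper names `build_labels_file` and `check_labels_file` are hypothetical, and the `dataset` value `Train` in the example is an assumption, since the samples only show the `{DATASET}` placeholder. Serialising through `json.dumps` also guarantees balanced braces, including the outer closing `}` after `assets`, which the added sample lines do not show.

```python
import json


def build_labels_file(project_kind, project_name, container_name, classes,
                      documents, language="en-us", multilingual=False,
                      description=""):
    """Assemble a labels-file dict using the keys shown in the updated samples."""
    return {
        "projectFileVersion": "2022-05-01",
        "stringIndexType": "Utf16CodeUnit",
        "metadata": {
            "projectKind": project_kind,
            "storageInputContainerName": container_name,
            "projectName": project_name,
            "multilingual": multilingual,
            "description": description,
            "language": language,
        },
        "assets": {
            "projectKind": project_kind,
            "classes": [{"category": name} for name in classes],
            "documents": documents,
        },
    }


def check_labels_file(labels):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    kind = labels["metadata"]["projectKind"]
    if labels["assets"]["projectKind"] != kind:
        problems.append("projectKind differs between metadata and assets")
    known = {entry["category"] for entry in labels["assets"]["classes"]}
    multi_label = kind == "CustomMultiLabelClassification"
    for doc in labels["assets"]["documents"]:
        # Multi-label documents carry a "classes" array, single-label a "class" object.
        if multi_label:
            used = [entry["category"] for entry in doc.get("classes", [])]
        else:
            used = [doc["class"]["category"]] if "class" in doc else []
        if not used:
            problems.append(f"{doc['location']}: no class assigned")
        for category in used:
            if category not in known:
                problems.append(f"{doc['location']}: unknown class {category!r}")
    return problems


if __name__ == "__main__":
    labels = build_labels_file(
        "CustomSingleLabelClassification", "myproject", "mycontainer",
        classes=["Class1", "Class2"],
        documents=[{
            "location": "doc1.txt",
            "language": "en-us",
            "dataset": "Train",  # assumed value; the samples show only {DATASET}
            "class": {"category": "Class2"},
        }],
    )
    print(json.dumps(labels, indent=2))
    print(check_labels_file(labels))  # prints []
```

The checks are deliberately narrow: they only compare the file against itself (matching `projectKind` values, every document labeled with a declared class) and say nothing about what the service accepts.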
diff --git a/articles/cognitive-services/language-service/custom-text-classification/concepts/evaluation-metrics.md b/articles/cognitive-services/language-service/custom-text-classification/concepts/evaluation-metrics.md
@@ -139,5 +139,5 @@ Similarly,
 
 ## Next steps
 
-* [View a model's evaluation in Language Studio](../how-to/view-model-evaluation.md)
+* [View a model's performance in Language Studio](../how-to/view-model-evaluation.md)
 * [Train a model](../how-to/train-model.md)
diff --git a/articles/cognitive-services/language-service/custom-text-classification/how-to/call-api.md b/articles/cognitive-services/language-service/custom-text-classification/how-to/call-api.md
@@ -17,7 +17,7 @@ ms.custom: language-service-clu, ignite-fall-2021, event-tier1-build-2022
 # Query deployment to classify text
 
 After the deployment is added successfully, you can query the deployment to classify text based on the model you assigned to the deployment.
-You can query the deployment programmatically [Prediction API](https://aka.ms/ct-runtime-swagger) or through the [client libraries (Azure SDK)](#get-task-results). 
+You can query the deployment programmatically through the [Prediction API](https://aka.ms/ct-runtime-api) or through the [client libraries (Azure SDK)](#get-task-results). 
 
 ## Test deployed model
 
diff --git a/articles/cognitive-services/language-service/custom-text-classification/how-to/create-project.md b/articles/cognitive-services/language-service/custom-text-classification/how-to/create-project.md
@@ -27,7 +27,7 @@ Before you start using custom text classification, you will need:
 
 Before you start using custom text classification, you will need an Azure Language resource. It is recommended to create your Language resource and connect a storage account to it in the Azure portal. Creating a resource in the Azure portal lets you create an Azure storage account at the same time, with all of the required permissions pre-configured. You can also read further in the article to learn how to use a pre-existing resource, and configure it to work with custom text classification.
 
-You also will need an Azure storage account where you will upload your `.txt` files that will be used to train a model to classify text.
+You will also need an Azure storage account where you will upload the `.txt` documents used to train a model to classify text.
 
 > [!NOTE]
 >  * You need to have an **owner** role assigned on the resource group to create a Language resource.
diff --git a/articles/cognitive-services/language-service/custom-text-classification/how-to/design-schema.md b/articles/cognitive-services/language-service/custom-text-classification/how-to/design-schema.md
@@ -25,11 +25,11 @@ The schema defines the classes that you need your model to classify your text in
 
     For example, if you are classifying support tickets, you might need the following classes: *login issue*, *hardware issue*, *connectivity issue*, and *new equipment request*.
 
-* **Avoid ambiguity in classes**: Ambiguity arises when the classes you specify share similar meaning to one another. The more ambiguous your schema is, the more tagged data you may need to differentiate between different classes.
+* **Avoid ambiguity in classes**: Ambiguity arises when the classes you specify share similar meaning to one another. The more ambiguous your schema is, the more labeled data you may need to differentiate between different classes.
 
-    For example, if you are classifying food recipes, they may be similar to an extent. To differentiate between *dessert recipe* and *main dish recipe*, you may need to tag more examples to help your model distinguish between the two classes. Avoiding ambiguity saves time and yields better results. 
+    For example, if you are classifying food recipes, they may be similar to an extent. To differentiate between *dessert recipe* and *main dish recipe*, you may need to label more examples to help your model distinguish between the two classes. Avoiding ambiguity saves time and yields better results. 
 
-* **Out of scope data**: When using your model in production, consider adding an *out of scope* class to your schema if you expect documents that don't belong to any of your classes. Then add a few documents to your dataset to be tagged as *out of scope*. The model can learn to recognize irrelevant documents, and predict their tags accordingly.
+* **Out of scope data**: When using your model in production, consider adding an *out of scope* class to your schema if you expect documents that don't belong to any of your classes. Then add a few documents to your dataset to be labeled as *out of scope*. The model can learn to recognize irrelevant documents, and predict their labels accordingly.
 
 
 ## Data selection
@@ -58,8 +58,11 @@ As a prerequisite for creating a custom text classification project, your traini
 
 You can only use `.txt`. documents for custom text. If your data is in other format, you can use [CLUtils parse command](https://github.com/microsoft/CognitiveServicesLanguageUtilities/blob/main/CustomTextAnalytics.CLUtils/Solution/CogSLanguageUtilities.ViewLayer.CliCommands/Commands/ParseCommand/README.md) to change your file format.
 
- You can upload an annotated dataset, or you can upload an unannotated one and [tag your data](../how-to/tag-data.md) in Language studio. 
+ You can upload an annotated dataset, or you can upload an unannotated one and [label your data](../how-to/tag-data.md) in Language studio. 
 
+## Test set
+
+When defining the testing set, make sure to include example documents that are not present in the training set. Defining the testing set is an important step in calculating the [model performance](view-model-evaluation.md#model-details). Also, make sure that the testing set includes documents that represent all classes used in your project.
 
 ## Next steps
 
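Review note on the new "Test set" section (not part of the patch): the two checks it describes, that test documents are absent from the training set and that every class appears in the testing set, could be automated roughly as follows. This is a sketch under stated assumptions: `check_test_set` and `categories_of` are hypothetical helpers, the document entries follow the labels-file layout from data-formats.md, and the `dataset` values `Train` and `Test` are assumed because the diff only shows the `{DATASET}` placeholder.

```python
def categories_of(doc):
    """Return the class names on one document entry (single- or multi-label)."""
    if "classes" in doc:
        return [entry["category"] for entry in doc["classes"]]
    if "class" in doc:
        return [doc["class"]["category"]]
    return []


def check_test_set(documents, all_classes, train_name="Train", test_name="Test"):
    """Return (documents present in both sets, classes missing from the test set)."""
    train_locations = {d["location"] for d in documents if d["dataset"] == train_name}
    test_docs = [d for d in documents if d["dataset"] == test_name]
    test_locations = {d["location"] for d in test_docs}

    # Documents that appear in both the training and testing sets.
    overlap = sorted(train_locations & test_locations)

    # Classes that no test document is labeled with.
    covered = set()
    for doc in test_docs:
        covered.update(categories_of(doc))
    missing = sorted(set(all_classes) - covered)
    return overlap, missing


if __name__ == "__main__":
    docs = [
        {"location": "a.txt", "dataset": "Train", "class": {"category": "Class1"}},
        {"location": "b.txt", "dataset": "Test", "class": {"category": "Class1"}},
    ]
    print(check_test_set(docs, ["Class1", "Class2"]))  # prints ([], ['Class2'])
```

Matching on `location` only catches documents reused under the same name; identical text stored under two file names would need a content comparison instead.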
diff --git a/articles/cognitive-services/language-service/custom-text-classification/how-to/improve-model.md b/articles/cognitive-services/language-service/custom-text-classification/how-to/improve-model.md
@@ -15,7 +15,7 @@ ms.custom: language-service-custom-classification, ignite-fall-2021, event-tier1
 
 # Improve custom text classification model performance
 
-In some cases, the model is expected to make predictions that are inconsistent with your tagged classes. Use this article to learn how to observe these inconsistencies and decide on the needed changes needed to improve your model performance.
+In some cases, the model is expected to make predictions that are inconsistent with your labeled classes. Use this article to learn how to observe these inconsistencies and decide on the changes needed to improve your model performance.
 
 
 ## Prerequisites
@@ -24,15 +24,15 @@ To optionally improve a model, you'll need to have:
 
 * [A custom text classification project](create-project.md) with a configured Azure blob storage account, 
 * Text data that has [been uploaded](design-schema.md#data-preparation) to your storage account.
-* [Tagged data](tag-data.md) to successfully [train a model](train-model.md).
+* [Labeled data](tag-data.md) to successfully [train a model](train-model.md).
 * Reviewed the [model evaluation details](view-model-evaluation.md) to determine how your model is performing.
 * Familiarized yourself with the [evaluation metrics](../concepts/evaluation-metrics.md).
 
 See the [project development lifecycle](../overview.md#project-development-lifecycle) for more information.
 
 ## Review test set predictions
 
-After you have viewed your [model's evaluation](view-model-evaluation.md), you'll have formed an idea on your model performance. In this page, you can view how your model performs vs how it's expected to perform. You can view predicted and tagged classes side by side for each document in your test set. You can review documents that were predicted differently than they were originally tagged.
+After you have viewed your [model's evaluation](view-model-evaluation.md), you'll have formed an idea of your model's performance. On this page, you can compare how your model performs with how it's expected to perform. You can view predicted and labeled classes side by side for each document in your test set, and review documents that were predicted differently than they were originally labeled.
 
 
 To review inconsistent predictions in the [test set](train-model.md#data-splitting) from within the [Language Studio](https://aka.ms/LanguageStudio):
@@ -45,7 +45,11 @@ To review inconsistent predictions in the [test set](train-model.md#data-splitti
 
 Use the following information to help guide model improvements. 
 
-* If a file that should belong to class  `X` is constantly classified as class `Y`, it means that there is ambiguity between these classes and you need to reconsider your schema. Learn more about [data selection and schema design](design-schema.md#schema-design). Another solution is to consider adding more data to these classes, to help the model improve and differentiate between them.
+* If a file that should belong to class `X` is consistently classified as class `Y`, there is ambiguity between these classes, and you need to reconsider your schema. Learn more about [data selection and schema design](design-schema.md#schema-design). 
+
+* Another solution is to consider adding more data to these classes, to help the model improve and differentiate between them.
+
+* Consider adding more data, to help the model differentiate between different classes.
 
     :::image type="content" source="../media/review-validation-set.png" alt-text="A screenshot showing model predictions in Language Studio." lightbox="../media/review-validation-set.png":::
 
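Review note on the guidance above about class `X` being classified as class `Y` (not part of the patch): one way to spot such pairs outside Language Studio is to count how often each labeled class is predicted as a different class. This is a minimal sketch, assuming the test-set results are already available as `(labeled, predicted)` pairs; `confused_pairs` is a hypothetical helper and the diff does not describe any export format.

```python
from collections import Counter


def confused_pairs(results):
    """Count (labeled, predicted) disagreements, most frequent first."""
    counts = Counter(
        (labeled, predicted)
        for labeled, predicted in results
        if labeled != predicted
    )
    return counts.most_common()


if __name__ == "__main__":
    results = [("X", "Y"), ("X", "Y"), ("Y", "X"), ("X", "X")]
    for (labeled, predicted), count in confused_pairs(results):
        print(f"labeled {labeled}, predicted {predicted}: {count}")
```

A pair that dominates the list is a candidate for the schema review or the extra labeled data that the bullets above recommend.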
diff --git a/articles/cognitive-services/language-service/custom-text-classification/how-to/tag-data.md b/articles/cognitive-services/language-service/custom-text-classification/how-to/tag-data.md
@@ -58,7 +58,7 @@ Use the following steps to label your data:
 
 4. In the right side pane, **Add class** to your project so you can start labeling your data with them.
 
-    :::image type="content" source="../media/tag-1.png" alt-text="A screenshot showing the data tagging screen" lightbox="../media/tag-1.png":::
+    :::image type="content" source="../media/tag-1.png" alt-text="A screenshot showing the data labeling screen" lightbox="../media/tag-1.png":::
 
 5. Start labeling your files.
 
@@ -83,7 +83,9 @@ Use the following steps to label your data:
     > [!TIP]
     > If you are planning on using **Automatic** data spliting use the default option of assigning all the documents into your training set.
 
-8. Under the **Distribution** pivot you can view the distribution of your labeled documents across training and testing sets. You can learn more about the training testing sets and how they are used [here](train-model.md#data-splitting).
+8. Under the **Distribution** pivot, you can view the distribution across training and testing sets. You have two options for viewing:
+   * *Total instances*, where you can view the count of all labeled instances of a specific class.
+   * *Documents with at least one label*, where each document is counted if it contains at least one labeled instance of this class.
 
 9. While you're labeling, your changes will be synced periodically, if they have not been saved yet you will find a warning at the top of your page. If you want to save manually, click on **Save labels** button at the bottom of the page.
 
diff --git a/articles/cognitive-services/language-service/custom-text-classification/how-to/train-model.md b/articles/cognitive-services/language-service/custom-text-classification/how-to/train-model.md
diff --git a/articles/cognitive-services/language-service/custom-text-classification/how-to/view-model-evaluation.md b/articles/cognitive-services/language-service/custom-text-classification/how-to/view-model-evaluation.md