
Commit 59a2abf

Merge pull request #227906 from hazemelh/model-evaluation-clu
Add new CLU model evaluation guidance
2 parents 3af1d41 + 71ae8ce commit 59a2abf

File tree

10 files changed (+371, -14 lines)

articles/cognitive-services/language-service/conversational-language-understanding/concepts/evaluation-metrics.md

Lines changed: 10 additions & 0 deletions
@@ -171,6 +171,16 @@ Similarly,
* The *false negative* of the model is the sum of *false negatives* for all intents or entities.
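To make the aggregation concrete, here is a minimal sketch (not part of the article's diff) of how the model-level, micro-averaged precision, recall, and F1 could be computed from summed true positive, false positive, and false negative counts; the per-intent tallies are hypothetical:

```python
# Minimal sketch of model-level (micro-averaged) metrics.
# The per-intent counts below are hypothetical and for illustration only.
per_intent = {
    "BookFlight":   {"tp": 40, "fp": 5, "fn": 3},
    "CancelFlight": {"tp": 22, "fp": 2, "fn": 6},
    "None":         {"tp": 10, "fp": 4, "fn": 1},
}

# Model-level counts are the sums over all intents (or entities).
tp = sum(c["tp"] for c in per_intent.values())
fp = sum(c["fp"] for c in per_intent.values())
fn = sum(c["fn"] for c in per_intent.values())

precision = tp / (tp + fp)                           # 72 / 83
recall = tp / (tp + fn)                              # 72 / 82
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```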


+## Guidance
+
+After you train your model, you'll see guidance and recommendations on how to improve it. It's recommended that your model cover every point in the guidance section.
+
+* Training set has enough data: When an intent or entity has fewer than 15 labeled instances in the training data, it can lead to lower accuracy because the model isn't adequately trained on that intent or entity. In this case, consider adding more labeled data to your training set. Only consider adding more labeled data to an entity if that entity has a learned component. If your entity is defined only by list, prebuilt, and regex components, this recommendation doesn't apply. A minimal sketch of this check follows the list.
+
+* All intents or entities are present in test set: When the testing data lacks labeled instances for an intent or entity, the model evaluation is less comprehensive because some scenarios go untested. Consider adding test data for every intent and entity in your model to ensure everything is tested.
+
+* Unclear distinction between intents or entities: When data is similar for different intents or entities, it can lead to lower accuracy because they may be frequently misclassified as each other. Review the flagged intents and entities and consider merging them if they're similar. Otherwise, add more examples to better distinguish them from each other. You can check the *confusion matrix* tab for more guidance. If two entities are constantly being predicted for the same spans because they share the same list, prebuilt, or regex components, make sure to add a **learned** component for each entity and make it **required**. Learn more about [entity components](./entity-components.md).
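As a rough sketch of the first recommendation above (the utterance list and its shape are hypothetical, not the CLU project import format), this snippet counts training labels per intent and flags anything under the 15-instance threshold:

```python
from collections import Counter

# Hypothetical (text, intent) training utterances; not the CLU project schema.
training_data = [
    ("book me a flight to cairo", "BookFlight"),
    ("cancel my reservation for tomorrow", "CancelFlight"),
    ("what's the weather like", "None"),
    # ... more labeled utterances ...
]

MIN_LABELS = 15  # threshold suggested in the guidance above

counts = Counter(intent for _, intent in training_data)
for intent, count in sorted(counts.items()):
    if count < MIN_LABELS:
        print(f"'{intent}': only {count} labeled utterance(s); consider adding more training data.")
```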

## Next steps

[Train a model in Language Studio](../how-to/train-model.md)

articles/cognitive-services/language-service/conversational-language-understanding/how-to/view-model-evaluation.md

Lines changed: 0 additions & 5 deletions
@@ -33,11 +33,6 @@ See the [project development lifecycle](../overview.md#project-development-lifec

### [Language studio](#tab/Language-studio)

-> [!Note]
-> The results here are for the machine learning entity component only.
-
-In the **view model details** page, you'll be able to see all your models, with their current training status, and the date they were last trained.
-
[!INCLUDE [Model performance](../includes/language-studio/model-performance.md)]

### [REST APIs](#tab/REST-APIs)

articles/cognitive-services/language-service/conversational-language-understanding/includes/language-studio/model-performance.md

Lines changed: 50 additions & 9 deletions
@@ -13,14 +13,55 @@ ms.author: aahi

2. Select **Model performance** from the menu on the left side of the screen.

-3. In this page you can only view the successfully trained models, F1 score of each model and [model expiration date](../../../concepts/model-lifecycle.md#expiration-timeline). You can click on the model name for more details about its performance.
+3. On this page you can view only the successfully trained models, each model's F1 score, and its [model expiration date](../../../concepts/model-lifecycle.md#expiration-timeline). You can select a model name for more details about its performance. Models only include evaluation details if test data was selected while training the model.

-4. You can find the *model-level evaluation metrics* under **Overview**, and the *intent-level* and *entity-level* evaluation metrics. See [Evaluation metrics](../../concepts/evaluation-metrics.md#) for more information.
+### [Overview](#tab/overview)

-:::image type="content" source="../../media/model-details.png" alt-text="A screenshot of the model performance metrics in Language Studio" lightbox="../../media/model-details.png":::
-
-5. The [confusion matrix](../../concepts/evaluation-metrics.md#confusion-matrix) for the model is located under **Test set confusion matrix**. You can see the confusion matrix for intents and entities.
-
-> [!NOTE]
-> If you don't see any of the intents or entities you have in your model displayed here, it is because they weren't in any of the utterances that were used for the test set.
-
+* In this tab you can view the model's details, such as the F1 score, precision, recall, date and time of the training job, total training time, and the number of training and testing utterances included in this training job. You can switch between intent and entity details by selecting **Model Type** at the top.
+
+:::image type="content" source="../../media/overview.png" alt-text="A screenshot that shows the overview page for model evaluation." lightbox="../../media/overview.png":::
+
+* You'll also see [guidance](../../concepts/evaluation-metrics.md#guidance) on how to improve the model. When you select *view details*, a side panel opens with more guidance on how to improve it.
+
+:::image type="content" source="../../media/overview-guidance.png" alt-text="A screenshot that shows the guidance page for model evaluation." lightbox="../../media/overview-guidance.png":::
+
+### [Model type performance](#tab/model-performance)
+
+* This is a snapshot of how your model performed during testing. The metrics here are static and tied to your model, so they won't update until you train again.
+
+* For each intent or entity, you can see the precision, recall, F1 score, and the number of training and testing labels. Entities that don't include a learned component show no training labels. A learned component is added only by adding labels in your training set.
+
+
+:::image type="content" source="../../media/model-type-performance.png" alt-text="A screenshot of model performance." lightbox="../../media/model-type-performance.png":::
+
+### [Test set details](#tab/test-set)
+
+* Here you'll see the utterances included in the **test set** and their intent or entity predictions. You can use the *Show errors only* toggle to show only the utterances whose predictions differ from their labels, or clear the toggle to view all utterances in the test set. You can also switch the view for each utterance between **Showing entity labels** and **Showing entity predictions**. Entity predictions show as dotted lines and labels show as solid lines.
+
+* You can expand each row to view its intent or entity predictions, specified by the **Model Type** column. The **Text** column shows the text of the entity that was predicted or labeled. Each row has a **Labeled as** column to indicate the label in the test set, and a **Predicted as** column to indicate the actual prediction. You'll also see whether it's a [true positive](../../concepts/evaluation-metrics.md), [false positive](../../concepts/evaluation-metrics.md), or [false negative](../../concepts/evaluation-metrics.md) in the **Result Type** column.
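A minimal, hedged sketch of how a per-row result type could be derived for intent predictions (the labeled/predicted pairs are hypothetical; entity result types additionally depend on how spans overlap):

```python
# Hypothetical (labeled, predicted) intent pairs for test-set utterances.
rows = [
    ("BookFlight", "BookFlight"),
    ("CancelFlight", "BookFlight"),
    ("None", "None"),
]

for labeled, predicted in rows:
    if predicted == labeled:
        # A correct prediction counts as a true positive for that intent.
        result = f"true positive ({labeled})"
    else:
        # A miss counts as a false positive for the predicted intent
        # and a false negative for the labeled intent.
        result = f"false positive ({predicted}), false negative ({labeled})"
    print(f"labeled={labeled:<13} predicted={predicted:<13} -> {result}")
```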
+
+:::image type="content" source="../../media/test-set.png" alt-text="A screenshot of test set details." lightbox="../../media/test-set.png":::
+
+### [Dataset distribution](#tab/dataset-distribution)
+
+This snapshot shows how intents or entities are distributed across your training and testing sets. This data is static and tied to your model, so it won't update until you train again. Entities that don't include a learned component show no training labels. A learned component is added only by adding labels in your training set.
+
+:::image type="content" source="../../media/dataset-table.png" alt-text="A screenshot showing distribution in table view." lightbox="../../media/dataset-table.png":::
+
+### [Confusion matrix](#tab/confusion-matrix)
+
+A [confusion matrix](../../concepts/evaluation-metrics.md#confusion-matrix) is an N x N matrix used to evaluate your model's performance, where N is the number of target intents or entities. The matrix compares the expected labels with the labels the model predicted, to identify which intents or entities are being misclassified as other intents and entities. You can click into any cell of the confusion matrix to identify exactly which utterances contributed to the values in that cell.
+
+You can view the intent confusion matrix in *raw count* or *normalized* view. The raw count is the actual number of utterances that have been predicted and labeled for a set of intents. The normalized value is the ratio, between 0 and 1, of the predicted and labeled utterances for a set of intents.
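To make the two views concrete, here is a short sketch with hypothetical labels and predictions: it builds the raw-count intent confusion matrix and then normalizes each row by the number of utterances labeled with that intent (one reasonable reading of the ratio described above):

```python
# Hypothetical (labeled, predicted) intent pairs from a test set.
pairs = [
    ("BookFlight", "BookFlight"),
    ("BookFlight", "CancelFlight"),
    ("CancelFlight", "CancelFlight"),
    ("CancelFlight", "CancelFlight"),
    ("None", "BookFlight"),
]

intents = sorted({intent for pair in pairs for intent in pair})

# Raw count view: rows are labeled intents, columns are predicted intents.
raw = {lab: {pred: 0 for pred in intents} for lab in intents}
for labeled, predicted in pairs:
    raw[labeled][predicted] += 1

# Normalized view: divide each row by the number of utterances labeled with that intent.
normalized = {}
for lab in intents:
    total = sum(raw[lab].values())
    normalized[lab] = {pred: (raw[lab][pred] / total if total else 0.0) for pred in intents}

for lab in intents:
    print(lab, raw[lab], {pred: round(v, 2) for pred, v in normalized[lab].items()})
```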
+
+You can view the entity confusion matrix in *character overlap count* or *normalized character overlap* view. Character overlap count is the actual number of spans that have been predicted and labeled for a set of entities. Normalized character overlap is the ratio, between 0 and 1, of the predicted and labeled spans for a set of entities. Sometimes entities can be predicted or labeled partially, leading to decimal values in the confusion matrix.
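The character-overlap idea can be sketched with hypothetical character offsets: a partially predicted span produces a fractional normalized overlap, which is one way the decimal values mentioned above can arise:

```python
def char_overlap(span_a, span_b):
    """Number of characters shared by two (start, end) spans, end-exclusive."""
    start = max(span_a[0], span_b[0])
    end = min(span_a[1], span_b[1])
    return max(0, end - start)

# Hypothetical offsets: the labeled entity covers characters 10-20,
# but the model predicted only characters 10-16.
labeled = (10, 20)
predicted = (10, 16)

overlap = char_overlap(labeled, predicted)        # 6 overlapping characters
normalized = overlap / (labeled[1] - labeled[0])  # 0.6 of the labeled span

print(f"overlap={overlap} characters, normalized={normalized:.1f}")
```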
+
+:::image type="content" source="../../media/confusion.png" alt-text="A screenshot of a confusion matrix in Language Studio" lightbox="../../media/confusion.png":::
+
+* All values: Shows the confusion matrix for all intents or entities.
+
+* Only errors: Shows the confusion matrix for intents or entities with errors only.
+
+* Only matches: Shows the confusion matrix for intents or entities with correct predictions only.
+
+---
