articles/cognitive-services/language-service/conversational-language-understanding/concepts/evaluation-metrics.md (10 additions, 0 deletions)
@@ -171,6 +171,16 @@ Similarly,
* The *false negative* of the model is the sum of *false negatives* for all intents or entities.
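
Once those counts are summed, the model-level scores follow from the usual definitions. As a minimal sketch of the arithmetic, writing $TP_i$, $FP_i$, and $FN_i$ for the counts of intent or entity $i$:

```latex
\text{Precision}_{\text{model}} = \frac{\sum_i TP_i}{\sum_i TP_i + \sum_i FP_i},
\qquad
\text{Recall}_{\text{model}} = \frac{\sum_i TP_i}{\sum_i TP_i + \sum_i FN_i},
\qquad
F1_{\text{model}} = \frac{2 \cdot \text{Precision}_{\text{model}} \cdot \text{Recall}_{\text{model}}}
                         {\text{Precision}_{\text{model}} + \text{Recall}_{\text{model}}}
```
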
## Guidance
After you train your model, you'll see guidance and recommendations on how to improve it. Ideally, your model should cover every point in the guidance section. A rough sketch of a data-coverage check you could run on your own labeled data follows the list below.
* Training set has enough data: When an intent or entity has fewer than 15 labeled instances in the training data, accuracy can suffer because the model isn't adequately trained on that intent or entity. In this case, consider adding more labeled data to your training set. Only add more labeled data for an entity if the entity has a learned component; if the entity is defined only by list, prebuilt, and regex components, this recommendation doesn't apply.
* All intents or entities are present in the test set: When the testing data lacks labeled instances for an intent or entity, the model evaluation is less comprehensive because those scenarios go untested. Make sure you have test data for every intent and entity in your model so that everything is tested.
* Unclear distinction between intents or entities: When data is similar across different intents or entities, it can lead to lower accuracy because they may frequently be misclassified as each other. Review the listed intents and entities and consider merging them if they're similar; otherwise, add more examples to better distinguish them from each other. You can check the *confusion matrix* tab for more guidance. If two entities are consistently predicted for the same spans because they share the same list, prebuilt, or regex components, make sure to add a **learned** component for each entity and make it **required**. Learn more about [entity components](./entity-components.md).
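
In the spirit of the first two points above, here's a minimal sketch of a coverage check you could run on your own labeled data before training. The data layout, intent names, and the `check_coverage` helper are illustrative assumptions, not the service's own validation logic.

```python
from collections import Counter

MIN_TRAINING_EXAMPLES = 15  # threshold mentioned in the guidance above

def check_coverage(train_utterances, test_utterances, intents):
    """Flag intents with too little training data or no test data.

    train_utterances / test_utterances: iterables of (text, intent) tuples.
    intents: the full set of intent names defined in the project.
    """
    train_counts = Counter(intent for _, intent in train_utterances)
    test_counts = Counter(intent for _, intent in test_utterances)

    warnings = []
    for intent in sorted(intents):
        if train_counts[intent] < MIN_TRAINING_EXAMPLES:
            warnings.append(
                f"'{intent}' has only {train_counts[intent]} training examples; "
                f"consider labeling at least {MIN_TRAINING_EXAMPLES}."
            )
        if test_counts[intent] == 0:
            warnings.append(f"'{intent}' has no test examples, so it won't be evaluated.")
    return warnings

# Example usage with toy data:
train = [("book a flight", "BookFlight"), ("cancel my trip", "CancelTrip")]
test = [("book me a flight to Cairo", "BookFlight")]
for warning in check_coverage(train, test, {"BookFlight", "CancelTrip"}):
    print(warning)
```
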
## Next steps
[Train a model in Language Studio](../how-to/train-model.md)
articles/cognitive-services/language-service/conversational-language-understanding/how-to/view-model-evaluation.md (0 additions, 5 deletions)
@@ -33,11 +33,6 @@ See the [project development lifecycle](../overview.md#project-development-lifec
### [Language studio](#tab/Language-studio)

Removed:

> [!Note]
> The results here are for the machine learning entity component only.

On the **view model details** page, you can see all your models, their current training status, and the date they were last trained.
articles/cognitive-services/language-service/conversational-language-understanding/includes/language-studio/model-performance.md (50 additions, 9 deletions)
@@ -13,14 +13,55 @@ ms.author: aahi
2. Select **Model performance** from the menu on the left side of the screen.
3. On this page you can view only your successfully trained models, the F1 score of each model, and the [model expiration date](../../../concepts/model-lifecycle.md#expiration-timeline). You can select the model name for more details about its performance. Models include evaluation details only if test data was selected while training the model.

Removed:

4. You can find the *model-level evaluation metrics* under **Overview**, and the *intent-level* and *entity-level* evaluation metrics. See [Evaluation metrics](../../concepts/evaluation-metrics.md#) for more information.

:::image type="content" source="../../media/model-details.png" alt-text="A screenshot of the model performance metrics in Language Studio" lightbox="../../media/model-details.png":::

5. The [confusion matrix](../../concepts/evaluation-metrics.md#confusion-matrix) for the model is located under **Test set confusion matrix**. You can see the confusion matrix for intents and entities.

> [!NOTE]
> If you don't see any of the intents or entities you have in your model displayed here, it is because they weren't in any of the utterances that were used for the test set.

### [Overview](#tab/overview)

* In this tab, you can view the model's details, such as the F1 score, precision, recall, the date and time of the training job, the total training time, and the number of training and testing utterances included in the job. You can switch the details between intents and entities by selecting **Model Type** at the top.
:::image type="content" source="../../media/overview.png" alt-text="A screenshot that shows the overview page for model evaluation." lightbox="../../media/overview.png":::
* You'll also see [guidance](../../concepts/evaluation-metrics.md#guidance) on how to improve the model. Selecting *view details* opens a side panel with more detailed guidance.
:::image type="content" source="../../media/overview-guidance.png" alt-text="A screenshot that shows the guidance page for model evaluation." lightbox="../../media/overview-guidance.png":::
### [Model type performance](#tab/model-performance)
* This is a snapshot of how your model performed during testing. The metrics here are static and tied to your model, so they won’t update until you train again.
* For each intent or entity, you can see the precision, recall, F1 score, and the number of training and testing labels. Entities that don't include a learned component show no training labels; a learned component is added only by adding labels in your training set.
:::image type="content" source="../../media/model-type-performance.png" alt-text="A screenshot of model performance." lightbox="../../media/model-type-performance.png":::
### [Test set details](#tab/test-set)
* Here you see the utterances included in the **test set** and their intent or entity predictions. Use the *Show errors only* toggle to show only the utterances whose predictions differ from their labels, or clear the toggle to view all utterances in the test set. You can also switch the view for each utterance between **Showing entity labels** and **Showing entity predictions**. Entity predictions appear as dotted lines, and labels appear as solid lines.
* You can expand each row to view its intent or entity predictions, as specified by the **Model Type** column. The **Text** column shows the text of the entity that was predicted or labeled. Each row has a **Labeled as** column indicating the label in the test set and a **Predicted as** column indicating the actual prediction. The **Result Type** column shows whether the prediction is a [true positive](../../concepts/evaluation-metrics.md), [false positive](../../concepts/evaluation-metrics.md), or [false negative](../../concepts/evaluation-metrics.md).
:::image type="content" source="../../media/test-set.png" alt-text="A screenshot of test set details." lightbox="../../media/test-set.png":::
### [Dataset distribution](#tab/dataset-distribution)

This snapshot shows how intents or entities are distributed across your training and testing sets. This data is static and tied to your model, so it won't update until you train again. Entities that don't include a learned component show no training labels; a learned component is added only by adding labels in your training set.
:::image type="content" source="../../media/dataset-table.png" alt-text="A screenshot showing distribution in table view." lightbox="../../media/dataset-table.png":::
### [Confusion matrix](#tab/confusion-matrix)
A [confusion matrix](../../concepts/evaluation-metrics.md#confusion-matrix) is an N x N matrix used for evaluating the performance of your model, where N is the number of target intents or entities. The matrix compares the expected labels with those predicted by the model to identify which intents or entities are being misclassified as other intents and entities. You can click into any cell of the confusion matrix to identify exactly which utterances contributed to the values in that cell.
You can view the intent confusion matrix in *raw count* or *normalized* view. Raw count is the actual number of utterances that have been predicted and labeled for a set of intents. Normalized value is the ratio, between 0 and 1, of the predicted and labeled utterances for a set of intents.
You can view the entity confusion matrix in *character overlap count* or *normalized character overlap* view. Character overlap count is the actual number of spans that have been predicted and labeled for a set of entities. Normalized character overlap is the ratio, between 0 and 1, of the predicted and labeled spans for a set of entities. Sometimes entities can be predicted or labeled partially, leading to decimal values in the confusion matrix.
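
To make the raw and normalized views concrete, here's a minimal sketch with assumed data layouts and helper names (not the service's implementation). It builds a raw-count intent confusion matrix from (labeled, predicted) pairs, row-normalizes it to values between 0 and 1, and computes the character overlap between a labeled span and a predicted span.

```python
def confusion_matrix(pairs, labels):
    """Raw-count matrix[labeled][predicted] built from (labeled, predicted) intent pairs."""
    matrix = {l: {p: 0 for p in labels} for l in labels}
    for labeled, predicted in pairs:
        matrix[labeled][predicted] += 1
    return matrix

def normalize(matrix):
    """Row-normalize so each labeled intent's row sums to 1 (values between 0 and 1)."""
    normalized = {}
    for labeled, row in matrix.items():
        total = sum(row.values())
        normalized[labeled] = {p: (count / total if total else 0.0) for p, count in row.items()}
    return normalized

def char_overlap(labeled_span, predicted_span):
    """Overlapping character count between a labeled and a predicted (start, end) span;
    partial overlaps are what make the entity matrix differ from a simple per-span count."""
    start = max(labeled_span[0], predicted_span[0])
    end = min(labeled_span[1], predicted_span[1])
    return max(0, end - start)

# Example: two intents, one misclassification.
pairs = [("BookFlight", "BookFlight"), ("BookFlight", "CancelTrip"), ("CancelTrip", "CancelTrip")]
raw = confusion_matrix(pairs, ["BookFlight", "CancelTrip"])
print(raw)             # BookFlight row: {'BookFlight': 1, 'CancelTrip': 1}
print(normalize(raw))  # BookFlight row becomes 0.5 / 0.5
print(char_overlap((0, 10), (5, 12)))  # 5 overlapping characters
```
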
:::image type="content" source="../../media/confusion.png" alt-text="A screenshot of a confusion matrix in Language Studio." lightbox="../../media/confusion.png":::
* All values: Shows the confusion matrix for all intents or entities.
* Only errors: Shows the confusion matrix for intents or entities with errors only.
* Only matches: Shows the confusion matrix for intents or entities with correct predictions only.