Find answers to commonly asked questions about concepts and scenarios related to custom text classification in Azure Cognitive Service for Language.
20
20
21
-
## How many tagged files are needed?
21
+
## How do I get started with the service?
22
22
23
-
Generally, diverse and representative [tagged data](how-to/tag-data.md)leads to better results, given that the tagging is done precisely, consistently and completely. There is no set number of tagged classes that will make every model perform well. Performance highly dependent on your schema, and the ambiguity of your schema. Ambiguous classes need more tags. Performance also depends on the quality of your tagging. The recommended number of tagged instances per entity is 50.
23
+
See the [quickstart](./quickstart.md) to quickly create your first project, or view [how to create projects](how-to/create-project.md) for more details.
24
24
25
25
## What are the service limits?
26
26
27
27
See the [service limits article](service-limits.md) for more information.
28
28
29
-
## What to do if my model scores poorly?
29
+
## Which languages are supported in this feature?
30
+
31
+
See the [language support](./language-support.md) article.
32
+
33
+
## How many tagged files are needed?
34
+
35
+
Generally, diverse and representative [tagged data](how-to/tag-data.md) leads to better results, given that the tagging is done precisely, consistently and completely. There is no set number of tagged classes that will make every model perform well. Performance is highly dependent on your schema and the ambiguity of your schema. Ambiguous classes need more tags. Performance also depends on the quality of your tagging. The recommended number of tagged instances per class is 50.
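As a rough illustration of the 50-instances-per-class recommendation, the following Python sketch counts tagged instances per class and reports the classes that fall short. It assumes a single-label project and a plain list of class names, one per tagged file; the `labels` data and the `classes_below_minimum` helper are hypothetical and are not part of the service or its SDK.

```python
from collections import Counter

# Recommended number of tagged instances per class, taken from the answer above.
RECOMMENDED_MIN = 50


def classes_below_minimum(labels, minimum=RECOMMENDED_MIN):
    """Return {class_name: count} for classes with fewer than `minimum` tagged instances."""
    counts = Counter(labels)
    return {name: n for name, n in counts.items() if n < minimum}


# Hypothetical data: one class name per tagged file (assumes single-label tagging).
labels = ["billing"] * 60 + ["shipping"] * 12 + ["returns"] * 50

print(classes_below_minimum(labels))
```

Classes reported by this check are candidates for more tagged data. Ambiguous classes may need more than the recommended minimum.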
36
+
37
+
## Training is taking a long time, is this expected?
38
+
39
+
The training process can take some time. As a rough estimate, the expected training time for files with a combined length of 12,800,000 characters is 6 hours.
40
+
41
+
## How do I build my custom model programmatically?
42
+
43
+
You can use the [REST APIs](https://aka.ms/ct-authoring-swagger) to build your custom models. Follow the [quickstart](quickstart.md?pivots=rest-api) to create a project and a model through the APIs; it includes examples of how to call the Authoring API.
44
+
45
+
46
+
## What is the recommended CI/CD process?
47
+
48
+
You can train multiple models on the same dataset within the same project. After you have trained your model successfully, you can [view its evaluation](how-to/view-model-evaluation.md). You can [deploy and test](quickstart.md#deploy-your-model) your model within [Language Studio](https://aka.ms/languageStudio). You can also add or remove tags from your data, train a **new** model, and test it as well. View [service limits](service-limits.md) to learn about the maximum number of trained models within the same project. When you train a new model, your dataset is [split](how-to/train-model.md#data-splits) randomly into training and testing sets. Because of this, there is no guarantee that the model evaluation is performed on the same test set, so results are not comparable. It is recommended that you develop your own test set and use it to evaluate both models so you can measure improvement.
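The idea of a fixed, self-maintained test set can be sketched in Python as follows. This is a minimal sketch, assuming a single-label project and assuming you have already collected each model's predictions for the same test documents (for example, by querying each deployed model). The `accuracy` helper and all label data are hypothetical, and accuracy is used only as a simple stand-in for whichever metric you prefer.

```python
def accuracy(tagged, predicted):
    """Fraction of test documents whose predicted class matches the tagged class."""
    if len(tagged) != len(predicted):
        raise ValueError("tagged and predicted must have the same length")
    if not tagged:
        raise ValueError("the test set is empty")
    correct = sum(t == p for t, p in zip(tagged, predicted))
    return correct / len(tagged)


# Hypothetical fixed test set: the tagged class of each test document.
tagged = ["billing", "shipping", "returns", "billing"]

# Hypothetical predictions from two trained models on those same documents.
model_a = ["billing", "returns", "returns", "billing"]
model_b = ["billing", "shipping", "returns", "billing"]

print("model A:", accuracy(tagged, model_a))
print("model B:", accuracy(tagged, model_b))
```

Because both models are scored against the same documents, the two numbers can be compared directly, unlike the evaluations produced from separate random splits.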
30
49
31
-
Model evaluation may not always be comprehensive, especially if a specific class is missing or under-represented in your test set. Consider adding more tagged data to your model to both improve performance, and have a more representative test set.
50
+
## Does a low or high model score guarantee bad or good performance in production?
51
+
52
+
Model evaluation may not always be comprehensive. This depends on:
53
+
* **Test set size**: if the test set is too small, good or bad scores are not representative of the model's actual performance. If a specific class is missing or under-represented in your test set, the reported model performance will also be affected.
54
+
* **Data diversity**: if your data covers only a few scenarios or examples of the text you expect in production, your model will not be exposed to all possible scenarios and might perform poorly on the scenarios it hasn't been trained on.
55
+
* **Data representation**: if the dataset used to train the model is not representative of the data that will be introduced to the model in production, model performance will be greatly affected.
56
+
57
+
See the [data selection and schema design](how-to/design-schema.md) article for more information.
32
58
33
59
## How do I improve model performance?
34
60
35
-
View the [confusion matrix](how-to/view-model-evaluation.md) to identify schema ambiguity. Then [review your test set](how-to/improve-model.md) to see predicted and tagged classes side-by-side so you can get a better idea of your model performance, and decide if any changes in the schema or the tags are necessary.
61
+
* View the model [confusion matrix](how-to/view-model-evaluation.md). If you notice that a certain class is frequently classified incorrectly, consider adding more tagged instances for this class. If you notice that two classes are frequently classified as each other, the schema is ambiguous; consider merging them into one class for better performance.
62
+
63
+
* [Examine data distribution](how-to/improve-model.md#examine-data-distribution-from-language-studio). If one of the classes has a lot more tagged instances than the others, your model may be biased towards this class. Add more data to the other classes or remove most of the examples from the dominating class.
64
+
65
+
* Learn more about data selection and schema design [here](how-to/design-schema.md).
66
+
67
+
* [Review your test set](how-to/improve-model.md) to see predicted and tagged classes side by side, so you can get a better idea of your model's performance and decide whether any changes to the schema or the tags are necessary.
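If you export tagged and predicted classes for your test set, the "frequently classified as each other" check from the list above can be approximated with a short Python sketch. This is only an illustration under stated assumptions: it assumes a single-label project, the `most_confused_pairs` helper is hypothetical, and the label data is invented for the example. It does not reproduce how Language Studio builds its confusion matrix.

```python
from collections import Counter


def most_confused_pairs(tagged, predicted, top=3):
    """Return the most frequent (tagged_class, predicted_class) errors with their counts."""
    errors = Counter((t, p) for t, p in zip(tagged, predicted) if t != p)
    return errors.most_common(top)


# Hypothetical test set results: tagged class and predicted class per document.
tagged = ["refund", "refund", "return", "return", "billing"]
predicted = ["return", "return", "refund", "return", "billing"]

for (tagged_class, predicted_class), count in most_confused_pairs(tagged, predicted):
    print(f"{tagged_class} -> {predicted_class}: {count}")
```

Pairs that show up in both directions, such as `refund` and `return` in this invented data, point to schema ambiguity and are candidates for merging or for more tagged instances.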
68
+
69
+
## When I retrain my model I get different results, why is this?
70
+
71
+
* When you train a new model, your dataset is [split](how-to/train-model.md#data-splits) randomly into training and testing sets. There is no guarantee that the model evaluation is performed on the same test set, so results are not comparable.
72
+
73
+
* If you are retraining the same model, your test set will be the same, but you might notice a slight change in the predictions made by the model. This is because the trained model is not robust enough, which depends on how representative and distinct your data is and on the quality of your tagged data.
74
+
75
+
## How do I get predictions in different languages?
76
+
77
+
First, you need to enable the multilingual option when [creating your project](how-to/create-project.md), or you can enable it later from the project settings page. After you train and deploy your model, you can start querying it in [multiple languages](language-support.md#multiple-language-support). You may get varied results for different languages. To improve accuracy for a specific language, add more tagged instances in that language to your project to expose the trained model to more of that language's syntax.
36
78
37
79
## I trained my model, but I can't test it
38
80
39
81
You need to [deploy your model](quickstart.md#deploy-your-model) before you can test it.
40
82
41
-
## How do I use the analyze API?
83
+
## How do I use my trained model to make predictions?
42
84
43
-
After deploying your model, you [call the runtime API](how-to/call-api.md). See the [Analyze API reference](https://aka.ms/ct-runtime-swagger) for more information.
85
+
After deploying your model, you can [call the prediction API](how-to/call-api.md). See the [Prediction API reference](https://aka.ms/ct-runtime-swagger) for more information.
44
86
45
87
## Data privacy and security
46
88
47
-
Your data is only stored in your Azure storage account, Custom classification only has access to read from it during training and evaluation.
89
+
Custom text classification is a data processor for General Data Protection Regulation (GDPR) purposes. In compliance with GDPR policies, Custom classification users have full control to view, export, or delete any user content either through the [Language Studio](https://aka.ms/languageStudio) or programmatically by using [REST APIs](https://aka.ms/ct-authoring-swagger).
90
+
91
+
Your data is only stored in your Azure Storage account. Custom classification only has access to read from it during training.
48
92
49
-
<!--## How to clone my project?
93
+
## How to clone my project?
50
94
51
-
To clone your project you need to [export]() project assests and then [import]() them into a new project. -->
95
+
To clone your project, use the export API to export the project assets and then import them into a new project. See the [REST APIs](https://aka.ms/ct-authoring-swagger) reference for both operations.
52
96
53
97
## Next steps
54
98
55
99
* [Custom text classification overview](overview.md)
articles/cognitive-services/language-service/custom-classification/how-to/view-model-evaluation.md
+6 -3 (6 additions & 3 deletions)
@@ -55,13 +55,16 @@ The evaluation process uses the trained model to predict user-defined classes fo
55
55
56
56
Under the **Test set confusion matrix**, you can find the confusion matrix for the model.
57
57
58
-
**Single Label Classification**
58
+
> [!NOTE]
59
+
> The confusion matrix is currently not supported for multiple label classification projects.
60
+
61
+
**Single label classification**
59
62
60
63
:::image type="content" source="../media/conf-matrix-single.png" alt-text="Confusion matrix for single class classification" lightbox="../media/conf-matrix-single.png":::
61
64
62
-
**Multiple Label Classification**
65
+
<!--**Multiple Label Classification**
63
66
64
-
:::image type="content" source="../media/conf-matrix-multi.png" alt-text="Confusion matrix for multiple class classification" lightbox="../media/conf-matrix-multi.png":::
67
+
:::image type="content" source="../media/conf-matrix-multi.png" alt-text="Confusion matrix for multiple class classification" lightbox="../media/conf-matrix-multi.png":::-->