Commit bcca4b9

authored
Merge pull request #178314 from aahill/m-update
Maged updates
2 parents ab8359f + 22b0395 commit bcca4b9

24 files changed

+125
-32
lines changed

articles/cognitive-services/language-service/custom-classification/faq.md

Lines changed: 55 additions & 11 deletions
@@ -18,39 +18,83 @@ ms.custom: language-service-custom-classification, ignite-fall-2021
1818

1919
Find answers to commonly asked questions about concepts and scenarios related to custom text classification in Azure Cognitive Service for Language.
2020

21-
## How many tagged files are needed?
21+
## How do I get started with the service?
2222

23-
Generally, diverse and representative [tagged data](how-to/tag-data.md) leads to better results, given that the tagging is done precisely, consistently and completely. There is no set number of tagged classes that will make every model perform well. Performance highly dependent on your schema, and the ambiguity of your schema. Ambiguous classes need more tags. Performance also depends on the quality of your tagging. The recommended number of tagged instances per entity is 50.
23+
See the [quickstart](./quickstart.md) to quickly create your first project, or view [how to create projects](how-to/create-project.md) for more details.
2424

2525
## What are the service limits?
2626

2727
See the [service limits article](service-limits.md) for more information.
2828

29-
## What to do if my model scores poorly?
29+
## Which languages are supported in this feature?
30+
31+
See the [language support](./language-support.md) article.
32+
33+
## How many tagged files are needed?
34+
35+
Generally, diverse and representative [tagged data](how-to/tag-data.md) leads to better results, given that the tagging is done precisely, consistently and completely. There is no set number of tagged classes that will make every model perform well. Performance is highly dependent on your schema and the ambiguity of your schema. Ambiguous classes need more tags. Performance also depends on the quality of your tagging. The recommended number of tagged instances per class is 50.
36+
37+
## Training is taking a long time, is this expected?
38+
39+
The training process can take some time. As a rough estimate, the expected training time for files with a combined length of 12,800,000 characters is 6 hours.
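
To turn the rough estimate above into a planning number, here is a minimal sketch. It assumes training time scales linearly with the combined character count, which the FAQ does not state; the function name `estimate_training_hours` is hypothetical and not part of any service API.

```python
# Reference point quoted in the FAQ: 12,800,000 characters takes roughly 6 hours.
REFERENCE_CHARS = 12_800_000
REFERENCE_HOURS = 6.0


def estimate_training_hours(total_chars: int) -> float:
    """Extrapolate a rough training time from the FAQ's single reference point.

    ASSUMPTION: training time grows linearly with the combined length of
    the files. The real service may behave differently.
    """
    if total_chars < 0:
        raise ValueError("total_chars must be non-negative")
    return REFERENCE_HOURS * total_chars / REFERENCE_CHARS


if __name__ == "__main__":
    for chars in (1_000_000, 6_400_000, 12_800_000):
        print(f"{chars:>12,} characters: about {estimate_training_hours(chars):.1f} hours")
```

Treat the result as an order-of-magnitude guide only, since it is extrapolated from one data point.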
40+
41+
## How do I build my custom model programmatically?
42+
43+
You can use the [REST APIs](https://aka.ms/ct-authoring-swagger) to build your custom models. Follow this [quickstart](quickstart.md?pivots=rest-api) to create a project and a model through the APIs; it includes examples of how to call the Authoring API.
44+
45+
46+
## What is the recommended CI/CD process?
47+
48+
You can train multiple models on the same dataset within the same project. After you have trained your model successfully, you can [view its evaluation](how-to/view-model-evaluation.md). You can [deploy and test](quickstart.md#deploy-your-model) your model within [Language Studio](https://aka.ms/languageStudio). You can add or remove tags from your data, train a **new** model, and test it as well. View [service limits](service-limits.md) to learn about the maximum number of trained models within the same project. When you train a new model, your dataset is [split](how-to/train-model.md#data-splits) randomly into training and testing sets. Because of this, there is no guarantee that the model evaluation is performed on the same test set, so results are not comparable. It is recommended that you develop your own test set and use it to evaluate both models so you can measure improvement.
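
The recommendation to score both models on your own fixed test set can be sketched as follows. This is an illustrative sketch, not the service's evaluation code: it assumes you have already collected each model's predicted class for every test document (for example, by querying each deployment), and the helper names `accuracy`, `macro_f1`, and `compare_models` are hypothetical.

```python
def accuracy(gold, predicted):
    """Fraction of test documents whose predicted class matches the tagged class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 scores (single label classification)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    scores = []
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def compare_models(gold, old_predictions, new_predictions):
    """Score two models against the SAME tagged test set so results are comparable."""
    return {
        "old": {"accuracy": accuracy(gold, old_predictions),
                "macro_f1": macro_f1(gold, old_predictions)},
        "new": {"accuracy": accuracy(gold, new_predictions),
                "macro_f1": macro_f1(gold, new_predictions)},
    }


if __name__ == "__main__":
    # Hypothetical test set and predictions, for illustration only.
    gold = ["sports", "news", "sports", "news"]
    old = ["sports", "sports", "sports", "news"]
    new = ["sports", "news", "sports", "news"]
    print(compare_models(gold, old, new))
```

Because both models are scored against identical documents, a difference in the numbers reflects the models rather than a different random split.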
3049

31-
Model evaluation may not always be comprehensive, especially if a specific class is missing or under-represented in your test set. Consider adding more tagged data to your model to both improve performance, and have a more representative test set.
50+
## Does a low or high model score guarantee bad or good performance in production?
51+
52+
Model evaluation may not always be comprehensive. This depends on:
53+
* **Test set size**: If the test set is too small, good or bad scores are not representative of the model's actual performance. Also, if a specific class is missing or under-represented in your test set, it will affect the reported model performance.
54+
* **Data diversity**: If your data covers only a few scenarios or examples of the text you expect in production, your model will not be exposed to all possible scenarios and might perform poorly on the scenarios it hasn't been trained on.
55+
* **Data representation**: If the dataset used to train the model is not representative of the data that would be introduced to the model in production, model performance will be affected greatly.
56+
57+
See the [data selection and schema design](how-to/design-schema.md) article for more information.
3258

3359
## How do I improve model performance?
3460

35-
View the [confusion matrix](how-to/view-model-evaluation.md) to identify schema ambiguity. Then [review your test set](how-to/improve-model.md) to see predicted and tagged classes side-by-side so you can get a better idea of your model performance, and decide if any changes in the schema or the tags are necessary.
61+
* View the model [confusion matrix](how-to/view-model-evaluation.md). If you notice that a certain class is frequently classified incorrectly, consider adding more tagged instances for this class. If you notice that two classes are frequently classified as each other, the schema is ambiguous; consider merging them into one class for better performance.
62+
63+
* [Examine data distribution](how-to/improve-model.md#examine-data-distribution-from-language-studio). If one of the classes has a lot more tagged instances than the others, your model may be biased towards this class. Add more data to the other classes or remove most of the examples from the dominating class.
64+
65+
* Learn more about data selection and schema design [here](how-to/design-schema.md).
66+
67+
* [Review your test set](how-to/improve-model.md) to see predicted and tagged classes side-by-side so you can get a better idea of your model performance, and decide if any changes in the schema or the tags are necessary.
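
The confusion-matrix and data-distribution checks in the list above can also be run offline on your own tagged data. The following is a minimal sketch under stated assumptions: tagged and predicted classes are available as parallel lists, the function names are hypothetical, and the `ratio` threshold of 3.0 is an arbitrary illustrative value rather than a service recommendation.

```python
from collections import Counter


def mutual_confusion(gold, predicted):
    """Find pairs of classes that are misclassified as each other.

    Returns {(class_a, class_b): total_errors} for pairs confused in BOTH
    directions, which hints at an ambiguous schema.
    """
    errors = Counter((g, p) for g, p in zip(gold, predicted) if g != p)
    pairs = {}
    for (a, b), count in errors.items():
        # Visit each unordered pair once by requiring a < b.
        if a < b and (b, a) in errors:
            pairs[(a, b)] = count + errors[(b, a)]
    return pairs


def dominating_classes(tagged_classes, ratio=3.0):
    """List classes with more than `ratio` times the instances of the smallest class.

    ASSUMPTION: `ratio` is an illustrative cutoff chosen for this sketch.
    """
    counts = Counter(tagged_classes)
    if not counts:
        return []
    smallest = min(counts.values())
    return sorted(label for label, n in counts.items() if n > ratio * smallest)


if __name__ == "__main__":
    # Hypothetical tagged and predicted classes, for illustration only.
    gold = ["bug", "bug", "feature", "feature", "question"]
    pred = ["feature", "bug", "bug", "feature", "question"]
    print(mutual_confusion(gold, pred))
    print(dominating_classes(["a"] * 10 + ["b"] * 2 + ["c"] * 3))
```

A pair reported by `mutual_confusion` is a candidate for merging, and a class reported by `dominating_classes` is a candidate for rebalancing.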
68+
69+
## When I retrain my model I get different results, why is this?
70+
71+
* When you train a new model, your dataset is [split](how-to/train-model.md#data-splits) randomly into training and testing sets. There is no guarantee that the reported model evaluation uses the same test set, so results are not comparable.
72+
73+
* If you are retraining the same model, your test set will be the same, but you might notice a slight change in predictions made by the model. This is because the trained model is not robust enough, which depends on how representative and distinct your data is and on the quality of your tagged data.
74+
75+
## How do I get predictions in different languages?
76+
77+
First, you need to enable the multilingual option when [creating your project](how-to/create-project.md) or you can enable it later from the project settings page. After you train and deploy your model, you can start querying it in [multiple languages](language-support.md#multiple-language-support). You may get varied results for different languages. To improve the accuracy of any language, add more tagged instances to your project in that language to introduce the trained model to more syntax of that language.
3678

3779
## I trained my model, but I can't test it
3880

3981
You need to [deploy your model](quickstart.md#deploy-your-model) before you can test it.
4082

41-
## How do I use the analyze API?
83+
## How do I use my trained model to make predictions?
4284

43-
After deploying your model, you [call the runtime API](how-to/call-api.md). See the [Analyze API reference](https://aka.ms/ct-runtime-swagger) for more information.
85+
After deploying your model, you [call the prediction API](how-to/call-api.md). See the [Prediction API reference](https://aka.ms/ct-runtime-swagger) for more information.
4486

4587
## Data privacy and security
4688

47-
Your data is only stored in your Azure storage account, Custom classification only has access to read from it during training and evaluation.
89+
Custom text classification is a data processor for General Data Protection Regulation (GDPR) purposes. In compliance with GDPR policies, Custom classification users have full control to view, export, or delete any user content either through the [Language Studio](https://aka.ms/languageStudio) or programmatically by using [REST APIs](https://aka.ms/ct-authoring-swagger).
90+
91+
Your data is only stored in your Azure Storage account. Custom classification only has access to read from it during training.
4892

49-
<!-- ## How to clone my project?
93+
## How to clone my project?
5094

51-
To clone your project you need to [export]() project assests and then [import]() them into a new project. -->
95+
To clone your project, use the export API to export the project assets and then import them into a new project. See the [REST APIs](https://aka.ms/ct-authoring-swagger) reference for both operations.
5296

5397
## Next steps
5498

5599
* [Custom text classification overview](overview.md)
56-
* [quickstart](quickstart.md)
100+
* [Quickstart](quickstart.md)

articles/cognitive-services/language-service/custom-classification/how-to/view-model-evaluation.md

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -55,13 +55,16 @@ The evaluation process uses the trained model to predict user-defined classes fo
5555
5656
Under the **Test set confusion matrix**, you can find the confusion matrix for the model.
5757

58-
**Single Label Classification**
58+
> [!NOTE]
59+
> The confusion matrix is currently not supported for multiple label classification projects.
60+
61+
**Single label classification**
5962

6063
:::image type="content" source="../media/conf-matrix-single.png" alt-text="Confusion matrix for single class classification" lightbox="../media/conf-matrix-single.png":::
6164

62-
**Multiple Label Classification**
65+
<!-- **Multiple Label Classification**
6366
64-
:::image type="content" source="../media/conf-matrix-multi.png" alt-text="Confusion matrix for multiple class classification" lightbox="../media/conf-matrix-multi.png":::
67+
:::image type="content" source="../media/conf-matrix-multi.png" alt-text="Confusion matrix for multiple class classification" lightbox="../media/conf-matrix-multi.png"::: -->
6568

6669
## Next steps
6770
