
Commit bc2f1d8

Merge pull request #91946 from melghazali/master
[Cog svcs] modified two CT documents for knowledge update
2 parents a8a7ea8 + fc71bff commit bc2f1d8


3 files changed: +19 -21 lines changed

articles/cognitive-services/Translator/custom-translator/how-to-migrate.md

Lines changed: 3 additions & 3 deletions
@@ -22,8 +22,8 @@ These actions are performed during migration:
 * Any migrated trainings that were not in the deployed state will be put into the migrated draft state. In this state, you will have the option of training a model with the migrated definition, but regular training charges will apply.
 * At any point, the BLEU score migrated from the Hub training can be found in the TrainingDetails page of the model in the “Bleu score in MT Hub” heading.
 
-> [!Note]
-> For a training to succeed, Custom Translator requires a minimum of 10,000 unique extracted sentences. Custom Translator can't conduct a training with fewer than the [suggested minimum](sentence-alignment.md#suggested-minimum-number-of-extracted-and-aligned-sentences).
+> [!Note]
+> For a training to succeed, Custom Translator requires a minimum of 10,000 unique extracted sentences. Custom Translator can't conduct a training with fewer than the [suggested minimum](sentence-alignment.md#suggested-minimum-number-of-sentences).
 
 ## Find Custom Translator Workspace ID
 
@@ -118,7 +118,7 @@ If you want more detailed migration report about your projects, trainings and do
 * Systems with language pairs NOT yet available in Custom Translator will only be available to access data or undeploy through Custom Translator. These projects will be marked as “Unavailable” on the Projects page. As we enable new language pairs with Custom Translator, the projects will become active to train and deploy.
 * Migrating a project from Hub to Custom Translator will not have any impact on your Hub trainings or projects. We do not delete projects or documents from Hub during a migration and we do not undeploy models.
 * You are only permitted to migrate once per project. If you need to repeat a migration on a project, please contact us.
-* Custom Translator supports NMT language pairs to and from English. [View the complete list of supported langauges](https://docs.microsoft.com/azure/cognitive-services/translator/language-support#customization). Hub does not require baseline models and therefore supports several thousand languages. You can migrate an unsupported language pair, however we will only perform the migration of documents and project definitions. We will not be able to train the new model. Furthermore, these documents and projects will be displayed as inactive in order to indicate that they can't be used at this time. If support is added for these projects and/or documents, they will become active and trainable.
+* Custom Translator supports NMT language pairs to and from English. [View the complete list of supported languages](https://docs.microsoft.com/azure/cognitive-services/translator/language-support#customization). Hub does not require baseline models and therefore supports several thousand languages. You can migrate an unsupported language pair, however we will only perform the migration of documents and project definitions. We will not be able to train the new model. Furthermore, these documents and projects will be displayed as inactive in order to indicate that they can't be used at this time. If support is added for these projects and/or documents, they will become active and trainable.
 * Custom Translator does not currently support monolingual training data. Like unsupported language pairs, you can migrate monolingual documents, but they show as inactive until monolingual data is supported.
 * Custom Translator requires 10k parallel sentences in order to train. Microsoft Hub could train on a smaller set of data. If a training is migrated which does not meet this requirement, it will not be trained.
 
articles/cognitive-services/Translator/custom-translator/sentence-alignment.md

Lines changed: 15 additions & 15 deletions
@@ -39,21 +39,21 @@ For best results, try to make sure that you have one sentence per line in your
 files. Don't have newline characters within a sentence as this will cause poor
 alignments.
 
-## Suggested minimum number of extracted and aligned sentences
-
-For a training to succeed, the table below shows the minimum number of extracted
-sentences and aligned sentences required in each data set. The
-suggested minimum number of extracted sentences is much higher than the
-suggested minimum number of aligned sentences to take into account the fact that
-the sentence alignment may not be able to align all extracted sentences
-successfully.
-
-| Data set | Suggested minimum extracted sentence count | Suggested minimum aligned sentence count | Maximum aligned sentence count |
-|------------|--------------------------------------------|------------------------------------------|--------------------------------|
-| Training | 10,000 | 2,000 | No upper limit |
-| Tuning | 2,000 | 500 | 2,500 |
-| Testing | 2,000 | 500 | 2,500 |
-| Dictionary | 0 | 0 | No upper limit |
+## Suggested minimum number of sentences
+
+For a training to succeed, the table below shows the minimum number of sentences required in each document type. This limitation is a safety net to ensure your parallel sentences contain enough unique vocabulary to successfully train a translation model. The general guideline is having more in-domain parallel sentences of human translation quality should produce higher quality models.
+
+| Document type | Suggested minimum sentence count | Maximum sentence count |
+|------------|--------------------------------------------|--------------------------------|
+| Training | 10,000 | No upper limit |
+| Tuning | 5,000 | 2,500 |
+| Testing | 5,000 | 2,500 |
+| Dictionary | 0 | No upper limit |
+
+> [!NOTE]
+> - Training will not start and will fail if the 10,000 minimum sentence count for Training is not met.
+> - Tuning and Testing are optional. If you do not provide them, the system will remove an appropriate percentage from Training to use for validation and testing.
+> - You can train a model using only dictionary data. Please refer to [What is Dictionary](https://docs.microsoft.com/azure/cognitive-services/translator/custom-translator/what-is-dictionary).
 
 ## Next steps
 
articles/cognitive-services/Translator/custom-translator/what-are-parallel-documents.md

Lines changed: 1 addition & 3 deletions
@@ -23,9 +23,7 @@ system in either direction.
 
 ## Requirements
 
-You will need a minimum of 10,000 unique parallel sentences to train a system. As a
-best practice, you can continuously add more parallel content and retrain, to
-improve the quality of your translation system.
+You will need a minimum of 10,000 unique aligned parallel sentences to train a system. This limitation is a safety net to ensure your parallel sentences contain enough unique vocabulary to successfully train a translation model. As a best practice, continuously add more parallel content and retrain to improve the quality of your translation system. Please refer to [Sentence Alignment](https://docs.microsoft.com/azure/cognitive-services/translator/custom-translator/sentence-alignment).
 
 Microsoft requires that documents uploaded to the Custom Translator do not
 violate a third party’s copyright or intellectual properties. For more