Commit d993b88

Merge pull request #97695 from melghazali/master
[Cog Svcs] Correcting dynamic dictionary support note
2 parents 0e2bce6 + 322a4e0 commit d993b88

4 files changed, with 32 additions and 32 deletions.

articles/cognitive-services/Translator/custom-translator/how-to-migrate.md

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ These actions are performed during migration:
2323
* At any point, the BLEU score migrated from the Hub training can be found in the TrainingDetails page of the model in the “Bleu score in MT Hub” heading.
2424

2525
> [!Note]
26-
> For a training to succeed, Custom Translator requires a minimum of 10,000 unique extracted sentences. Custom Translator can't conduct a training with fewer than the [suggested minimum](sentence-alignment.md#suggested-minimum-number-of-sentences).
26+
> For a training to succeed, Custom Translator requires a minimum of 10,000 unique extracted sentences. Custom Translator can't conduct a training with fewer than the [suggested minimum](https://docs.microsoft.com/azure/cognitive-services/translator/custom-translator/sentence-alignment#suggested-minimum-number-of-sentences).
2727
2828
## Find Custom Translator Workspace ID
2929

articles/cognitive-services/Translator/custom-translator/quickstart-build-deploy-custom-model.md

Lines changed: 3 additions & 3 deletions
@@ -40,7 +40,7 @@ your project. For more details, visit [Create Project](how-to-create-project.md)
4040

4141
## Upload documents
4242

43-
Next, upload [training](training-and-model.md#training-dataset-for-custom-translator), [tuning](training-and-model.md#tuning-dataset-for-custom-translator) and [testing](training-and-model.md#testing-dataset-for-custom-translator) document sets. You can upload both [parallel](what-are-parallel-documents.md) and combo documents. You can also upload [dictionary](what-is-dictionary.md).
43+
Next, upload [training](training-and-model.md#training-document-type-for-custom-translator), [tuning](training-and-model.md#tuning-document-type-for-custom-translator) and [testing](training-and-model.md#testing-dataset-for-custom-translator) document sets. You can upload both [parallel](what-are-parallel-documents.md) and combo documents. You can also upload [dictionary](what-is-dictionary.md).
4444

4545
You can upload documents from either the documents tab or from a specific
4646
project's page.
@@ -58,8 +58,8 @@ model.
5858

5959
Select the project you've created. You'll see all the documents you've uploaded
6060
that share a language pair with this project. Select the documents that you want
61-
included in your model. You can select [training](training-and-model.md#training-dataset-for-custom-translator),
62-
[tuning](training-and-model.md#tuning-dataset-for-custom-translator), and [testing](training-and-model.md#testing-dataset-for-custom-translator) data or select just
61+
included in your model. You can select [training](training-and-model.md#training-document-type-for-custom-translator),
62+
[tuning](training-and-model.md#tuning-document-type-for-custom-translator), and [testing](training-and-model.md#testing-dataset-for-custom-translator) data or select just
6363
training data and let Custom Translator automatically build tuning and test sets
6464
for your model.
6565

articles/cognitive-services/Translator/custom-translator/training-and-model.md

Lines changed: 22 additions & 22 deletions
@@ -16,51 +16,51 @@ ms.author: swmachan
1616

1717
A model is the system, which provides translation for a specific language pair.
1818
The outcome of a successful training is a model. When training a model, three
19-
mutually exclusive data sets are required: training dataset, tuning dataset, and
20-
testing dataset. Dictionary data can also be provided.
19+
mutually exclusive document types are required: training, tuning, and
20+
testing. Dictionary document type can also be provided. Please refer to [Sentence alignment](https://docs.microsoft.com/azure/cognitive-services/translator/custom-translator/sentence-alignment#suggested-minimum-number-of-sentences).
2121

22-
If only training data is provided when queuing a training, Custom Translator will automatically assemble tuning and testing datasets. It will exclude 5,000 sentences from your training data and use 2,500 each to assemble a tuning and testing sets.
22+
If only training data is provided when queuing a training, Custom Translator will automatically assemble tuning and testing data. It will use a random subset of sentences from your training documents, and exclude these sentences from the training data itself.
2323

24-
## Training dataset for Custom Translator
24+
## Training document type for Custom Translator
2525

2626
Documents included in training set are used by the Custom Translator as the basis for building your model. During training execution, sentences that are present in these documents are aligned (or paired). You can take liberties in composing your set of training documents. You can include documents that you believe are of tangential relevance in one model. Again exclude them in another to see the impact in [BLEU (Bilingual Evaluation Understudy) score](what-is-bleu-score.md). As long as you keep the tuning set and test set constant, feel free to experiment with the composition of the training set. This approach is an effective way to modify the quality of your translation system.
2727

2828
You can run multiple trainings within a project and compare the [BLEU scores](what-is-bleu-score.md) across all training runs. When you are running multiple trainings for comparison, ensure same tuning/ test data is specified each time. Also make sure to also inspect the results manually in the [“Testing”](how-to-view-system-test-results.md) tab.
2929

30-
## Tuning dataset for Custom Translator
30+
## Tuning document type for Custom Translator
3131

3232
Parallel documents included in this set are used by the Custom Translator to
3333
tune the translation system for optimal results.
3434

35-
The tuning set is used during training to adjust all parameters and weights of
36-
the translation system to the optimal values. Choose your tuning set carefully:
37-
the tuning set should be representative of the content of the documents you
38-
intend to translate in the future. The tuning set has a major influence on the
35+
The tuning data is used during training to adjust all parameters and weights of
36+
the translation system to the optimal values. Choose your tuning data carefully:
37+
the tuning data should be representative of the content of the documents you
38+
intend to translate in the future. The tuning data has a major influence on the
3939
quality of the translations produced. Tuning enables the translation system to
4040
provide translations that are closest to the samples you provide in the tuning
41-
dataset. You do not need more than 2500 sentences as tuning set. For optimal
41+
data. You do not need more than 2500 sentences in your tuning data. For optimal
4242
translation quality, it is recommended to select the tuning set manually by
4343
choosing the most representative selection of sentences.
4444

4545
When creating your tuning set, choose sentences that are a meaningful and
4646
representative length of the future sentences that you expect to translate. You
4747
should also choose sentences that have words and phrases that you intend to
4848
translate in the approximate distribution that you expect in your future
49-
translations. In practice, a sentence length of 8 to 18 words will produce the
49+
translations. In practice, a sentence length of 7 to 10 words will produce the
5050
best results, because these sentences contain enough context to show inflection
5151
and provide a phrase length that is significant, without being overly complex.
5252

5353
A good description of the type of sentences to use in the tuning set is prose:
5454
actual fluent sentences. Not table cells, not poems, not lists of things, not
5555
only punctuation, or numbers in a sentence - regular language.
5656

57-
If you manually select your tuning data set, it should not have any of the same
58-
sentences as your training and testing data. The tuning set has a significant
57+
If you manually select your tuning data, it should not have any of the same
58+
sentences as your training and testing data. The tuning data has a significant
5959
impact on the quality of the translations - choose the sentences carefully.
6060

61-
If you are not sure what to choose for your tuning set, just select the training
62-
set and let Custom Translator select your tuning set for you. When you let the
63-
Custom Translator choose the tuning set automatically, it will use a random
61+
If you are not sure what to choose for your tuning data, just select the training
62+
data and let Custom Translator select the tuning data for you. When you let the
63+
Custom Translator choose the tuning data automatically, it will use a random
6464
subset of sentences from your bilingual training documents and exclude these
6565
sentences from the training material itself.
6666

@@ -78,16 +78,16 @@ ranges from 0 to 100. A score of 0 indicates that not a single word of the
7878
reference appears in the translation. A score of 100 indicates that the
7979
automatic translation exactly matches the reference: the same word is in the
8080
exact same position. The score you receive is the BLEU score average for all
81-
sentences of the testing set.
81+
sentences of the testing data.
8282

83-
The test set should include parallel documents where the target language
83+
The test data should include parallel documents where the target language
8484
sentences are the most desirable translations of the corresponding source
85-
language sentences in the pair. You may want to use the same criteria you used
86-
to compose the tuning set. However, the testing set has no influence over the
85+
language sentences in the source-target pair. You may want to use the same criteria you used
86+
to compose the tuning data. However, the testing data has no influence over the
8787
quality of the translation system. It is used exclusively to generate the BLEU
88-
score for you, and for nothing else.
88+
score for you.
8989

90-
You don't need more than 2,500 sentences as the testing set. When you let the
90+
You don't need more than 2,500 sentences as the testing data. When you let the
9191
system choose the testing set automatically, it will use a random subset of
9292
sentences from your bilingual training documents, and exclude these sentences
9393
from the training material itself.
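
Note on the changed text above: the automatic behavior it describes (a random subset of sentences is taken from the training documents and excluded from the training data) can be illustrated with a minimal Python sketch. This is not Custom Translator's actual selection logic, which the documentation does not specify. The function name `hold_out`, the default sizes of 2,500, and the fixed seed are assumptions made for illustration.

```python
import random


def hold_out(sentence_pairs, tuning_size=2500, testing_size=2500, seed=0):
    """Split parallel sentence pairs into training, tuning, and testing data.

    Hypothetical helper: the sizes and the seeded shuffle are illustrative
    assumptions, not the service's documented behavior.
    """
    # Drop exact duplicate pairs while keeping first-seen order, so the
    # three resulting sets cannot share a sentence pair.
    pairs = list(dict.fromkeys(sentence_pairs))
    held_out = tuning_size + testing_size
    if len(pairs) <= held_out:
        raise ValueError("not enough unique sentence pairs to hold out data")

    rng = random.Random(seed)
    rng.shuffle(pairs)

    tuning = pairs[:tuning_size]
    testing = pairs[tuning_size:held_out]
    training = pairs[held_out:]  # held-out sentences are excluded here
    return training, tuning, testing
```

Keeping the seed fixed reproduces the same tuning and testing data across runs, which matches the advice in the changed text to hold those two sets constant when comparing BLEU scores.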

articles/cognitive-services/Translator/dynamic-dictionary.md

Lines changed: 6 additions & 6 deletions
@@ -14,23 +14,23 @@ ms.author: swmachan
1414

1515
# How to use a dynamic dictionary
1616

17-
If you already know the translation you want to apply to a word or a phrase, you can supply it as markup within the request. The dynamic dictionary is only safe for compound nouns like proper names and product names.
17+
If you already know the translation you want to apply to a word or a phrase, you can supply it as markup within the request. The dynamic dictionary is safe only for compound nouns like proper names and product names.
1818

1919
**Syntax:**
2020

2121
<mstrans:dictionary translation="translation of phrase">phrase</mstrans:dictionary>
2222

2323
**Requirements:**
2424

25-
* The `From` and `To` languages must be different.
26-
* You must include the `From` parameter in your API translation request instead of using the auto-detect feature.
25+
* The `From` and `To` languages must include English and another supported language.
26+
* You must include the `From` parameter in your API translation request instead of using the autodetect feature.
2727

2828
**Example: en-de:**
2929

30-
Source input: The word <mstrans:dictionary translation=\"wordomatic\">word or phrase</mstrans:dictionary> is a dictionary entry.
30+
Source input: `The word <mstrans:dictionary translation=\"wordomatic\">word or phrase</mstrans:dictionary> is a dictionary entry.`
3131

32-
Target output: Das Wort "wordomatic" ist ein Wörterbucheintrag.
32+
Target output: `Das Wort "wordomatic" ist ein Wörterbucheintrag.`
3333

3434
This feature works the same way with and without HTML mode.
3535

36-
The feature should be used sparingly. The appropriate and far better way of customizing translation is by using Custom Translator. Custom Translator makes full use of context and statistical probabilities. If you have or can create training data that shows your work or phrase in context, you get much better results. You can find more information about Custom Translator at [https://aka.ms/CustomTranslator](https://aka.ms/CustomTranslator).
36+
Use the feature sparingly. A better way to customize translation is by using Custom Translator. Custom Translator makes full use of context and statistical probabilities. If you have or can create training data that shows your word or phrase in context, you get much better results. You can find more information about Custom Translator at [https://aka.ms/CustomTranslator](https://aka.ms/CustomTranslator).
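
Note on the changed text above: the syntax and requirements can be sketched in Python as follows. The sketch only builds the request pieces and sends nothing. The helper names `dictionary_markup` and `build_request` are hypothetical, and the query parameter names (`api-version`, `from`, `to`) and the `[{"Text": ...}]` body shape are assumptions based on the Translator v3 REST API rather than anything stated in this diff.

```python
import json


def dictionary_markup(phrase: str, translation: str) -> str:
    """Wrap a phrase in the dynamic dictionary markup shown in the article."""
    if '"' in translation:
        # How the service treats escaped quotes is not covered by the
        # article, so this sketch simply refuses them.
        raise ValueError("translation must not contain double quotes")
    return (
        f'<mstrans:dictionary translation="{translation}">'
        f"{phrase}</mstrans:dictionary>"
    )


def build_request(text: str, source_lang: str, target_lang: str):
    """Return (query parameters, JSON body) for a translate call."""
    if not source_lang:
        # The article requires an explicit source language (no autodetect).
        raise ValueError("source language is required")
    if source_lang == target_lang or "en" not in (source_lang, target_lang):
        # Per the updated note: English plus another supported language.
        raise ValueError("language pair must be English and another language")
    params = {"api-version": "3.0", "from": source_lang, "to": target_lang}
    body = json.dumps([{"Text": text}], ensure_ascii=False)
    return params, body


markup = dictionary_markup("word or phrase", "wordomatic")
params, body = build_request(
    f"The word {markup} is a dictionary entry.", "en", "de"
)
```

The language-pair check encodes the corrected requirement from this commit and would need to be relaxed if support is extended to other pairs.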
