articles/cognitive-services/Translator/custom-translator/how-to-migrate.md
Lines changed: +1 -1 (1 addition & 1 deletion)
@@ -23,7 +23,7 @@ These actions are performed during migration:
 * At any point, the BLEU score migrated from the Hub training can be found in the TrainingDetails page of the model in the “Bleu score in MT Hub” heading.
 
 > [!Note]
-> For a training to succeed, Custom Translator requires a minimum of 10,000 unique extracted sentences. Custom Translator can't conduct a training with fewer than the [suggested minimum](sentence-alignment.md#suggested-minimum-number-of-sentences).
+> For a training to succeed, Custom Translator requires a minimum of 10,000 unique extracted sentences. Custom Translator can't conduct a training with fewer than the [suggested minimum](https://docs.microsoft.com/azure/cognitive-services/translator/custom-translator/sentence-alignment#suggested-minimum-number-of-sentences).
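The note in this hunk sets a floor of 10,000 unique extracted sentences. The Python sketch below shows a local pre-flight check. It assumes that "unique" can be approximated as distinct, non-empty (source, target) pairs after whitespace is trimmed. The function names are hypothetical, and the service's own extraction and deduplication may count differently.

```python
MIN_UNIQUE_SENTENCES = 10_000  # minimum stated in the Custom Translator note


def count_unique_pairs(pairs):
    """Count distinct, non-empty (source, target) pairs after trimming whitespace.

    Assumption: this only approximates "unique extracted sentences";
    the service's own counting may differ.
    """
    return len({
        (src.strip(), tgt.strip())
        for src, tgt in pairs
        if src.strip() and tgt.strip()
    })


def meets_training_minimum(pairs, minimum=MIN_UNIQUE_SENTENCES):
    """Return True when the unique pair count reaches the stated minimum."""
    return count_unique_pairs(pairs) >= minimum


if __name__ == "__main__":
    # Toy data: 9,999 distinct pairs plus one exact duplicate of the first pair.
    pairs = [(f"Sentence {i}.", f"Satz {i}.") for i in range(9_999)]
    pairs.append(("Sentence 0.", "Satz 0."))
    print(count_unique_pairs(pairs), meets_training_minimum(pairs))  # prints: 9999 False
```

This check runs locally and does not replace the service's own validation. It can only flag an obviously undersized upload before a training is queued.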
articles/cognitive-services/Translator/custom-translator/quickstart-build-deploy-custom-model.md
Lines changed: +3 -3 (3 additions & 3 deletions)
@@ -40,7 +40,7 @@ your project. For more details, visit [Create Project](how-to-create-project.md)
 
 ## Upload documents
 
-Next, upload [training](training-and-model.md#training-dataset-for-custom-translator), [tuning](training-and-model.md#tuning-dataset-for-custom-translator) and [testing](training-and-model.md#testing-dataset-for-custom-translator) document sets. You can upload both [parallel](what-are-parallel-documents.md) and combo documents. You can also upload [dictionary](what-is-dictionary.md).
+Next, upload [training](training-and-model.md#training-document-type-for-custom-translator), [tuning](training-and-model.md#tuning-document-type-for-custom-translator) and [testing](training-and-model.md#testing-dataset-for-custom-translator) document sets. You can upload both [parallel](what-are-parallel-documents.md) and combo documents. You can also upload [dictionary](what-is-dictionary.md).
 
 You can upload documents from either the documents tab or from a specific
 project's page.
@@ -58,8 +58,8 @@ model.
 
 Select the project you've created. You'll see all the documents you've uploaded
 that share a language pair with this project. Select the documents that you want
-included in your model. You can select [training](training-and-model.md#training-dataset-for-custom-translator),
-[tuning](training-and-model.md#tuning-dataset-for-custom-translator), and [testing](training-and-model.md#testing-dataset-for-custom-translator) data or select just
+included in your model. You can select [training](training-and-model.md#training-document-type-for-custom-translator),
+[tuning](training-and-model.md#tuning-document-type-for-custom-translator), and [testing](training-and-model.md#testing-dataset-for-custom-translator) data or select just
 training data and let Custom Translator automatically build tuning and test sets
articles/cognitive-services/Translator/custom-translator/training-and-model.md
Lines changed: +22 -22 (22 additions & 22 deletions)
@@ -16,51 +16,51 @@ ms.author: swmachan
 
 A model is the system, which provides translation for a specific language pair.
 The outcome of a successful training is a model. When training a model, three
-mutually exclusive data sets are required: training dataset, tuning dataset, and
-testing dataset. Dictionary data can also be provided.
+mutually exclusive document types are required: training, tuning, and
+testing. Dictionary document type can also be provided. Please refer to [Sentence alignment](https://docs.microsoft.com/azure/cognitive-services/translator/custom-translator/sentence-alignment#suggested-minimum-number-of-sentences).
 
-If only training data is provided when queuing a training, Custom Translator will automatically assemble tuning and testing datasets. It will exclude 5,000 sentences from your training data and use 2,500 each to assemble a tuning and testing sets.
+If only training data is provided when queuing a training, Custom Translator will automatically assemble tuning and testing data. It will use a random subset of sentences from your training documents, and exclude these sentences from the training data itself.
 
-## Training dataset for Custom Translator
+## Training document type for Custom Translator
 
 Documents included in training set are used by the Custom Translator as the basis for building your model. During training execution, sentences that are present in these documents are aligned (or paired). You can take liberties in composing your set of training documents. You can include documents that you believe are of tangential relevance in one model. Again exclude them in another to see the impact in [BLEU (Bilingual Evaluation Understudy) score](what-is-bleu-score.md). As long as you keep the tuning set and test set constant, feel free to experiment with the composition of the training set. This approach is an effective way to modify the quality of your translation system.
 
 You can run multiple trainings within a project and compare the [BLEU scores](what-is-bleu-score.md) across all training runs. When you are running multiple trainings for comparison, ensure same tuning/ test data is specified each time. Also make sure to also inspect the results manually in the [“Testing”](how-to-view-system-test-results.md) tab.
 
-## Tuning dataset for Custom Translator
+## Tuning document type for Custom Translator
 
 Parallel documents included in this set are used by the Custom Translator to
 tune the translation system for optimal results.
 
-The tuning set is used during training to adjust all parameters and weights of
-the translation system to the optimal values. Choose your tuning set carefully:
-the tuning set should be representative of the content of the documents you
-intend to translate in the future. The tuning set has a major influence on the
+The tuning data is used during training to adjust all parameters and weights of
+the translation system to the optimal values. Choose your tuning data carefully:
+the tuning data should be representative of the content of the documents you
+intend to translate in the future. The tuning data has a major influence on the
 quality of the translations produced. Tuning enables the translation system to
 provide translations that are closest to the samples you provide in the tuning
-dataset. You do not need more than 2500 sentences as tuning set. For optimal
+data. You do not need more than 2500 sentences in your tuning data. For optimal
 translation quality, it is recommended to select the tuning set manually by
 choosing the most representative selection of sentences.
 
 When creating your tuning set, choose sentences that are a meaningful and
 representative length of the future sentences that you expect to translate. You
 should also choose sentences that have words and phrases that you intend to
 translate in the approximate distribution that you expect in your future
-translations. In practice, a sentence length of 8 to 18 words will produce the
+translations. In practice, a sentence length of 7 to 10 words will produce the
 best results, because these sentences contain enough context to show inflection
 and provide a phrase length that is significant, without being overly complex.
 
 A good description of the type of sentences to use in the tuning set is prose:
 actual fluent sentences. Not table cells, not poems, not lists of things, not
 only punctuation, or numbers in a sentence - regular language.
 
-If you manually select your tuning data set, it should not have any of the same
-sentences as your training and testing data. The tuning set has a significant
+If you manually select your tuning data, it should not have any of the same
+sentences as your training and testing data. The tuning data has a significant
 impact on the quality of the translations - choose the sentences carefully.
 
-If you are not sure what to choose for your tuning set, just select the training
-set and let Custom Translator select your tuning set for you. When you let the
-Custom Translator choose the tuning set automatically, it will use a random
+If you are not sure what to choose for your tuning data, just select the training
+data and let Custom Translator select the tuning data for you. When you let the
+Custom Translator choose the tuning data automatically, it will use a random
 subset of sentences from your bilingual training documents and exclude these
 sentences from the training material itself.
 
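This hunk describes how Custom Translator assembles tuning and testing data when only training documents are supplied: it draws a random subset of sentences and removes that subset from the training data. The Python sketch below shows that kind of disjoint hold-out split. It assumes that the input is a list of sentence-pair tuples and that tuning and testing sets of 2,500 sentences each are used, the figure the text gives as sufficient. The service's actual sizes and sampling are not stated in this diff, and `auto_split` is a hypothetical name.

```python
import random


def auto_split(pairs, tuning_size=2_500, testing_size=2_500, seed=None):
    """Hold out disjoint tuning and testing sets from training pairs (hypothetical helper).

    Duplicates are removed first so that no sentence pair can land in more
    than one set. The sizes and seeding are illustrative assumptions.
    """
    unique = list(dict.fromkeys(pairs))  # dedupe, keeping first-seen order
    if tuning_size + testing_size >= len(unique):
        raise ValueError("not enough unique pairs left for training")
    rng = random.Random(seed)
    indices = list(range(len(unique)))
    rng.shuffle(indices)
    tuning_idx = indices[:tuning_size]
    testing_idx = indices[tuning_size:tuning_size + testing_size]
    training_idx = indices[tuning_size + testing_size:]

    def pick(idx):
        # Keep the original document order inside each set.
        return [unique[i] for i in sorted(idx)]

    return pick(training_idx), pick(tuning_idx), pick(testing_idx)


if __name__ == "__main__":
    corpus = [(f"Sentence {i}.", f"Satz {i}.") for i in range(20_000)]
    train, tune, test = auto_split(corpus, seed=42)
    print(len(train), len(tune), len(test))  # prints: 15000 2500 2500
```

The split deduplicates before it shuffles, which keeps the sets disjoint by content and not only by position. This matches the text's rule that tuning data should share no sentences with training and testing data.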
@@ -78,16 +78,16 @@ ranges from 0 to 100. A score of 0 indicates that not a single word of the
 reference appears in the translation. A score of 100 indicates that the
 automatic translation exactly matches the reference: the same word is in the
 exact same position. The score you receive is the BLEU score average for all
-sentences of the testing set.
+sentences of the testing data.
 
-The test set should include parallel documents where the target language
+The test data should include parallel documents where the target language
 sentences are the most desirable translations of the corresponding source
-language sentences in the pair. You may want to use the same criteria you used
-to compose the tuning set. However, the testing set has no influence over the
+language sentences in the source-target pair. You may want to use the same criteria you used
+to compose the tuning data. However, the testing data has no influence over the
 quality of the translation system. It is used exclusively to generate the BLEU
-score for you, and for nothing else.
+score for you.
 
-You don't need more than 2,500 sentences as the testing set. When you let the
+You don't need more than 2,500 sentences as the testing data. When you let the
 system choose the testing set automatically, it will use a random subset of
 sentences from your bilingual training documents, and exclude these sentences
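This hunk describes BLEU on a 0 to 100 scale and says the reported score is an average over the testing sentences. The Python sketch below follows that description. It computes sentence-level BLEU (clipped n-gram precision up to 4-grams, geometric mean, and brevity penalty) and averages it across sentences. Whitespace tokenization and add-one smoothing for n > 1 are assumptions for illustration, not the service's documented method. Without smoothing, any sentence with no matching 4-gram would score 0 even if many of its words matched.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every n-gram of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU on a 0-100 scale (whitespace tokens, add-one smoothing for n > 1)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngram_counts(hyp, n), ngram_counts(ref, n)
        # Clipped matches: a hypothesis n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(count, ref_ngrams[gram]) for gram, count in hyp_ngrams.items())
        total = sum(hyp_ngrams.values())
        if n == 1:
            if matches == 0:
                return 0.0  # not a single reference word appears in the translation
            log_precision_sum += math.log(matches / total)
        else:
            log_precision_sum += math.log((matches + 1) / (total + 1))
    # Brevity penalty: translations shorter than the reference are penalized.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * brevity * math.exp(log_precision_sum / max_n)


def average_bleu(hypotheses, references):
    """Mean of per-sentence BLEU scores over a testing set."""
    if not hypotheses or len(hypotheses) != len(references):
        raise ValueError("need equal, non-empty lists of hypotheses and references")
    scores = [sentence_bleu(h, r) for h, r in zip(hypotheses, references)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    hyps = ["a b c", "x y z"]
    refs = ["a b c", "p q r"]
    print(average_bleu(hyps, refs))  # prints: 50.0
```

An exact match scores 100, and a translation sharing no word with its reference scores 0. These two endpoints are the ones the text describes.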
articles/cognitive-services/Translator/dynamic-dictionary.md
Lines changed: +6 -6 (6 additions & 6 deletions)
@@ -14,23 +14,23 @@ ms.author: swmachan
 
 # How to use a dynamic dictionary
 
-If you already know the translation you want to apply to a word or a phrase, you can supply it as markup within the request. The dynamic dictionary is only safe for compound nouns like proper names and product names.
+If you already know the translation you want to apply to a word or a phrase, you can supply it as markup within the request. The dynamic dictionary is safe only for compound nouns like proper names and product names.
 
 **Syntax:**
 
 <mstrans:dictionary translation="translation of phrase">phrase</mstrans:dictionary>
 
 **Requirements:**
 
-* The `From` and `To` languages must be different.
-* You must include the `From` parameter in your API translation request instead of using the auto-detect feature.
+* The `From` and `To` languages must include English and another supported language.
+* You must include the `From` parameter in your API translation request instead of using the autodetect feature.
 
 **Example: en-de:**
 
-Source input: The word <mstrans:dictionary translation=\"wordomatic\">word or phrase</mstrans:dictionary> is a dictionary entry.
+Source input: `The word <mstrans:dictionary translation=\"wordomatic\">word or phrase</mstrans:dictionary> is a dictionary entry.`
 
-Target output: Das Wort "wordomatic" ist ein Wörterbucheintrag.
+Target output: `Das Wort "wordomatic" ist ein Wörterbucheintrag.`
 
 This feature works the same way with and without HTML mode.
 
-The feature should be used sparingly. The appropriate and far better way of customizing translation is by using Custom Translator. Custom Translator makes full use of context and statistical probabilities. If you have or can create training data that shows your work or phrase in context, you get much better results. You can find more information about Custom Translator at [https://aka.ms/CustomTranslator](https://aka.ms/CustomTranslator).
+Use the feature sparingly. A better way to customize translation is by using Custom Translator. Custom Translator makes full use of context and statistical probabilities. If you have or can create training data that shows your word or phrase in context, you get much better results. You can find more information about Custom Translator at [https://aka.ms/CustomTranslator](https://aka.ms/CustomTranslator).
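The markup syntax and the requirements in this hunk (an explicit `from` language, with English on one side) can be sketched in Python. The sketch builds a Translator v3 `translate` request but does not send it. The helper names are hypothetical. Rejecting double quotes in the attribute value is a conservative assumption, not documented behavior. The `\"` in the documented source input appears to be JSON string escaping, and `json.dumps` reproduces it.

```python
import json
from urllib.parse import urlencode

# Translator v3 global endpoint; authentication headers are left out of this sketch.
TRANSLATE_URL = "https://api.cognitive.microsofttranslator.com/translate"


def dictionary_markup(phrase, translation):
    """Wrap a phrase in the dynamic dictionary element from the Syntax section."""
    if '"' in translation:
        # Conservative assumption: keep quote characters out of the attribute value.
        raise ValueError("translation must not contain a double quote")
    return f'<mstrans:dictionary translation="{translation}">{phrase}</mstrans:dictionary>'


def build_request(text, from_lang, to_lang):
    """Return (url, json_body) for a translate call with an explicit `from` language.

    Enforces the listed requirements: no autodetect (from_lang must be given),
    and one side is English while the other is a different language.
    """
    if not from_lang:
        raise ValueError("dynamic dictionary requires an explicit 'from' language")
    if "en" not in (from_lang, to_lang) or from_lang == to_lang:
        raise ValueError("one language must be English and the other a different language")
    query = urlencode({"api-version": "3.0", "from": from_lang, "to": to_lang})
    body = json.dumps([{"Text": text}])
    return f"{TRANSLATE_URL}?{query}", body


if __name__ == "__main__":
    entry = dictionary_markup("word or phrase", "wordomatic")
    url, body = build_request(f"The word {entry} is a dictionary entry.", "en", "de")
    print(url)   # prints: https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=en&to=de
    print(body)
```

A real call would POST `body` to `url` with `Content-Type: application/json` and the resource key in the `Ocp-Apim-Subscription-Key` header.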