Merge pull request #97695 from melghazali/master

ktoliver · web-flow · commit d993b8893211 · 2019-12-10T13:39:27.000-07:00
[Cog Svcs] Correcting dynamic dictionary support note
diff --git a/articles/cognitive-services/Translator/custom-translator/how-to-migrate.md b/articles/cognitive-services/Translator/custom-translator/how-to-migrate.md
@@ -23,7 +23,7 @@ These actions are performed during migration:
 * At any point, the BLEU score migrated from the Hub training can be found in the TrainingDetails page of the model in the “Bleu score in MT Hub” heading.
 
 > [!Note] 
-> For a training to succeed, Custom Translator requires a minimum of 10,000 unique extracted sentences. Custom Translator can't conduct a training with fewer than the [suggested minimum](sentence-alignment.md#suggested-minimum-number-of-sentences).
+> For a training to succeed, Custom Translator requires a minimum of 10,000 unique extracted sentences. Custom Translator can't conduct a training with fewer than the [suggested minimum](https://docs.microsoft.com/azure/cognitive-services/translator/custom-translator/sentence-alignment#suggested-minimum-number-of-sentences).
 
 ## Find Custom Translator Workspace ID
 
diff --git a/articles/cognitive-services/Translator/custom-translator/quickstart-build-deploy-custom-model.md b/articles/cognitive-services/Translator/custom-translator/quickstart-build-deploy-custom-model.md
@@ -40,7 +40,7 @@ your project. For more details, visit [Create Project](how-to-create-project.md)
 
 ## Upload documents
 
-Next, upload [training](training-and-model.md#training-dataset-for-custom-translator), [tuning](training-and-model.md#tuning-dataset-for-custom-translator) and [testing](training-and-model.md#testing-dataset-for-custom-translator) document sets. You can upload both [parallel](what-are-parallel-documents.md) and combo documents. You can also upload [dictionary](what-is-dictionary.md).
+Next, upload [training](training-and-model.md#training-document-type-for-custom-translator), [tuning](training-and-model.md#tuning-document-type-for-custom-translator), and [testing](training-and-model.md#testing-dataset-for-custom-translator) document sets. You can upload both [parallel](what-are-parallel-documents.md) and combo documents. You can also upload a [dictionary](what-is-dictionary.md).
 
 You can upload documents from either the documents tab or from a specific
 project's page.
@@ -58,8 +58,8 @@ model.
 
 Select the project you've created. You'll see all the documents you've uploaded
 that share a language pair with this project. Select the documents that you want
-included in your model. You can select [training](training-and-model.md#training-dataset-for-custom-translator),
-[tuning](training-and-model.md#tuning-dataset-for-custom-translator), and [testing](training-and-model.md#testing-dataset-for-custom-translator) data or select just
+included in your model. You can select [training](training-and-model.md#training-document-type-for-custom-translator),
+[tuning](training-and-model.md#tuning-document-type-for-custom-translator), and [testing](training-and-model.md#testing-dataset-for-custom-translator) data or select just
 training data and let Custom Translator automatically build tuning and test sets
 for your model.
 
diff --git a/articles/cognitive-services/Translator/custom-translator/training-and-model.md b/articles/cognitive-services/Translator/custom-translator/training-and-model.md
@@ -16,51 +16,51 @@ ms.author: swmachan
 
 A model is the system, which provides translation for a specific language pair.
 The outcome of a successful training is a model. When training a model, three
-mutually exclusive data sets are required: training dataset, tuning dataset, and
-testing dataset. Dictionary data can also be provided.
+mutually exclusive document types are required: training, tuning, and
+testing. A dictionary document type can also be provided. For the suggested minimum number of sentences, refer to [Sentence alignment](https://docs.microsoft.com/azure/cognitive-services/translator/custom-translator/sentence-alignment#suggested-minimum-number-of-sentences).
 
-If only training data is provided when queuing a training, Custom Translator will automatically assemble tuning and testing datasets. It will exclude 5,000 sentences from your training data and use 2,500 each to assemble a tuning and testing sets.
+If only training data is provided when queuing a training, Custom Translator will automatically assemble tuning and testing data. It will use a random subset of sentences from your training documents, and exclude these sentences from the training data itself.
 
-## Training dataset for Custom Translator
+## Training document type for Custom Translator
 
 Documents included in training set are used by the Custom Translator as the basis for building your model. During training execution, sentences that are present in these documents are aligned (or paired). You can take liberties in composing your set of training documents. You can include documents that you believe are of tangential relevance in one model. Again exclude them in another to see the impact in [BLEU (Bilingual Evaluation Understudy) score](what-is-bleu-score.md). As long as you keep the tuning set and test set constant, feel free to experiment with the composition of the training set. This approach  is an effective way to modify the quality of your translation system.
 
 You can run multiple trainings within a project and compare the [BLEU scores](what-is-bleu-score.md) across all training runs. When you are running multiple trainings for comparison, ensure same tuning/ test data is specified each time. Also make sure to also inspect the results manually in the [“Testing”](how-to-view-system-test-results.md) tab.
 
-## Tuning dataset for Custom Translator
+## Tuning document type for Custom Translator
 
 Parallel documents included in this set are used by the Custom Translator to
 tune the translation system for optimal results.
 
-The tuning set is used during training to adjust all parameters and weights of
-the translation system to the optimal values. Choose your tuning set carefully:
-the tuning set should be representative of the content of the documents you
-intend to translate in the future. The tuning set has a major influence on the
+The tuning data is used during training to adjust all parameters and weights of
+the translation system to the optimal values. Choose your tuning data carefully:
+the tuning data should be representative of the content of the documents you
+intend to translate in the future. The tuning data has a major influence on the
 quality of the translations produced. Tuning enables the translation system to
 provide translations that are closest to the samples you provide in the tuning
-dataset. You do not need more than 2500 sentences as tuning set. For optimal
+data. You do not need more than 2,500 sentences in your tuning data. For optimal
 translation quality, it is recommended to select the tuning set manually by
 choosing the most representative selection of sentences.
 
 When creating your tuning set, choose sentences that are a meaningful and
 representative length of the future sentences that you expect to translate. You
 should also choose sentences that have words and phrases that you intend to
 translate in the approximate distribution that you expect in your future
-translations. In practice, a sentence length of 8 to 18 words will produce the
+translations. In practice, a sentence length of 7 to 10 words will produce the
 best results, because these sentences contain enough context to show inflection
 and provide a phrase length that is significant, without being overly complex.
 
 A good description of the type of sentences to use in the tuning set is prose:
 actual fluent sentences. Not table cells, not poems, not lists of things, not
 only punctuation, or numbers in a sentence - regular language.
 
-If you manually select your tuning data set, it should not have any of the same
-sentences as your training and testing data. The tuning set has a significant
+If you manually select your tuning data, it should not have any of the same
+sentences as your training and testing data. The tuning data has a significant
 impact on the quality of the translations - choose the sentences carefully.
 
-If you are not sure what to choose for your tuning set, just select the training
-set and let Custom Translator select your tuning set for you. When you let the
-Custom Translator choose the tuning set automatically, it will use a random
+If you are not sure what to choose for your tuning data, just select the training
+data and let Custom Translator select the tuning data for you. When you let the
+Custom Translator choose the tuning data automatically, it will use a random
 subset of sentences from your bilingual training documents and exclude these
 sentences from the training material itself.
 
@@ -78,16 +78,16 @@ ranges from 0 to 100. A score of 0 indicates that not a single word of the
 reference appears in the translation. A score of 100 indicates that the
 automatic translation exactly matches the reference: the same word is in the
 exact same position. The score you receive is the BLEU score average for all
-sentences of the testing set.
+sentences of the testing data.
 
-The test set should include parallel documents where the target language
+The test data should include parallel documents where the target language
 sentences are the most desirable translations of the corresponding source
-language sentences in the pair. You may want to use the same criteria you used
-to compose the tuning set. However, the testing set has no influence over the
+language sentences in the source-target pair. You may want to use the same criteria you used
+to compose the tuning data. However, the testing data has no influence over the
 quality of the translation system. It is used exclusively to generate the BLEU
-score for you, and for nothing else.
+score for you.
 
-You don't need more than 2,500 sentences as the testing set. When you let the
+You don't need more than 2,500 sentences in the testing data. When you let the
 system choose the testing set automatically, it will use a random subset of
 sentences from your bilingual training documents, and exclude these sentences
 from the training material itself.
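
The automatic selection described in this file (a random subset of training sentences is held out for tuning and testing and excluded from the training data) can be illustrated with a minimal sketch. This is not Custom Translator's implementation: the function name `hold_out`, the `seed` parameter, and the caller-supplied subset sizes are all hypothetical and exist only for illustration.

```python
import random
from typing import List, Tuple

SentencePair = Tuple[str, str]  # (source sentence, target sentence)


def hold_out(
    pairs: List[SentencePair],
    tuning_size: int,
    testing_size: int,
    seed: int = 0,
) -> Tuple[List[SentencePair], List[SentencePair], List[SentencePair]]:
    """Split sentence pairs into mutually exclusive training, tuning, and testing sets.

    Hypothetical helper: the subset sizes are chosen by the caller and do not
    reflect the sizes the service actually uses.
    """
    # Drop duplicate pairs while keeping the original order.
    unique = list(dict.fromkeys(pairs))
    if tuning_size + testing_size > len(unique):
        raise ValueError("not enough unique sentence pairs to hold out")

    # Shuffle with a local generator so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(unique)

    tuning = unique[:tuning_size]
    testing = unique[tuning_size:tuning_size + testing_size]
    # Everything that was not held out remains training data.
    training = unique[tuning_size + testing_size:]
    return training, tuning, testing
```

Because the three lists are slices of one shuffled, de-duplicated list, no sentence pair can appear in more than one of them, which matches the "mutually exclusive" requirement stated at the top of the file.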
diff --git a/articles/cognitive-services/Translator/dynamic-dictionary.md b/articles/cognitive-services/Translator/dynamic-dictionary.md
@@ -14,23 +14,23 @@ ms.author: swmachan
 
 # How to use a dynamic dictionary
 
-If you already know the translation you want to apply to a word or a phrase, you can supply it as markup within the request. The dynamic dictionary is only safe for compound nouns like proper names and product names.
+If you already know the translation you want to apply to a word or a phrase, you can supply it as markup within the request. The dynamic dictionary is safe only for compound nouns like proper names and product names.
 
 **Syntax:**
 
 <mstrans:dictionary translation="translation of phrase">phrase</mstrans:dictionary>
 
 **Requirements:**
 
-* The `From` and `To` languages must be different. 
-* You must include the `From` parameter in your API translation request instead of using the auto-detect feature. 
+* The `From` and `To` languages must include English and another supported language. 
+* You must include the `From` parameter in your API translation request instead of using the autodetect feature. 
 
 **Example: en-de:**
 
-Source input: The word <mstrans:dictionary translation=\"wordomatic\">word or phrase</mstrans:dictionary> is a dictionary entry.
+Source input: `The word <mstrans:dictionary translation=\"wordomatic\">word or phrase</mstrans:dictionary> is a dictionary entry.`
 
-Target output: Das Wort "wordomatic" ist ein Wörterbucheintrag.
+Target output: `Das Wort "wordomatic" ist ein Wörterbucheintrag.`
 
 This feature works the same way with and without HTML mode.
 
-The feature should be used sparingly. The appropriate and far better way of customizing translation is by using Custom Translator. Custom Translator makes full use of context and statistical probabilities. If you have or can create training data that shows your work or phrase in context, you get much better results. You can find more information about Custom Translator at [https://aka.ms/CustomTranslator](https://aka.ms/CustomTranslator).
+Use the feature sparingly. A better way to customize translation is by using Custom Translator. Custom Translator makes full use of context and statistical probabilities. If you have or can create training data that shows your word or phrase in context, you get much better results. You can find more information about Custom Translator at [https://aka.ms/CustomTranslator](https://aka.ms/CustomTranslator).
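
The syntax shown in this file can be produced programmatically before the text is sent for translation. The following is a minimal sketch, assuming the markup is built client-side as a plain string. The helper names `dictionary_markup` and `annotate` are hypothetical, and escaping the phrase and attribute value with `html.escape` is an assumption rather than a documented requirement of the service.

```python
from html import escape


def dictionary_markup(phrase: str, translation: str) -> str:
    """Wrap a phrase in dynamic dictionary markup (hypothetical helper)."""
    # Assumption: escape quotes and angle brackets so the markup stays well-formed.
    safe_translation = escape(translation, quote=True)
    safe_phrase = escape(phrase, quote=False)
    return (
        f'<mstrans:dictionary translation="{safe_translation}">'
        f"{safe_phrase}</mstrans:dictionary>"
    )


def annotate(text: str, phrase: str, translation: str) -> str:
    """Replace the first occurrence of phrase in text with its markup."""
    return text.replace(phrase, dictionary_markup(phrase, translation), 1)


source = annotate(
    "The word word or phrase is a dictionary entry.",
    "word or phrase",
    "wordomatic",
)
print(source)
```

The printed string matches the source input in the en-de example, without the backslashes that are only needed when the text is embedded in a JSON request body.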