Skip to content

Commit c25d8ea

Browse files
authored
Merge pull request #214178 from laujan/1998424-update-custom-translator-dictionary
update custom translator dictionary
2 parents a845d6d + 1e1cd38 commit c25d8ea

File tree

1 file changed

+24
-7
lines changed

1 file changed

+24
-7
lines changed

articles/cognitive-services/Translator/custom-translator/what-is-dictionary.md

Lines changed: 24 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ author: laujan
66
manager: nitinme
77
ms.service: cognitive-services
88
ms.subservice: translator-text
9-
ms.date: 12/06/2021
9+
ms.date: 10/11/2022
1010
ms.author: lajanuar
1111
ms.topic: conceptual
1212
ms.custom: cogserv-non-critical-translator
@@ -36,12 +36,29 @@ You can train a model using only dictionary data. To do so, select only the dict
3636
3737
## Recommendations
3838

39-
- Dictionaries aren't a substitute for training a model using training data. We recommended letting the system learn from your training data for better results. However, when sentences or compound nouns must be rendered as-is, use a dictionary.
40-
- The phrase dictionary should be used sparingly. When a phrase within a sentence is replaced, the context within that sentence is lost or limited for translating the rest of the sentence. The result is that while the phrase or word within the sentence will translate according to the provided dictionary, the overall translation quality of the sentence will often suffer.
41-
- The phrase dictionary works well for compound nouns like product names ("Microsoft SQL Server"), proper names ("City of Hamburg"), or features of the product ("pivot table"). It doesn't work equally well for verbs or adjectives because those words are typically highly inflected in the source or in the target language. Best practice is to avoid phrase dictionary entries for anything but compound nouns.
42-
- If you're using a phrase dictionary, capitalization and punctuation are important. Dictionary entries will only match words and phrases in the input sentence that use exactly the same capitalization and punctuation as specified in the source dictionary file. Also the translations will reflect the capitalization and punctuation provided in the target dictionary file. For example, if you trained an English to Spanish system that uses a phrase dictionary that specifies "US" in the source file, and "EE.UU." in the target file. When you request translation of a sentence that includes the word "us" (not capitalized), it will NOT return a match from the dictionary. However, if you request translation of a sentence that contains the word "US" (capitalized), it will match the dictionary and the translation will contain "EE.UU." The capitalization and punctuation in the translation may be different than specified in the dictionary target file, and may be different from the capitalization and punctuation in the source. It follows the rules of the target language.
43-
- If you're using a sentence dictionary, the end of sentence punctuation is ignored. For example, if your source dictionary contains "this sentence ends with punctuation!", then any translation requests containing "this sentence ends with punctuation" would match.
44-
- If a word appears more than once in a dictionary file, the system will always use the last entry provided. Thus, your dictionary shouldn't contain multiple translations of the same word.
39+
- Dictionaries aren't a substitute for training a model using training data. For better results, we recommended letting the system learn from your training data. However, when sentences or compound nouns must be translated verbatim, use a dictionary.
40+
41+
- The phrase dictionary should be used sparingly. When a phrase within a sentence is replaced, the context of that sentence is lost or limited for translating the rest of the sentence. The result is that, while the phrase or word within the sentence will translate according to the provided dictionary, the overall translation quality of the sentence often suffers.
42+
43+
- The phrase dictionary works well for compound nouns like product names ("_Microsoft SQL Server_"), proper names ("_City of Hamburg_"), or product features ("_pivot table_"). It doesn't work as well for verbs or adjectives because those words are typically highly contextual within the source or target language. The best practice is to avoid phrase dictionary entries for anything but compound nouns.
44+
45+
- If you're using a phrase dictionary, capitalization and punctuation are important. Dictionary entries are case- and punctuation-sensitive. Custom Translator will only match words and phrases in the input sentence that use exactly the same capitalization and punctuation marks as specified in the source dictionary file. Also, translations will reflect the capitalization and punctuation provided in the target dictionary file.
46+
47+
**Example**
48+
49+
- If you're training an English-to-Spanish system that uses a phrase dictionary and you specify "_SQL server_" in the source file and "_Microsoft SQL Server_" in the target file. When you request the translation of a sentence that contains the phrase "_SQL server_", Custom Translator will match the dictionary entry and the translation will contain "_Microsoft SQL Server_."
50+
- When you request translation of a sentence that includes the same phrase but **doesn't** match what is in your source file, such as "_sql server_", "_sql Server_" or "_SQL Server_", it **won't** return a match from your dictionary.
51+
- The translation follows the rules of the target language as specified in your phrase dictionary.
52+
53+
- If you're using a sentence dictionary, end-of-sentence punctuation is ignored.
54+
55+
**Example**
56+
57+
- If your source dictionary contains "_This sentence ends with punctuation!_", then any translation requests containing "_This sentence ends with punctuation_" will match.
58+
59+
- Your dictionary should contain unique source lines. If a source line (a word, phrase, or sentence) appears more than once in a dictionary file, the system will always use the **last entry** provided and return the target when a match is found.
60+
61+
- Avoid adding phrases that consist of only numbers or are two- or three-letter words, such as acronyms, in the source dictionary file.
4562

4663
## Next steps
4764

0 commit comments

Comments
 (0)