articles/cognitive-services/Translator/custom-translator/what-is-dictionary.md
24 additions & 7 deletions
@@ -6,7 +6,7 @@ author: laujan
 manager: nitinme
 ms.service: cognitive-services
 ms.subservice: translator-text
-ms.date: 12/06/2021
+ms.date: 10/11/2022
 ms.author: lajanuar
 ms.topic: conceptual
 ms.custom: cogserv-non-critical-translator
@@ -36,12 +36,29 @@ You can train a model using only dictionary data. To do so, select only the dict
 
 ## Recommendations
 
-- Dictionaries aren't a substitute for training a model using training data. We recommended letting the system learn from your training data for better results. However, when sentences or compound nouns must be rendered as-is, use a dictionary.
-- The phrase dictionary should be used sparingly. When a phrase within a sentence is replaced, the context within that sentence is lost or limited for translating the rest of the sentence. The result is that while the phrase or word within the sentence will translate according to the provided dictionary, the overall translation quality of the sentence will often suffer.
-- The phrase dictionary works well for compound nouns like product names ("Microsoft SQL Server"), proper names ("City of Hamburg"), or features of the product ("pivot table"). It doesn't work equally well for verbs or adjectives because those words are typically highly inflected in the source or in the target language. Best practice is to avoid phrase dictionary entries for anything but compound nouns.
-- If you're using a phrase dictionary, capitalization and punctuation are important. Dictionary entries will only match words and phrases in the input sentence that use exactly the same capitalization and punctuation as specified in the source dictionary file. Also the translations will reflect the capitalization and punctuation provided in the target dictionary file. For example, if you trained an English to Spanish system that uses a phrase dictionary that specifies "US" in the source file, and "EE.UU." in the target file. When you request translation of a sentence that includes the word "us" (not capitalized), it will NOT return a match from the dictionary. However, if you request translation of a sentence that contains the word "US" (capitalized), it will match the dictionary and the translation will contain "EE.UU." The capitalization and punctuation in the translation may be different than specified in the dictionary target file, and may be different from the capitalization and punctuation in the source. It follows the rules of the target language.
-- If you're using a sentence dictionary, the end of sentence punctuation is ignored. For example, if your source dictionary contains "this sentence ends with punctuation!", then any translation requests containing "this sentence ends with punctuation" would match.
-- If a word appears more than once in a dictionary file, the system will always use the last entry provided. Thus, your dictionary shouldn't contain multiple translations of the same word.
+- Dictionaries aren't a substitute for training a model using training data. For better results, we recommend letting the system learn from your training data. However, when sentences or compound nouns must be translated verbatim, use a dictionary.
+
+- The phrase dictionary should be used sparingly. When a phrase within a sentence is replaced, the context of that sentence is lost or limited for translating the rest of the sentence. The result is that, while the phrase or word within the sentence will translate according to the provided dictionary, the overall translation quality of the sentence often suffers.
+
+- The phrase dictionary works well for compound nouns like product names ("_Microsoft SQL Server_"), proper names ("_City of Hamburg_"), or product features ("_pivot table_"). It doesn't work as well for verbs or adjectives because those words are typically highly contextual within the source or target language. The best practice is to avoid phrase dictionary entries for anything but compound nouns.
+
+- If you're using a phrase dictionary, capitalization and punctuation are important. Dictionary entries are case- and punctuation-sensitive. Custom Translator will only match words and phrases in the input sentence that use exactly the same capitalization and punctuation marks as specified in the source dictionary file. Also, translations will reflect the capitalization and punctuation provided in the target dictionary file.
+
+**Example**
+
+- Suppose you're training an English-to-Spanish system that uses a phrase dictionary, and you specify "_SQL server_" in the source file and "_Microsoft SQL Server_" in the target file. When you request the translation of a sentence that contains the phrase "_SQL server_", Custom Translator will match the dictionary entry and the translation will contain "_Microsoft SQL Server_".
+- When you request translation of a sentence that includes the same phrase but **doesn't** match what is in your source file, such as "_sql server_", "_sql Server_" or "_SQL Server_", it **won't** return a match from your dictionary.
+- The translation follows the rules of the target language as specified in your phrase dictionary.
+
+- If you're using a sentence dictionary, end-of-sentence punctuation is ignored.
+
+**Example**
+
+- If your source dictionary contains "_This sentence ends with punctuation!_", then any translation requests containing "_This sentence ends with punctuation_" will match.
+
+- Your dictionary should contain unique source lines. If a source line (a word, phrase, or sentence) appears more than once in a dictionary file, the system will always use the **last entry** provided and return the target when a match is found.
+
+- Avoid adding phrases that consist of only numbers or are two- or three-letter words, such as acronyms, in the source dictionary file.
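
The recommendations in the added lines can be sketched as a small local check that runs before a dictionary is uploaded. This is an illustrative Python sketch only, not Custom Translator's implementation: the function names (`load_dictionary`, `risky_sources`, `phrase_matches`, `sentence_key`) are hypothetical, the dictionary is assumed to be available as a list of (source, target) pairs, and matching is approximated with a plain case-sensitive substring test.

```python
import re


def load_dictionary(pairs):
    """Collapse (source, target) pairs into a dict.

    Mirrors the "last entry wins" rule: a later duplicate source
    overrides the earlier one. Duplicates are also reported so they
    can be removed from the file.
    """
    entries = {}
    duplicates = []
    for source, target in pairs:
        if source in entries:
            duplicates.append(source)
        entries[source] = target
    return entries, duplicates


def risky_sources(entries):
    """Flag sources that are only numbers, or a single two- or three-letter word."""
    flagged = []
    for source in entries:
        only_numbers = re.fullmatch(r"[\d\s.,]+", source)
        short_word = re.fullmatch(r"[^\W\d_]{2,3}", source)
        if only_numbers or short_word:
            flagged.append(source)
    return flagged


def phrase_matches(entries, sentence):
    """Approximate phrase-dictionary matching (assumption: exact substring).

    The comparison is case- and punctuation-sensitive, so "sql server"
    does not match a source entry of "SQL server".
    """
    return [source for source in entries if source in sentence]


def sentence_key(text):
    """Approximate sentence-dictionary matching: drop end-of-sentence punctuation."""
    return text.rstrip(".!?")


if __name__ == "__main__":
    pairs = [
        ("SQL server", "Microsoft SQL Server"),
        ("US", "EE.UU."),
        ("SQL server", "SQL Server de Microsoft"),  # duplicate source line
        ("2022", "2022"),
    ]
    entries, duplicates = load_dictionary(pairs)
    print("duplicates:", duplicates)
    print("risky:", risky_sources(entries))
    print("match:", phrase_matches(entries, "Install SQL server today"))
    print("no match:", phrase_matches(entries, "Install sql server today"))
```

Reporting duplicates rather than silently dropping them keeps the "last entry wins" behavior visible, so the dictionary file itself can be cleaned up to contain unique source lines.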