|</br>[Immersive Reader](./immersive-reader/language-support.md)| Help users read and comprehend text. |
27
27
|</br>[Language service](./language-service/concepts/language-support.md)| Build apps with industry-leading natural language understanding capabilities. |
28
-
|</br>[Language Understanding (LUIS)](./luis/luis-language-support.md) (retired) | Understand natural language in your apps. |
28
+
|</br>[Language Understanding (`LUIS`)](./luis/luis-language-support.md) (retired) | Understand natural language in your apps. |
29
29
|</br>[QnA Maker](./qnamaker/overview/language-support.md) (retired) | Distill information into easy-to-navigate questions and answers. |
|</br>[Translator](./translator/language-support.md)| Translate more than 100 in-use, at-risk, and endangered languages and dialects.|
@@ -46,4 +46,4 @@ These Azure AI services are language agnostic and don't have limitations based o
46
46
## See also
47
47
48
48
* [What are Azure AI services?](./what-are-ai-services.md)
49
-
*[Create an account](multi-service-resource.md?pivots=azportal)
49
+
* [How to create an account](multi-service-resource.md?pivots=azportal)
articles/ai-services/translator/custom-translator/beginners-guide.md
+20 -20 Lines changed: 20 additions & 20 deletions
@@ -6,20 +6,20 @@ author: laujan
6
6
manager: nitinme
7
7
ms.service: azure-ai-translator
8
8
ms.author: lajanuar
9
-
ms.date: 07/18/2023
9
+
ms.date: 07/08/2024
10
10
ms.topic: overview
11
11
---
12
-
# Custom Translator for beginners
12
+
# Custom Translator for beginners
13
13
14
14
[Custom Translator](overview.md) enables you to build a translation system that reflects your business, industry, and domain-specific terminology and style. Training and deploying a custom system is easy and doesn't require any programming skills. The customized translation system seamlessly integrates into your existing applications, workflows, and websites and is available on Azure through the same cloud-based [Microsoft Text Translation API](../reference/v3-0-translate.md?tabs=curl) service that powers billions of translations every day.
15
15
16
-
The platform enables users to build and publish custom translation systems to and from English. The Custom Translator supports more than 60 languages that map directly to the languages available for NMT. For a complete list, *see*[Translator language support](../language-support.md).
16
+
The platform enables users to build and publish custom translation systems to and from English. Custom Translator supports more than 60 languages that map directly to the languages available for neural machine translation (NMT). For a complete list, *see* [Translator language support](../language-support.md).
17
17
18
18
## Is a custom translation model the right choice for me?
19
19
20
20
A well-trained custom translation model provides more accurate domain-specific translations because it relies on previously translated in-domain documents to learn preferred translations. Translator uses these terms and phrases in context to produce fluent translations in the target language while respecting context-dependent grammar.
21
21
22
-
Training a full custom translation model requires a substantial amount of data. If you don't have at least 10,000 sentences of previously trained documents, you won't be able to train a full-language translation model. However, you can either train a dictionary-only model or use the high-quality, out-of-the-box translations available with the Text Translation API.
22
+
Training a full custom translation model requires a substantial amount of data. If you don't have at least 10,000 sentences of previously translated documents, you can't train a full-language translation model. However, you can either train a dictionary-only model or use the high-quality, out-of-the-box translations available with the Text Translation API.
23
23
24
24
:::image type="content" source="media/how-to/for-beginners.png" alt-text="Screenshot illustrating the difference between custom and general models.":::
25
25
@@ -31,15 +31,15 @@ Building a custom translation model requires:
31
31
32
32
* Obtaining in-domain translated data (preferably human translated).
33
33
34
-
*The ability to assess translation quality or target language translations.
34
+
* Assessing translation quality or target language translations.
35
35
36
36
## How do I evaluate my use-case?
37
37
38
38
Having clarity on your use-case and what success looks like is the first step towards sourcing proficient training data. Here are a few considerations:
39
39
40
-
*What is your desired outcome and how will you measure it?
40
+
* Is your desired outcome specified and how is it measured?
41
41
42
-
*What is your business domain?
42
+
* Is your business domain identified?
43
43
44
44
* Do you have in-domain sentences of similar terminology and style?
45
45
@@ -53,7 +53,7 @@ Having clarity on your use-case and what success looks like is the first step to
53
53
54
54
Finding in-domain quality data is often a challenging task that varies based on user classification. Here are some questions you can ask yourself as you evaluate what data may be available to you:
55
55
56
-
* Enterprises often have a wealth of translation data that has accumulated over many years of using human translation. Does your company have previous translation data available that you can use?
56
+
* Does your company have previous translation data available that you can use? Enterprises often have a wealth of translation data accumulated over many years of using human translation.
57
57
58
58
* Do you have a vast amount of monolingual data? Monolingual data is data in only one language. If so, can you get translations for this data?
59
59
@@ -67,19 +67,19 @@ Finding in-domain quality data is often a challenging task that varies based on
67
67
| Tuning documents | Trains the Neural Machine Translation parameters. |**Be strict**. Compose them to be optimally representative of what you are going to translate in the future. |
68
68
| Test documents | Calculate the [BLEU score](concepts/bleu-score.md?WT.mc_id=aiml-43548-heboelma).|**Be strict**. Compose test documents to be optimally representative of what you plan to translate in the future. |
69
69
| Phrase dictionary | Forces the given translation 100% of the time. |**Be restrictive**. A phrase dictionary is case-sensitive and any word or phrase listed is translated in the way you specify. In many cases, it's better to not use a phrase dictionary and let the system learn. |
70
-
| Sentence dictionary | Forces the given translation 100% of the time. |**Be strict**. A sentence dictionary is case-insensitive and good for common in domain short sentences. For a sentence dictionary match to occur, the entire submitted sentence must match the source dictionary entry. If only a portion of the sentence matches, the entry won't match. |
70
+
| Sentence dictionary | Forces the given translation 100% of the time. |**Be strict**. A sentence dictionary is case-insensitive and good for common in-domain short sentences. For a sentence dictionary match to occur, the entire submitted sentence must match the source dictionary entry. If only a portion of the sentence matches, the entry doesn't match. |
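The sentence dictionary rule in the table (case-insensitive, whole-sentence match only) can be pictured with a small lookup. This is a minimal sketch, not Custom Translator's implementation: the function names are hypothetical, and it assumes matching uses simple case folding with no whitespace or punctuation normalization.

```python
def build_sentence_dictionary(entries):
    """Index entries by case-folded source sentence (assumed matching rule)."""
    return {source.casefold(): target for source, target in entries.items()}


def lookup_sentence(dictionary, sentence):
    """Return the forced translation only if the entire sentence matches an entry."""
    return dictionary.get(sentence.casefold())


sentence_dictionary = build_sentence_dictionary(
    {"Thank you for your order.": "Vielen Dank für Ihre Bestellung."}
)

# Case differs, whole sentence matches: the dictionary translation is used.
full_match = lookup_sentence(sentence_dictionary, "THANK YOU FOR YOUR ORDER.")

# Only part of the submitted text matches an entry: no dictionary match.
partial_match = lookup_sentence(
    sentence_dictionary, "Thank you for your order. See you soon."
)
```

Here `full_match` holds the dictionary translation and `partial_match` is `None`, so the partially matching sentence would fall through to the trained model.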
71
71
72
72
## What is a BLEU score?
73
73
74
-
BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the precision or accuracy of text that has been machine translated from one language to another. Custom Translator uses the BLEU metric as one way of conveying translation accuracy.
74
+
BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the precision or accuracy of text that is machine translated from one language to another. Custom Translator uses the BLEU metric as one way of conveying translation accuracy.
75
75
76
76
A BLEU score is a number between zero and 100. A score of zero indicates a low-quality translation where nothing in the translation matched the reference. A score of 100 indicates a perfect translation that is identical to the reference. It's not necessary to attain a score of 100; a BLEU score between 40 and 60 indicates a high-quality translation.
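
To make the zero-to-100 range concrete, the following sketch computes a simplified sentence-level BLEU: the geometric mean of modified n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference, and no smoothing, and the function names are hypothetical. It isn't necessarily how Custom Translator computes the scores it reports.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified BLEU on a 0-100 scale (single reference, no smoothing)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0

    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # Without smoothing, one empty precision zeroes the score.
        log_precision_sum += math.log(overlap / total)

    # Penalize candidates that are shorter than the reference.
    brevity_penalty = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100.0 * brevity_penalty * math.exp(log_precision_sum / max_n)
```

An exact copy of the reference scores 100, a candidate that shares no words with the reference scores zero, and a near match such as `simple_bleu("the cat sat on the red mat", "the cat sat on the mat")` lands in between.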
## What happens if I don't submit tuning or testing data?
81
81
82
-
Tuning and test sentences are optimally representative of what you plan to translate in the future. If you don't submit any tuning or testing data, Custom Translator will automatically exclude sentences from your training documents to use as tuning and test data.
82
+
Tuning and test sentences are optimally representative of what you plan to translate in the future. If you don't submit any tuning or testing data, Custom Translator automatically excludes sentences from your training documents to use as tuning and test data.
83
83
84
84
| System-generated | Manual-selection |
85
85
|---|---|
@@ -90,24 +90,24 @@ Tuning and test sentences are optimally representative of what you plan to trans
90
90
91
91
## How is training material processed by Custom Translator?
92
92
93
-
To prepare for training, documents undergo a series of processing and filtering steps. These steps are explained below. Knowledge of the filtering process may help with understanding the sentence count displayed as well as the steps you can take to prepare training documents for training with Custom Translator.
93
+
To prepare for training, documents undergo a series of processing and filtering steps. Knowledge of the filtering process may help with understanding the sentence count displayed as well as the steps you can take to prepare training documents for training with Custom Translator. The filtering steps are as follows:
94
94
95
95
### Sentence alignment
96
96
97
-
If your document isn't in XLIFF, XLSX, TMX, or ALIGN format, Custom Translator aligns the sentences of your source and target documents to each other, sentence-by-sentence. Translator doesn't perform document alignment—it follows your naming convention for the documents to find a matching document in the other language. Within the source text, Custom Translator tries to find the corresponding sentence in the target language. It uses document markup like embedded HTML tags to help with the alignment.
97
+
If your document isn't in `XLIFF`, `XLSX`, `TMX`, or `ALIGN` format, Custom Translator aligns the sentences of your source and target documents to each other, sentence-by-sentence. Translator doesn't perform document alignment—it follows your naming convention for the documents to find a matching document in the other language. Within the source text, Custom Translator tries to find the corresponding sentence in the target language. It uses document markup like embedded HTML tags to help with the alignment.
98
98
99
99
If you see a large discrepancy between the number of sentences in the source and target documents, your source document may not be parallel, or couldn't be aligned. The document pairs with a large difference (>10%) of sentences on each side warrant a second look to make sure they're indeed parallel.
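
The 10% check described above is easy to run on your own document pairs before uploading them. This is a minimal sketch with a hypothetical function name. It assumes the difference is measured relative to the larger of the two sentence counts, which the text doesn't specify.

```python
def needs_second_look(source_sentences, target_sentences, threshold=0.10):
    """Flag a document pair whose sentence counts differ by more than the threshold.

    Assumption: the difference is taken relative to the larger count.
    """
    larger = max(source_sentences, target_sentences)
    if larger == 0:
        return False  # Nothing to compare in an empty pair.
    return abs(source_sentences - target_sentences) / larger > threshold


# 1,000 source sentences against 950 target sentences: a 5% gap, not flagged.
close_pair = needs_second_look(1000, 950)

# 1,000 source sentences against 800 target sentences: a 20% gap, flagged.
distant_pair = needs_second_look(1000, 800)
```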
100
100
101
-
*### Extracting tuning and testing data
101
+
### Tuning and testing data extraction
102
102
103
-
Tuning and testing data is optional. If you don't provide it, the system will remove an appropriate percentage from your training documents to use for tuning and testing. The removal happens dynamically as part of the training process. Since this step occurs as part of training, your uploaded documents aren't affected. You can see the final used sentence counts for each category of data—training, tuning, testing, and dictionary—on the Model details page after training has succeeded.
103
+
Tuning and testing data is optional. If you don't provide it, the system removes an appropriate percentage from your training documents to use for tuning and testing. The removal happens dynamically as part of the training process. Since this step occurs as part of training, your uploaded documents aren't affected. You can see the final used sentence counts for each category of data—training, tuning, testing, and dictionary—on the Model details page after training succeeds.
104
104
105
105
### Length filter
106
106
107
107
* Removes sentences with only one word on either side.
108
108
* Removes sentences with more than 100 words on either side. Chinese, Japanese, and Korean are exempt.
109
109
* Removes sentences with fewer than three characters. Chinese, Japanese, and Korean are exempt.
110
-
* Removes sentences with more than 2000 characters for Chinese, Japanese, Korean.
110
+
* Removes sentences with more than 2,000 characters for Chinese, Japanese, and Korean.
111
111
* Removes sentences with less than 1% alphanumeric characters.
112
112
* Removes dictionary entries containing more than 50 words.
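
The length filter rules above can be approximated locally to estimate how many sentence pairs survive filtering. The sketch below is an illustration under stated assumptions, not the service's code: the function names are hypothetical, words are split on whitespace, the word-based rules are skipped for Chinese, Japanese, and Korean text (the list states that exemption only for the 100-word and three-character rules), and an empty side is treated as removed.

```python
def keep_side(text, is_cjk=False):
    """Apply the length rules to one side (source or target) of a sentence pair."""
    stripped = text.strip()
    if not stripped:
        return False  # Assumption: an empty side is never kept.

    # Rule: remove sentences with less than 1% alphanumeric characters.
    alnum_ratio = sum(ch.isalnum() for ch in stripped) / len(stripped)
    if alnum_ratio < 0.01:
        return False

    if is_cjk:
        # Rule: remove CJK sentences with more than 2,000 characters.
        return len(stripped) <= 2000

    # Rules: remove one-word sentences and sentences with more than 100 words.
    word_count = len(stripped.split())
    if word_count <= 1 or word_count > 100:
        return False

    # Rule: remove sentences with fewer than three characters.
    return len(stripped) >= 3


def keep_pair(source, target, source_is_cjk=False, target_is_cjk=False):
    """Keep a sentence pair only if both sides pass the length rules."""
    return keep_side(source, source_is_cjk) and keep_side(target, target_is_cjk)


def keep_dictionary_entry(source, target):
    """Rule: remove dictionary entries containing more than 50 words."""
    return all(len(side.split()) <= 50 for side in (source, target))
```

For example, `keep_pair("Hello", "Hallo Welt")` returns `False` because the source side has only one word, while `keep_pair("Hello world again", "こんにちは世界", target_is_cjk=True)` returns `True`.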
113
113
@@ -140,15 +140,15 @@ To prepare for training, documents undergo a series of processing and filtering
140
140
141
141
* Remove sentences with invalid encoding.
142
142
* Remove Unicode control characters.
143
-
*If feasible, align sentences (source-to-target).
143
+
* Align sentences (source-to-target), if feasible.
144
144
* Remove source and target sentences that don't match the source and target languages.
145
145
* When source and target sentences have mixed languages, ensure that untranslated words are intentional, for example, names of organizations and products.
146
-
*Correct grammatical and typographical errors to prevent teaching these errors to your model.
147
-
*Though our training process handles source and target lines containing multiple sentences, it's better to have one source sentence mapped to one target sentence.
146
+
* Avoid teaching errors to your model by making certain that grammar and typography are correct.
147
+
* Have one source sentence mapped to one target sentence. Although our training process handles source and target lines containing multiple sentences, one-to-one mapping is a best practice.
148
148
149
149
## How do I evaluate the results?
150
150
151
-
After your model is successfully trained, you can view the model's BLEU score and baseline model BLEU score on the model details page. We use the same set of test data to generate both the model's BLEU score and the baseline BLEU score. This data will help you make an informed decision regarding which model would be better for your use-case.
151
+
After your model is successfully trained, you can view the model's BLEU score and baseline model BLEU score on the model details page. We use the same set of test data to generate both the model's BLEU score and the baseline BLEU score. This data helps you make an informed decision regarding which model would be better for your use-case.