
Commit 8f2accc

Merge pull request #185630 from MicrosoftDocs/release-preview-custom-translator-portal
Release preview custom translator portal
2 parents 4193fc0 + 8c4b9fb commit 8f2accc

41 files changed

+980
-53
lines changed

articles/cognitive-services/Translator/custom-translator/how-to-create-project.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@ ms.topic: conceptual
1414

1515
# Create a project
1616

17-
A project is a container for models, documents, and tests. Each project automatically includes all documents that are uploaded into that workspace that have the correct language pair.
17+
A project contains translation models for one language pair. Each project includes all documents that are uploaded into that workspace that have the correct language pair.
1818

1919
Creating a project is the first step toward building a model.
2020

articles/cognitive-services/Translator/custom-translator/overview.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ ms.topic: overview
1313
---
1414
# What is Custom Translator?
1515

16-
[Custom Translator](https://portal.customtranslator.azure.ai) is a feature of the Microsoft Translator service, which enables Translator enterprises, app developers, and language service providers to build customized neural machine translation (NMT) systems. The customized translation systems seamlessly integrate into existing applications, workflows, and websites.
16+
Custom Translator is a feature of the Microsoft Translator service, which enables Translator enterprises, app developers, and language service providers to build customized neural machine translation (NMT) systems. The customized translation systems seamlessly integrate into existing applications, workflows, and websites.
1717

1818
Translation systems built with [Custom Translator](https://portal.customtranslator.azure.ai) are available through the same cloud-based, secure, high performance, highly scalable Microsoft Translator [Text API V3](../reference/v3-0-translate.md?tabs=curl), that powers billions of translations every day.
1919

@@ -56,7 +56,7 @@ Using the secure [Custom Translator](https://portal.customtranslator.azure.ai) p
5656

5757
[Custom Translator](https://portal.customtranslator.azure.ai) can also be programmatically accessed through a [dedicated API](https://custom-api.cognitive.microsofttranslator.com/swagger/) (currently in preview). The API allows users to manage creating or updating training through their own app or webservice.
5858

59-
The cost of using a custom model to translate content is based on the users Translator Text API pricing tier. See the Cognitive Services [Translator Text API pricing webpage](https://azure.microsoft.com/pricing/details/cognitive-services/translator-text-api/)
59+
The cost of using a custom model to translate content is based on the user's Translator Text API pricing tier. See the Cognitive Services [Translator Text API pricing webpage](https://azure.microsoft.com/pricing/details/cognitive-services/translator-text-api/)
6060
for pricing tier details.
6161

6262
## Securely translate anytime, anywhere on all your apps and services

articles/cognitive-services/Translator/custom-translator/quickstart-build-deploy-custom-model.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
---
2-
title: "Quickstart: Build, deploy, and use a custom model - Custom Translator"
2+
title: "Quickstart: Build, deploy, and use a custom model"
33
titleSuffix: Azure Cognitive Services
4-
description: In this quickstart, you go through step-by-step process of building a translation system using the Custom Translator.
4+
description: A step-by-step guide to building a translation system using the Custom Translator Legacy.
55
author: laujan
66
manager: nitinme
77
ms.service: cognitive-services
Lines changed: 155 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,155 @@
1+
---
2+
title: Custom Translator for beginners
3+
titleSuffix: Azure Cognitive Services
4+
description: A user guide for understanding the end-to-end customized machine translation process.
5+
author: laujan
6+
manager: nitinme
7+
ms.service: cognitive-services
8+
ms.subservice: translator-text
9+
ms.date: 01/20/2022
10+
ms.author: moelghaz
11+
ms.topic: overview
12+
---
13+
# Custom Translator for beginners | Preview
14+
15+
[Custom Translator](../overview.md) enables you to build a translation system that reflects your business, industry, and domain-specific terminology and style. Training and deploying a custom system is easy and does not require any programming skills. The customized translation system seamlessly integrates into your existing applications, workflows, and websites and is available on Azure through the same cloud-based [Microsoft Text Translator API](../../reference/v3-0-translate.md?tabs=curl) service that powers billions of translations every day.
16+
17+
## Is a custom translation model the right choice for me?
18+
19+
A well-trained custom translation model provides more accurate domain-specific translations. This is because it relies on previously translated in-domain documents to learn preferred translations. Translator uses these terms and phrases in context to produce fluent translations in the target language while respecting context-dependent grammar.
20+
21+
Training a full custom translation model requires a substantial amount of data. If you do not have at least 10,000 sentences of previously translated documents, you will not be able to train a full-language translation model. However, you can either train a dictionary-only model or use the high-quality, out-of-the-box translations available with the Text Translator API.
22+
23+
:::image type="content" source="media/how-to/for-beginners.png" alt-text="Screenshot illustrating the difference between custom and general models.":::
24+
25+
## What does training a custom translation model involve?
26+
27+
Building a custom translation model requires:
28+
29+
* Understanding your use-case.
30+
31+
* Obtaining in-domain translated data (preferably human translated).
32+
33+
* The ability to assess translation quality or target language translations.
34+
35+
## How do I evaluate my use-case?
36+
37+
Having clarity on your use-case and what success looks like is the first step towards sourcing proficient training data. Here are a few considerations:
38+
39+
* What is your desired outcome and how will you measure it?
40+
41+
* What is your business domain?
42+
43+
* Do you have in-domain sentences of similar terminology and style?
44+
45+
* Does your use-case involve multiple domains? If yes, should you build one translation system or multiple systems?
46+
47+
* Do you have requirements impacting regional data residency at-rest and in-transit?
48+
49+
* Are the target users in one or multiple regions?
50+
51+
## How should I source my data?
52+
53+
Finding in-domain quality data is often a challenging task that varies based on user classification. Here are some questions you can ask yourself as you evaluate what data may be available to you:
54+
55+
* Enterprises often have a wealth of translation data that has accumulated over many years of using human translation. Does your company have previous translation data available that you can use?
56+
57+
* Do you have a vast amount of monolingual data? Monolingual data is data in only one language. If so, can you get translations for this data?
58+
59+
* Can you crawl online portals to collect source sentences and synthesize target sentences?
60+
61+
## What should I use for training material?
62+
63+
| Source | What it does | Rules to follow |
64+
|---|---|---|
65+
| Bilingual training documents | Teaches the system your terminology and style. | **Be liberal**. Any in-domain human translation is better than machine translation. Add and remove documents as you go and try to improve the [BLEU score](/azure/cognitive-services/translator/custom-translator/what-is-bleu-score?WT.mc_id=aiml-43548-heboelma). |
66+
| Tuning documents | Trains the Neural Machine Translation parameters. | **Be strict**. Compose them to be optimally representative of what you are going to translate in the future. |
67+
| Test documents | Calculates the [BLEU score](/azure/cognitive-services/translator/custom-translator/what-is-bleu-score?WT.mc_id=aiml-43548-heboelma). | **Be strict**. Compose test documents to be optimally representative of what you plan to translate in the future. |
68+
| Phrase dictionary | Forces the given translation 100% of the time. | **Be restrictive**. A phrase dictionary is case-sensitive and any word or phrase listed is translated in the way you specify. In many cases, it is better to not use a phrase dictionary and let the system learn. |
69+
| Sentence dictionary | Forces the given translation 100% of the time. | **Be strict**. A sentence dictionary is case-insensitive and good for common, short in-domain sentences. For a sentence dictionary match to occur, the entire submitted sentence must match the source dictionary entry. If only a portion of the sentence matches, the entry won't match. |
70+
71+
## What is a BLEU score?
72+
73+
BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the precision or accuracy of text that has been machine translated from one language to another. Custom Translator uses the BLEU metric as one way of conveying translation accuracy.
74+
75+
A BLEU score is a number between zero and 100. A score of zero indicates a low quality translation where nothing in the translation matched the reference. A score of 100 indicates a perfect translation that is identical to the reference. It's not necessary to attain a score of 100 - a BLEU score between 40 and 60 indicates a high-quality translation.
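To make the 0 to 100 scale concrete, here is a minimal Python sketch of a sentence-level BLEU calculation: modified n-gram precision up to 4-grams, combined with a brevity penalty. It is an illustration only. The function names `ngrams` and `bleu` are hypothetical, tokenization is assumed to be plain whitespace splitting, and Custom Translator's exact scoring procedure isn't described here, so the numbers won't necessarily match what the portal reports.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU on a 0-100 scale (single reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any empty n-gram level zeroes the score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to its reference scores 100, a candidate sharing no words with it scores 0, and partial matches fall in between.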
76+
77+
[Read more](/azure/cognitive-services/translator/custom-translator/what-is-bleu-score?WT.mc_id=aiml-43548-heboelma)
78+
79+
## What happens if I don't submit tuning or testing data?
80+
81+
Tuning and test sentences should be optimally representative of what you plan to translate in the future. If you don't submit any tuning or testing data, Custom Translator will automatically exclude sentences from your training documents to use as tuning and test data.
82+
83+
| System-generated | Manual-selection |
84+
|---|---|
85+
| Convenient. | Enables fine-tuning for your future needs.|
86+
| Good, if you know that your training data is representative of what you are planning to translate. | Provides more freedom to compose your training data.|
87+
| Easy to redo when you grow or shrink the domain. | Allows for more data and better domain coverage.|
88+
| Changes each training run. | Remains static over repeated training runs. |
89+
90+
## How is training material processed by Custom Translator?
91+
92+
When you submit documents for training a custom translation system, the documents undergo a series of processing and filtering steps to prepare for training. These steps are explained below. Knowledge of the filtering process may help with understanding the sentence count displayed as well as the steps you can take to prepare training documents for training with Custom Translator.
93+
94+
* ### Sentence alignment
95+
96+
If your document isn't in XLIFF, XLSX, TMX, or ALIGN format, Custom Translator aligns the sentences of your source and target documents to each other, sentence-by-sentence. Translator doesn't perform document alignment—it follows your naming convention for the documents to find a matching document in the other language. Within the source text, Custom Translator tries to find the corresponding sentence in the target language. It uses document markup like embedded HTML tags to help with the alignment.
97+
98+
If you see a large discrepancy between the number of sentences in the source and target documents, your source document may not be parallel or couldn't be aligned. Document pairs whose sentence counts differ by more than 10% warrant a second look to make sure they're indeed parallel.
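The 10% check can be run on your own sentence counts before uploading. The sketch below is one possible reading of that rule: it measures the difference relative to the larger of the two counts, which is an assumption, and the function name `needs_review` is hypothetical.

```python
def needs_review(source_count, target_count, threshold=0.10):
    """Return True when the sentence counts differ by more than the threshold.

    The difference is measured relative to the larger count (an assumption).
    """
    larger = max(source_count, target_count)
    if larger == 0:
        return False  # two empty documents have nothing to compare
    return abs(source_count - target_count) / larger > threshold
```

For example, a pair with 1,000 source sentences and 850 target sentences differs by 15% and would be flagged, while 1,000 and 950 would not.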
99+
100+
* ### Extracting tuning and testing data
101+
102+
Tuning and testing data is optional. If you don't provide it, the system will remove an appropriate percentage from your training documents to use for tuning and testing. The removal happens dynamically as part of the training process. Since this step occurs as part of training, your uploaded documents are not affected. You can see the final used sentence counts for each category of data—training, tuning, testing, and dictionary—on the Model details page after training has succeeded.
103+
104+
* ### Length filter
105+
106+
* Removes sentences with only one word on either side.
107+
* Removes sentences with more than 100 words on either side. Chinese, Japanese, Korean are exempt.
108+
* Removes sentences with fewer than three characters. Chinese, Japanese, Korean are exempt.
109+
* Removes sentences with more than 2000 characters for Chinese, Japanese, Korean.
110+
* Removes sentences with less than 1% alphanumeric characters.
111+
* Removes dictionary entries containing more than 50 words.
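The sentence-level rules above can be approximated locally to estimate how many sentence pairs are likely to survive filtering. This is a rough Python sketch, not the service's implementation. It assumes words are separated by white space, identifies Chinese, Japanese, and Korean by the hypothetical language codes `zh`, `ja`, and `ko`, and also skips the one-word rule for those languages because whitespace splitting is not meaningful there (that last choice is an assumption; the list above doesn't state it). The dictionary-entry rule is not covered.

```python
CJK_LANGS = {"zh", "ja", "ko"}  # hypothetical codes for Chinese, Japanese, Korean


def keep_side(text, lang):
    """Apply the length rules to one side of a sentence pair."""
    text = text.strip()
    if not text:
        return False
    if lang in CJK_LANGS:
        if len(text) > 2000:          # character limit for CJK
            return False
    else:
        words = text.split()
        if len(words) <= 1:           # only one word
            return False
        if len(words) > 100:          # more than 100 words
            return False
        if len(text) < 3:             # fewer than three characters
            return False
    alnum = sum(ch.isalnum() for ch in text)
    return alnum / len(text) >= 0.01  # at least 1% alphanumeric characters


def keep_pair(source, target, source_lang, target_lang):
    """A pair is kept only if both sides pass every rule."""
    return keep_side(source, source_lang) and keep_side(target, target_lang)
```

Running `keep_pair` over your aligned data before upload gives an approximate sentence count to compare against the count shown after processing.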
112+
113+
* ### White space
114+
115+
* Replaces any sequence of white-space characters including tabs and CR/LF sequences with a single space character.
116+
* Removes leading or trailing space in the sentence.
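The same normalization is easy to reproduce when you prepare documents. The following Python sketch collapses every run of white space into one space and trims both ends; the function name `normalize_whitespace` is hypothetical, and Python's `\s` class is assumed to be a close enough match for the white-space characters the service handles.

```python
import re


def normalize_whitespace(sentence):
    """Collapse runs of white space (spaces, tabs, CR/LF) and trim the ends."""
    return re.sub(r"\s+", " ", sentence).strip()
```

For instance, `normalize_whitespace("  Hello\t\tworld\r\nagain  ")` returns `"Hello world again"`.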
117+
118+
* ### Sentence end punctuation
119+
120+
* Replaces multiple sentence-end punctuation characters with a single instance.
121+
* ### Japanese character normalization
* Converts full width letters and digits to half-width characters.
123+
124+
* ### Unescaped XML tags
125+
126+
Transforms unescaped tags into escaped tags:
127+
128+
| Tag | Becomes |
129+
|---|---|
130+
| \< | \&lt; |
131+
| \> | \&gt; |
132+
| \& | \&amp; |
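Escaping of this kind replaces the characters `<`, `>`, and `&` with the entities `&lt;`, `&gt;`, and `&amp;`. A minimal Python sketch using the standard library's `html.escape` follows; the wrapper name `escape_tags` is hypothetical. Note that this sketch escapes every occurrence and doesn't detect text that is already escaped, so running it twice would double-escape ampersands.

```python
import html


def escape_tags(sentence):
    """Escape <, >, and & while leaving quotation marks unchanged."""
    return html.escape(sentence, quote=False)
```

For example, `escape_tags("a < b & c > d")` returns `"a &lt; b &amp; c &gt; d"`.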
133+
134+
* ### Invalid characters
135+
136+
Custom Translator removes sentences that contain Unicode character U+FFFD. The character U+FFFD indicates a failed encoding conversion.
137+
138+
## What steps should I take before uploading data?
139+
140+
* Remove sentences with invalid encoding.
141+
* Remove Unicode control characters.
142+
* If feasible, align sentences (source-to-target).
143+
* Remove source and target sentences that do not match the source and target languages.
144+
* When source and target sentences have mixed languages, ensure that untranslated words are intentional, for example, names of organizations and products.
145+
* Correct grammatical and typographical errors to prevent teaching these errors to your model.
146+
* Though our training process handles source and target lines containing multiple sentences, it's better to have one source sentence mapped to one target sentence.
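The first two items in this checklist can be automated. The Python sketch below drops any sentence pair that contains the Unicode replacement character U+FFFD and strips control characters from the pairs it keeps. The function names are hypothetical, "control characters" is interpreted as Unicode category `Cc`, and white-space characters such as tabs are deliberately kept so that words aren't joined together; both choices are assumptions.

```python
import unicodedata

REPLACEMENT_CHAR = "\ufffd"  # marks a failed encoding conversion


def strip_control_chars(text):
    """Remove Unicode control characters (category Cc) except white space."""
    return "".join(
        ch for ch in text
        if ch.isspace() or unicodedata.category(ch) != "Cc"
    )


def clean_pairs(pairs):
    """Drop pairs with invalid encoding and strip control characters from the rest."""
    cleaned = []
    for source, target in pairs:
        if REPLACEMENT_CHAR in source or REPLACEMENT_CHAR in target:
            continue  # invalid encoding on either side: skip the whole pair
        cleaned.append((strip_control_chars(source), strip_control_chars(target)))
    return cleaned
```

The remaining items, such as checking language identity and correcting grammar, need language-aware tooling or human review and are outside the scope of this sketch.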
147+
148+
## How do I evaluate the results?
149+
150+
After your model is successfully trained, you can view the model's BLEU score and baseline model BLEU score on the model details page. We use the same set of test data to generate both the model's BLEU score and the baseline BLEU score to help you make an informed decision regarding which model would be better for your use-case.
151+
152+
## Next steps
153+
154+
> [!div class="nextstepaction"]
155+
> [Try our Quickstart](quickstart.md)
Lines changed: 95 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,95 @@
1+
---
2+
title: Create and manage a project
3+
titleSuffix: Azure Cognitive Services
4+
description: How to create and manage a project in the Azure Cognitive Services Custom Translator Preview.
5+
author: laujan
6+
manager: nitinme
7+
ms.service: cognitive-services
8+
ms.subservice: translator-text
9+
ms.date: 01/20/2022
10+
ms.author: moelghaz
11+
ms.topic: conceptual
12+
---
13+
14+
# Create and manage a project | Preview
15+
16+
> [!IMPORTANT]
17+
> Custom Translator v2.0 is currently in public preview. Some features may not be supported or have constrained capabilities.
18+
19+
A project contains translation models for one language pair. Each project includes all documents that were uploaded into that workspace with the correct language pair.
20+
21+
Creating a project is the first step in building and publishing a model.
22+
23+
## Create a project
24+
25+
1. After you sign in, your default workspace is loaded. To create a project in a different workspace, select **My workspaces**, then select a workspace name.
26+
27+
1. Select **Create project**.
28+
29+
1. Enter the following details about your project in the creation dialog:
30+
31+
- **Project name (required):** Give your project a unique, meaningful name. It's not necessary to mention the languages within the title.
32+
33+
- **Language pair (required):** Select the source and target languages from the dropdown list.
34+
35+
- **Domain (required):** Select the domain from the dropdown list that's most appropriate for your project. The domain describes the terminology and style of the documents you intend to translate.
36+
37+
>[!Note]
38+
>Select **Show advanced options** to add a project label, project description, and domain description.
39+
40+
- **Project label:** The project label distinguishes between projects with the same language pair and domain. As a best practice, here are a few tips:
41+
42+
- Use a label *only* if you're planning to build multiple projects for the same language pair and same domain and want to access these projects with a different Domain ID.
43+
44+
- Don't use a label if you're building systems for one domain only.
45+
46+
- A project label is not required and not helpful to distinguish between language pairs.
47+
48+
- You can use the same label for multiple projects.
49+
50+
- **Project description:** A short summary about the project. This description has no influence over the behavior of the Custom Translator or your resulting custom system, but can help you differentiate between different projects.
51+
52+
- **Domain description:** Use this field to better describe the particular field or industry in which you're working. For example, if your category is medicine, you might add details about your subfield, such as surgery or pediatrics. The description has no influence over the behavior of the Custom Translator or your resulting custom system.
53+
54+
1. Select **Create project**.
55+
56+
:::image type="content" source="../media/how-to/create-project-dialog.png" alt-text="Screenshot illustrating the create project fields.":::
57+
58+
## Edit a project
59+
60+
To modify the project name, project description, or domain description:
61+
62+
1. Select the workspace name.
63+
64+
1. Select the project name, for example, *English-to-German*.
65+
66+
1. The **Edit and Delete** buttons should now be visible.
67+
68+
:::image type="content" source="../media/how-to/edit-project-dialog-1.png" alt-text="Screenshot illustrating the edit project fields":::
69+
70+
1. Select **Edit** and fill in or modify existing text.
71+
72+
:::image type="content" source="../media/how-to/edit-project-dialog-2.png" alt-text="Screenshot illustrating detailed edit project fields.":::
73+
74+
1. Select **Edit project** to save.
75+
76+
## Delete a project
77+
78+
1. Follow the [**Edit a project**](#edit-a-project) steps 1-3 above.
79+
80+
1. Select **Delete** and read the delete message before you select **Delete project** to confirm.
81+
82+
:::image type="content" source="../media/how-to/delete-project-1.png" alt-text="Screenshot illustrating delete project fields.":::
83+
84+
>[!Note]
85+
>If your project has a published model or a model that is currently in training, you will only be able to delete your project once your model is no longer published or training.
86+
>
87+
> :::image type="content" source="../media/how-to/delete-project-2.png" alt-text="Screenshot illustrating the unable to delete message.":::
88+
89+
## Next steps
90+
91+
- Learn [how to manage project documents](create-manage-training-documents.md).
92+
- Learn [how to train a model](train-custom-model.md).
93+
- Learn [how to test and evaluate model quality](view-model-test-translation.md).
94+
- Learn [how to publish a model](publish-model.md).
95+
- Learn [how to translate with custom models](translate-with-custom-model.md).
