articles/ai-services/translator/custom-translator/azure-ai-foundry/beginners-guide.md
Lines changed: 10 additions & 10 deletions
@@ -1,7 +1,7 @@
 ---
-title: Azure AI Translator Custom translation for beginners
+title: Azure AI Translator custom translation for beginners
 titleSuffix: Azure AI services
-description: User guide for understanding the end-to-end customized machine translation process.
+description: User guide for understanding the end-to-end customized machine translation process using Azure AI Foundry.
 author: laujan
 manager: nitinme
 ms.service: azure-ai-translator
@@ -10,7 +10,7 @@ ms.date: 05/19/2025
 ms.topic: overview
 ---

-# Azure AI Translator Custom translation for beginners
+# Azure AI Translator custom translation for beginners

 [Custom translation](overview.md) enables you to build a translation system that reflects your business, industry, and domain-specific terminology and style. Training and deploying a custom system is easy and doesn't require any programming skills. The customized translation system seamlessly integrates into your existing applications, workflows, and websites and is available on Azure through the same cloud-based [Microsoft Text Translation API](../../text-translation/reference/v4/translate-api.md) service that powers billions of translations every day.
@@ -76,15 +76,15 @@ Finding in-domain quality data is often a challenging task that varies based on
 ## What is a BLEU score?

-BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the precision or accuracy of text that is machine translated from one language to another. Custom translation uses the BLEU metric as one way of conveying translation accuracy.
+BLEU (Bilingual Evaluation Understudy) is an algorithm for evaluating the precision or accuracy of text that is machine translated from one language to another. custom translation uses the BLEU metric as one way of conveying translation accuracy.

 A BLEU score is a number between zero and 100. A score of zero indicates a low quality translation where nothing in the translation matched the reference. A score of 100 indicates a perfect translation that is identical to the reference. It's not necessary to attain a score of 100 - a BLEU score between 40 and 60 indicates a high-quality translation.
 ## What happens if I don't submit tuning or testing data?

-Tuning and test sentences are optimally representative of what you plan to translate in the future. If you don't submit any tuning or testing data, Custom translation automatically excludes sentences from your training documents to use as tuning and test data.
+Tuning and test sentences are optimally representative of what you plan to translate in the future. If you don't submit any tuning or testing data, custom translation automatically excludes sentences from your training documents to use as tuning and test data.

 | System-generated | Manual-selection |
 |---|---|
@@ -93,13 +93,13 @@ Tuning and test sentences are optimally representative of what you plan to trans
 | Easy to redo when you grow or shrink the domain. | Allows for more data and better domain coverage.|
 |Changes each training run.| Remains static over repeated training runs|

-## How is training material processed by Custom translation?
+## How is training material processed by custom translation?

-To prepare for training, documents undergo a series of processing and filtering steps. Knowledge of the filtering process can help with understanding the sentence count displayed as well as the steps you can take to prepare training documents for training with Custom translation. The filtering steps are as follows:
+To prepare for training, documents undergo a series of processing and filtering steps. Knowledge of the filtering process can help with understanding the sentence count displayed as well as the steps you can take to prepare training documents for training with custom translation. The filtering steps are as follows:

 *### Sentence alignment

-If your document isn't in `XLIFF`, `XLSX`, `TMX`, or `ALIGN` format, Custom translation aligns the sentences of your source and target documents to each other, sentence-by-sentence. Translator doesn't perform document alignment—it follows your naming convention for the documents to find a matching document in the other language. Within the source text, Custom translation tries to find the corresponding sentence in the target language. It uses document markup like embedded HTML tags to help with the alignment.
+If your document isn't in `XLIFF`, `XLSX`, `TMX`, or `ALIGN` format, custom translation aligns the sentences of your source and target documents to each other, sentence-by-sentence. Translator doesn't perform document alignment—it follows your naming convention for the documents to find a matching document in the other language. Within the source text, custom translation tries to find the corresponding sentence in the target language. It uses document markup like embedded HTML tags to help with the alignment.

 If you see a large discrepancy between the number of sentences in the source and target documents, your source document might not be parallel, or couldn't be aligned. The document pairs with a large difference (>10%) of sentences on each side warrant a second look to make sure they're indeed parallel.
@@ -139,11 +139,11 @@ To prepare for training, documents undergo a series of processing and filtering
 *### Invalid characters

-Custom translation removes sentences that contain Unicode character U+FFFD. The character U+FFFD indicates a failed encoding conversion.
+custom translation removes sentences that contain Unicode character U+FFFD. The character U+FFFD indicates a failed encoding conversion.

 *### Invalid HTML tags

-Custom translation removes valid tags during training. Invalid tags cause unpredictable results and should be manually removed.
+custom translation removes valid tags during training. Invalid tags cause unpredictable results and should be manually removed.

 ## What steps should I take before uploading data?
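The invalid-character rule in this file (sentences containing U+FFFD are removed) can be approximated locally before upload. The following is a minimal sketch, not the service's implementation: the function name `drop_invalid_pairs` and the representation of training data as `(source, target)` tuples are assumptions made for illustration.

```python
# Hypothetical pre-upload check; names and data layout are illustrative assumptions.
REPLACEMENT_CHAR = "\ufffd"  # U+FFFD, produced by a failed encoding conversion


def drop_invalid_pairs(pairs):
    """Split (source, target) sentence pairs into kept and dropped lists.

    A pair is dropped when either side contains U+FFFD.
    """
    kept, dropped = [], []
    for source, target in pairs:
        if REPLACEMENT_CHAR in source or REPLACEMENT_CHAR in target:
            dropped.append((source, target))
        else:
            kept.append((source, target))
    return kept, dropped


if __name__ == "__main__":
    sample = [("Hello", "Bonjour"), ("Caf\ufffd", "Café")]
    kept, dropped = drop_invalid_pairs(sample)
    print(f"kept {len(kept)}, dropped {len(dropped)}")
```

Running a check like this before upload helps explain why the sentence count shown after processing can be lower than the count in the original files.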
articles/ai-services/translator/custom-translator/azure-ai-foundry/concepts/bleu-score.md
Lines changed: 6 additions & 18 deletions
@@ -1,5 +1,5 @@
 ---
-title: "BLEU score - Custom translation"
+title: Azure AI Foundry custom translation BLEU score
 titleSuffix: Azure AI services
 description: The BLEU score measures the differences between machine translation and human-created reference translations of the same source sentence.
 author: laujan
@@ -9,34 +9,22 @@ ms.topic: conceptual
 ms.date: 05/19/2025
 ms.author: lajanuar
 ms.custom: cogserv-non-critical-translator
-#Customer intent: As an Custom translation user, I want to understand how BLEU score works so that I understand system test outcome better.
+#Customer intent: As a custom translation user, I want to understand how BLEU score works so that I understand system test outcome better.
 ---

-# BLEU score
+# Azure AI Foundry custom translation BLEU score

 [BLEU (Bilingual Evaluation Understudy)](https://en.wikipedia.org/wiki/BLEU) is a measurement of the difference between an automatic translation and human-created reference translations of the same source sentence.

 ## Scoring process

-The BLEU algorithm compares consecutive phrases of the automatic translation
-with the consecutive phrases it finds in the reference translation, and counts
-the number of matches, in a weighted fashion. These matches are position
-independent. A higher match degree indicates a higher degree of similarity with
-the reference translation, and higher score. Intelligibility and grammatical correctness aren't taken into account.
+The BLEU algorithm compares consecutive phrases of the automatic translation with the consecutive phrases it finds in the reference translation, and counts the number of matches, in a weighted fashion. These matches are position independent. A higher match degree indicates a higher degree of similarity with the reference translation, and higher score. Intelligibility and grammatical correctness aren't taken into account.

 ## How does BLEU work?

-The BLEU score's strength is that it correlates well with human judgment. BLEU averages out
-individual sentence judgment errors over a test corpus, rather than attempting
-to devise the exact human judgment for every sentence.
+The BLEU score's strength is that it correlates well with human judgment. BLEU averages out individual sentence judgment errors over a test corpus, rather than attempting to devise the exact human judgment for every sentence.

-A more extensive discussion of BLEU scores is [here](https://youtu.be/-UqDljMymMg).
-
-BLEU results depend strongly on the breadth of your domain; consistency of
-test, training and tuning data; and how much data you have
-available for training. If your models are trained within a narrow domain, and
-your training data is consistent with your test data, you can expect a high
-BLEU score.
+For a more extensive discussion of BLEU scores, *see* [Microsoft Translator Hub - Discussion of BLEU Score](https://youtu.be/-UqDljMymMg). BLEU results depend strongly on the breadth of your domain; consistency of test, training and tuning data; and how much data you have available for training. If your models are trained within a narrow domain, and your training data is consistent with your test data, you can expect a high BLEU score.

 >[!NOTE]
 >A comparison between BLEU scores is only justifiable when BLEU results are compared with the same Test set, the same language pair, and the same MT engine. A BLEU score from a different test set is bound to be different.
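The scoring process described in this file (position-independent matching of consecutive phrases, combined in a weighted fashion) can be illustrated with a small sentence-level sketch. This is not the implementation used by custom translation. It assumes whitespace tokenization, a single reference, uniform weights over 1-grams to 4-grams, the standard brevity penalty, and no smoothing; the function names `ngrams` and `bleu` are chosen here for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU on a 0-100 scale (illustrative, unsmoothed)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, one empty precision zeroes the score
        log_precisions.append(math.log(matches / total))
    # Penalize candidates that are shorter than the reference.
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return 100 * brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(bleu("the cat sat on the mat", reference))
    print(bleu("the cat sat on a mat", reference))
```

Because the sketch scores one sentence at a time, it is harsher than a corpus-level score, which pools n-gram counts over the whole test set before computing precisions.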
 description: Build your own machine translation system using your preferred terminology and style.
 #services: cognitive-services
@@ -11,9 +11,9 @@ ms.date: 05/19/2025
 ms.author: lajanuar
 ---

-# Customize your text translations
+# Azure AI Foundry custom translations

-Custom translation is a feature of the Azure AI Translator service. Custom translation allows users to customize Azure AI Translator's advanced neural machine translation when translating text using Translator (version 3 only).
+Custom translation is a feature of the Azure AI Translator service. Azure AI Foundry custom translation allows users to customize Azure AI Translator's advanced neural machine translation when translating text using Translator (version 3 only).

 The feature can also be used to customize speech translation when used with [Azure AI Speech](../../../../speech-service/index.yml).
@@ -36,4 +36,4 @@ More details about the various levels of customization based on available data c
 ## Next steps

 > [!div class="nextstepaction"]
-> [Set up a customized language system using Custom translation](../overview.md)
+> [Set up a customized language system using custom translation](../overview.md)
articles/ai-services/translator/custom-translator/azure-ai-foundry/concepts/data-filtering.md
Lines changed: 8 additions & 8 deletions
@@ -1,30 +1,30 @@
 ---
-title: "Data Filtering - Custom translation"
+title: Azure AI Foundry custom translation data filtering
 titleSuffix: Azure AI services
-description: Explaining how training documents for a custom system undergo a series of processing and filtering steps.
+description: How custom translation training documents for a custom system undergo a series of processing and filtering steps.
 author: laujan
 manager: nitinme
 ms.service: azure-ai-translator
 ms.date: 05/19/2025
 ms.author: lajanuar
 ms.topic: conceptual
 ms.custom: cogserv-non-critical-translator
-#Customer intent: As a Custom translation, I want to understand how data is filtered before training a model.
+#Customer intent: As a custom translation user, I want to understand how data is filtered before training a model.
 ---

-# Custom translation Data filtering
+# Azure AI Foundry custom translation data filtering

-When you submit documents to be used for training, the documents undergo a series of processing and filtering steps. These steps are explained here. The knowledge of the filtering can help you understand the sentence count displayed in Custom translation and the steps you can take yourself to prepare the documents for training with Custom translation.
+When you submit documents to be used for training, the documents undergo a series of processing and filtering steps. These steps are explained here. The knowledge of the filtering can help you understand the sentence count displayed in custom translation and the steps you can take yourself to prepare the documents for training with custom translation.

 ## Sentence alignment

-If your document isn't in XLIFF, `TMX`, or ALIGN format, Custom translation aligns the sentences of your source and target documents to each other, sentence by sentence. Custom translation doesn't perform document alignment – it follows your naming of the documents to find the matching document of the other language. Within the document, Custom translation tries to find the corresponding sentence in the other language. It uses document markup like embedded HTML tags to help with the alignment.
+If your document isn't in XLIFF, `TMX`, or ALIGN format, custom translation aligns the sentences of your source and target documents to each other, sentence by sentence. custom translation doesn't perform document alignment – it follows your naming of the documents to find the matching document of the other language. Within the document, custom translation tries to find the corresponding sentence in the other language. It uses document markup like embedded HTML tags to help with the alignment.

-If you see a large discrepancy between the number of sentences in the source and target documents, your documents can't be parallel. The document pairs with a large difference (>10%) of sentences on each side warrant a second look to make sure they're indeed parallel. Custom translation shows a warning next to the document if the sentence count differs suspiciously.
+If you see a large discrepancy between the number of sentences in the source and target documents, your documents might not be parallel. The document pairs with a large difference (>10%) of sentences on each side warrant a second look to make sure they're indeed parallel. custom translation shows a warning next to the document if the sentence count differs suspiciously.

 ## Deduplication

-Custom translation removes the sentences that are present in test and tuning documents from training data. The removal happens dynamically inside of the training run, not in the data processing step. Custom translation reports the sentence count to you in the project overview before such removal. Deduplication doesn't apply if you choose to upload your own test and tuning documents.
+Custom translation removes the sentences that are present in test and tuning documents from training data. The removal happens dynamically inside of the training run, not in the data processing step. custom translation reports the sentence count to you in the project overview before such removal. Deduplication doesn't apply if you choose to upload your own test and tuning documents.
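Two of the checks described in this file can be sketched locally: the sentence-count discrepancy warning (>10%) and the removal of test and tuning sentences from training data. The sketch below rests on assumptions that the source doesn't state: the discrepancy is measured relative to the larger side, deduplication matches exact source sentences, and all function names are hypothetical.

```python
def sentence_count_gap(source_sentences, target_sentences):
    """Relative difference in sentence counts, measured against the larger side (assumption)."""
    larger = max(len(source_sentences), len(target_sentences))
    if larger == 0:
        return 0.0
    return abs(len(source_sentences) - len(target_sentences)) / larger


def needs_review(source_sentences, target_sentences, threshold=0.10):
    """Flag a document pair whose sentence counts differ by more than the threshold."""
    return sentence_count_gap(source_sentences, target_sentences) > threshold


def remove_held_out(training_pairs, held_out_sources):
    """Drop training pairs whose source sentence also appears in test or tuning data.

    Exact string matching is an assumption; the service's matching rule isn't documented here.
    """
    held_out = set(held_out_sources)
    return [pair for pair in training_pairs if pair[0] not in held_out]


if __name__ == "__main__":
    print(needs_review(["s"] * 100, ["t"] * 85))   # counts differ by 15%
    print(needs_review(["s"] * 100, ["t"] * 95))   # counts differ by 5%
    training = [("Hello", "Bonjour"), ("Thanks", "Merci")]
    print(remove_held_out(training, ["Thanks"]))
```

A pair flagged by `needs_review` isn't necessarily wrong; it only marks documents that warrant the second look the text recommends.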
articles/ai-services/translator/custom-translator/azure-ai-foundry/concepts/dictionaries.md
Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 ---
-title: "Dictionary - Custom translation"
+title: Azure AI Foundry custom translation dictionary
 titleSuffix: Azure AI services
-description: How to create an aligned document specifying a list of phrases or sentences (and their translations) that you always want Azure AI Translator to translate in the same manner. Dictionaries can also be called glossaries or term bases.
+description: How to create an Azure AI Foundry custom translation dictionary specifying a list of phrases or sentences (and their translations) that you want Azure AI Translator to always translate in the same manner. Dictionaries can also be called glossaries or term bases.
 #Customer intent: As a Custom Translator, I want to understand how to use a dictionary to build a custom translation model.
 ---

-# Dictionary
+# Azure AI Foundry custom translation dictionary

-A custom translation dictionary is an aligned pair of documents that specifies a list of phrases or sentences and their corresponding translations. Use a dictionary in your training, when you want Custom Translator to translate any instances of the source phrase or sentence, using the translation you provide in the dictionary. Dictionaries are sometimes called glossaries or term bases. You can think of the dictionary as a brute force "copy and replace" for all the terms you list. Furthermore, Custom translation service builds and makes use of its own general purpose dictionaries to improve the quality of its translation. However, a customer provided dictionary takes precedent and is searched first to look up words or sentences.
+A custom translation dictionary is an aligned pair of documents that specifies a list of phrases or sentences and their corresponding translations. Use a dictionary in your training, when you want Custom Translator to translate any instances of the source phrase or sentence, using the translation you provide in the dictionary. Dictionaries are sometimes called glossaries or term bases. You can think of the dictionary as a brute force "copy and replace" for all the terms you list. Furthermore, custom translation service builds and makes use of its own general purpose dictionaries to improve the quality of its translation. However, a customer-provided dictionary takes precedence and is searched first to look up words or sentences.

 Dictionaries only work for projects in language pairs that have a fully supported Microsoft general neural network model behind them. [View the complete list of languages](../../../language-support.md).
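The "copy and replace" analogy in this file can be made concrete with a toy example. The sketch below only illustrates the analogy and says nothing about how the service applies dictionaries internally. The function name `apply_phrase_dictionary`, the case-sensitive matching, and the longest-phrase-first rule are all assumptions.

```python
import re


def apply_phrase_dictionary(text, dictionary):
    """Replace every listed source phrase with its dictionary translation.

    Longer phrases are tried first so that "machine translation" is not
    split by a shorter entry such as "machine".
    """
    if not dictionary:
        return text
    phrases = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
    )
    return pattern.sub(lambda match: dictionary[match.group(0)], text)


if __name__ == "__main__":
    terms = {
        "machine translation": "traduction automatique",
        "machine": "appareil",
    }
    print(apply_phrase_dictionary("machine translation beats a machine", terms))
```

The forced substitution ignores context and grammar, which is why the analogy describes dictionaries as brute force.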