File: `articles/cognitive-services/Speech-Service/speech-synthesis-markup.md` (40 additions, 19 deletions)
## Use custom lexicon to improve pronunciation
Sometimes the text-to-speech service can't accurately pronounce a word, such as the name of a company or a medical term. Developers can define how a single entity is read in SSML by using the `phoneme` and `sub` tags. If you need to define how multiple entities are read, you can create a custom lexicon and reference it by using the `lexicon` tag.
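For a single entity, the inline tags are enough. The following minimal Python sketch uses only the standard library. It builds SSML for the two entities used in the lexicon example later in this section: `sub` replaces "BTW" with a spoken alias, and `phoneme` gives "Benigni" an explicit IPA pronunciation. The helper name and the `voice_name` value are illustrative placeholders. The `alias`, `alphabet`, and `ph` attributes follow the W3C SSML specification rather than anything defined in this section.

```python
import xml.etree.ElementTree as ET

# Namespaces from the W3C SSML and XML specifications.
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def inline_ssml(voice_name: str, lang: str = "en-US") -> str:
    """Build SSML that fixes two single entities inline, without a lexicon.

    `inline_ssml` is a hypothetical helper written for this example.
    """
    ET.register_namespace("", SSML_NS)  # serialize SSML as the default namespace
    speak = ET.Element(f"{{{SSML_NS}}}speak", {"version": "1.0", XML_LANG: lang})
    voice = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice_name})

    # <sub> speaks the alias instead of the written text.
    sub = ET.SubElement(voice, f"{{{SSML_NS}}}sub", {"alias": "By the way"})
    sub.text = "BTW"
    sub.tail = ", could you leave a message for Robert "

    # <phoneme> supplies an explicit IPA pronunciation for one word.
    phoneme = ET.SubElement(
        voice, f"{{{SSML_NS}}}phoneme", {"alphabet": "ipa", "ph": "bɛˈniːnji"}
    )
    phoneme.text = "Benigni"
    phoneme.tail = "?"
    return ET.tostring(speak, encoding="unicode")


print(inline_ssml("your-voice-name"))
```

This approach gets repetitive once the same entities appear in many requests, which is the case the custom lexicon addresses.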
> [!NOTE]
> Custom lexicon currently supports UTF-8 encoding.

**Syntax**

…

**Usage**

To define how multiple entities are read, create a custom lexicon and store it as an .xml or .pls file. The following is a sample .xml file.
```xml
…
</lexicon>
```
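If you generate the lexicon file rather than write it by hand, something like the following minimal Python sketch could produce it. It uses only the standard library and writes one `lexeme` per entry. The namespace and the `version`, `alphabet`, and `xml:lang` attributes come from the PLS 1.0 specification. The following are assumptions made for illustration:

- the helper name,
- the `(grapheme, kind, value)` entry format,
- reading the 100 KB limit from the Limitations list below as 100 × 1024 bytes.

```python
import xml.etree.ElementTree as ET

# Namespaces from the W3C PLS 1.0 and XML specifications.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
# Assumption: the documented "100KB" limit is treated as 100 * 1024 bytes.
MAX_LEXICON_BYTES = 100 * 1024


def build_lexicon(entries, lang="en-US", alphabet="ipa"):
    """Serialize (grapheme, kind, value) tuples to UTF-8 PLS bytes.

    `kind` is "alias" or "phoneme". `build_lexicon` is a hypothetical helper.
    """
    ET.register_namespace("", PLS_NS)  # serialize PLS as the default namespace
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, XML_LANG: lang},
    )
    for grapheme, kind, value in entries:
        if kind not in ("alias", "phoneme"):
            raise ValueError(f"unsupported kind {kind!r} for {grapheme!r}")
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}{kind}").text = value

    # Custom lexicon supports UTF-8, so the file is encoded as UTF-8.
    data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    if len(data) > MAX_LEXICON_BYTES:
        raise ValueError(f"lexicon is {len(data)} bytes; limit is {MAX_LEXICON_BYTES}")
    return data


lexicon = build_lexicon([("BTW", "alias", "By the way"), ("Benigni", "phoneme", "bɛˈniːnji")])
print(lexicon.decode("utf-8"))
```

Checking the size at build time surfaces an oversized file before a synthesis request fails on it.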
The `lexicon` element contains at least one `lexeme` element. Each `lexeme` element contains one or more `grapheme` elements and one or more `alias` or `phoneme` elements. The `grapheme` element contains text describing the [orthography](https://www.w3.org/TR/pronunciation-lexicon/#term-Orthography). The `alias` element indicates the pronunciation of an acronym or an abbreviated term. The `phoneme` element provides text describing how the `lexeme` is pronounced.

It's important to note that you can't directly set the pronunciation of a phrase by using the custom lexicon. If you need to set the pronunciation of a phrase, first provide an `alias` for it, and then associate the `phoneme` with that `alias`. For example:
```xml
<lexeme>
    <grapheme>Scotland MV</grapheme>
    <alias>ScotlandMV</alias>
</lexeme>
<lexeme>
    <grapheme>ScotlandMV</grapheme>
    <phoneme>ˈskɒtlənd.ˈmiːdiəm.weɪv</phoneme>
</lexeme>
```
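The same two-lexeme pattern can be generated from a phrase, an alias, and an IPA string. The following minimal Python sketch applies two rules stated in this section:

- an IPA `phoneme` can't contain white space;
- custom lexicon supports only decomposed Unicode. The sketch assumes that NFD normalization yields that decomposed form.

The helper names are hypothetical, and the sketch doesn't validate the alias itself.

```python
import unicodedata
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
ET.register_namespace("", PLS_NS)  # print PLS tags without an ns0: prefix


def _lexeme(grapheme: str, kind: str, value: str) -> ET.Element:
    """Build one <lexeme> with a <grapheme> and an <alias> or <phoneme> child."""
    lexeme = ET.Element(f"{{{PLS_NS}}}lexeme")
    ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
    ET.SubElement(lexeme, f"{{{PLS_NS}}}{kind}").text = value
    return lexeme


def phrase_lexemes(phrase: str, alias: str, ipa: str) -> list:
    """Return [phrase -> alias, alias -> phoneme] lexemes (hypothetical helper)."""
    # Assumption: NFD produces the decomposed code points custom lexicon expects.
    ipa = unicodedata.normalize("NFD", ipa)
    if any(ch.isspace() for ch in ipa):
        raise ValueError(f"IPA phoneme must not contain white space: {ipa!r}")
    return [_lexeme(phrase, "alias", alias), _lexeme(alias, "phoneme", ipa)]


for lexeme in phrase_lexemes("Scotland MV", "ScotlandMV", "ˈskɒtlənd.ˈmiːdiəm.weɪv"):
    print(ET.tostring(lexeme, encoding="unicode"))
```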
> [!IMPORTANT]
> The `phoneme` element can't contain white space when using IPA.

For more information about the custom lexicon file, see [Pronunciation Lexicon Specification (PLS) Version 1.0](https://www.w3.org/TR/pronunciation-lexicon/).

Next, publish your custom lexicon file. There are no restrictions on where this file can be stored, but we recommend using [Azure Blob Storage](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-portal).

After you've published your custom lexicon, you can reference it from your SSML.
> [!NOTE]
> The `lexicon` element must be inside the `voice` element.
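You could check that rule before sending a request with something like the following minimal Python sketch. It parses an SSML string and reports any `lexicon` element that isn't nested inside a `voice` element. The sketch reads "inside" as "anywhere within `voice`", which is an assumption. The helper name, the voice name, and the lexicon URI in the usage lines are placeholders.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
VOICE = f"{{{SSML_NS}}}voice"
LEXICON = f"{{{SSML_NS}}}lexicon"


def misplaced_lexicons(ssml: str) -> list:
    """Return the `uri` of every <lexicon> that has no <voice> ancestor."""
    misplaced = []

    def walk(elem: ET.Element, inside_voice: bool) -> None:
        if elem.tag == LEXICON and not inside_voice:
            misplaced.append(elem.get("uri"))
        for child in elem:
            walk(child, inside_voice or elem.tag == VOICE)

    walk(ET.fromstring(ssml), False)
    return misplaced


ssml = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<lexicon uri="https://example.com/customlexicon.xml"/>'  # placeholder URI, outside <voice>
    '<voice name="your-voice-name">BTW, we will be there probably at 8:00 tomorrow morning.</voice>'
    "</speak>"
)
print(misplaced_lexicons(ssml))  # ['https://example.com/customlexicon.xml']
```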
```xml
…
        BTW, we will be there probably at 8:00 tomorrow morning.
        Could you help leave a message to Robert Benigni for me?
    </voice>
</speak>
```
When you use this custom lexicon, "BTW" is read as "By the way", and "Benigni" is read with the provided IPA "bɛˈniːnji".

**Limitations**

- File size: the maximum size of a custom lexicon file is 100 KB. If a file exceeds this size, the synthesis request fails.
- Lexicon cache refresh: the first time a custom lexicon is loaded, the text-to-speech service caches it, using its URI as the key. A lexicon with the same URI isn't reloaded within 15 minutes, so a change to a custom lexicon can take up to 15 minutes to take effect.

**Speech service phonetic sets**

In the sample above, we're using the International Phonetic Alphabet, also known as the IPA phone set. We suggest developers use the IPA, because it's the international standard. Some IPA characters have both a precomposed and a decomposed form in Unicode. Custom lexicon supports only the decomposed form.

Considering that the IPA isn't easy to remember, the Speech service defines a phonetic set for seven languages (`en-US`, `fr-FR`, `de-DE`, `es-ES`, `ja-JP`, `zh-CN`, and `zh-TW`).

You can use `sapi` as the value for the `alphabet` attribute with custom lexicons, as demonstrated below: