Commit 367fc57

Merge pull request #114264 from zhaoyunED/jiajzhan/customLexiconUpdate
update custom lexicon feature
2 parents fc347e4 + 5579f45 commit 367fc57

File tree

1 file changed

+40
-19
lines changed

articles/cognitive-services/Speech-Service/speech-synthesis-markup.md

Lines changed: 40 additions & 19 deletions
@@ -354,7 +354,10 @@ Phonetic alphabets are composed of phones, which are made up of letters, numbers
354354

355355
## Use custom lexicon to improve pronunciation
356356

357-
Sometimes TTS cannot accurately pronounce a word, for example, a company or foreign name. Developers can define the reading of these entities in SSML using `phoneme` and `sub` tag, or define the reading of multiple entities by referring to a custom lexicon file using `lexicon` tag.
357+
Sometimes the text-to-speech service cannot accurately pronounce a word, such as the name of a company or a medical term. Developers can define how single entities are read in SSML using the `phoneme` and `sub` tags. However, if you need to define how multiple entities are read, you can create a custom lexicon and reference it using the `lexicon` tag.
358+
359+
> [!NOTE]
360+
> Custom lexicon currently supports UTF-8 encoding.
358361
359362
**Syntax**
360363

@@ -370,14 +373,10 @@ Sometimes TTS cannot accurately pronounce a word, for example, a company or fore
370373

371374
**Usage**
372375

373-
Step 1: Define custom lexicon
374-
375-
You can define the reading of entities by a list of custom lexicon items, stored as an .xml or .pls file.
376-
377-
**Example**
376+
To define how multiple entities are read, you can create a custom lexicon, which is stored as an .xml or .pls file. The following is a sample .xml file.
378377

379378
```xml
380-
<?xml version="1.0" encoding="UTF-16"?>
379+
<?xml version="1.0" encoding="UTF-8"?>
381380
<lexicon version="1.0"
382381
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
383382
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
@@ -395,39 +394,61 @@ You can define the reading of entities by a list of custom lexicon items, stored
395394
</lexicon>
396395
```
397396

398-
Each `lexeme` element is a lexicon item. `grapheme` contains text describing the orthograph of `lexeme`. Readout form can be provided as `alias`. Phone string could be provided in `phoneme` element.
397+
The `lexicon` element contains at least one `lexeme` element. Each `lexeme` element contains at least one `grapheme` element and one or more `grapheme`, `alias`, and `phoneme` elements. The `grapheme` element contains text describing the <a href="https://www.w3.org/TR/pronunciation-lexicon/#term-Orthography" target="_blank">orthography <span class="docon docon-navigate-external x-hidden-focus"></span></a>. The `alias` elements are used to indicate the pronunciation of an acronym or an abbreviated term. The `phoneme` element provides text describing how the `lexeme` is pronounced.
398+
399+
It's important to note that you cannot directly set the pronunciation of a word using the custom lexicon. If you need to set the pronunciation for an acronym or an abbreviated term, first provide an `alias`, then associate the `phoneme` with that `alias`. For example:
400+
401+
```xml
402+
<lexeme>
403+
<grapheme>Scotland MV</grapheme>
404+
<alias>ScotlandMV</alias>
405+
</lexeme>
406+
<lexeme>
407+
<grapheme>ScotlandMV</grapheme>
408+
<phoneme>ˈskɒtlənd.ˈmiːdiəm.weɪv</phoneme>
409+
</lexeme>
410+
```
411+
412+
> [!IMPORTANT]
413+
> The `phoneme` element cannot contain white spaces when using IPA.
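
The white space rule above can be checked before a lexicon is published. The following is a minimal sketch, not part of the Speech service: the function name `phonemes_with_whitespace` is hypothetical, and it assumes the lexicon uses the IPA alphabet and the PLS namespace shown in the sample file.

```python
import xml.etree.ElementTree as ET
from typing import List

# PLS namespace, taken from the sample lexicon file.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"


def phonemes_with_whitespace(lexicon_xml: str) -> List[str]:
    """Return the text of every <phoneme> element that contains white space.

    Hypothetical helper: assumes every phoneme in the lexicon is written in IPA.
    """
    root = ET.fromstring(lexicon_xml)
    offenders = []
    for phoneme in root.iter(f"{{{PLS_NS}}}phoneme"):
        text = phoneme.text or ""
        if any(ch.isspace() for ch in text):
            offenders.append(text)
    return offenders


# Example input: the second lexeme from the snippet above, wrapped in a lexicon element.
sample = (
    f'<lexicon xmlns="{PLS_NS}" version="1.0" alphabet="ipa">'
    "<lexeme><grapheme>ScotlandMV</grapheme>"
    "<phoneme>ˈskɒtlənd.ˈmiːdiəm.weɪv</phoneme></lexeme>"
    "</lexicon>"
)
print(phonemes_with_whitespace(sample))
```

An empty list means no `phoneme` element in the file contains white space.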
399414
400-
The `lexicon` element contains at least one `lexeme` element. Each `lexeme` element contains at least one `grapheme` element and one or more `grapheme`, `alais`, and `phoneme` elements. The `grapheme` element contains text describing the <a href="https://www.w3.org/TR/pronunciation-lexicon/#term-Orthography" target="_blank">orthography <span class="docon docon-navigate-external x-hidden-focus"></span></a>. The `alias` elements are used to indicate the pronunciation of an acronym or an abbreviated term. The `phoneme` element provides text describing how the `lexeme` is pronounced.
415+
For more information about the custom lexicon file, see [Pronunciation Lexicon Specification (PLS) Version 1.0](https://www.w3.org/TR/pronunciation-lexicon/).
401416

402-
For more information about custom lexicon file, see [Pronunciation Lexicon Specification (PLS) Version 1.0](https://www.w3.org/TR/pronunciation-lexicon/) on the W3C website.
417+
Next, publish your custom lexicon file. While we don't have restrictions on where this file can be stored, we do recommend using [Azure Blob Storage](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-portal).
403418

404-
Step 2: Upload custom lexicon file created in step 1 online, you could store it anywhere, and we suggest you to store it in Microsoft Azure, for example [Azure Blob Storage](https://docs.microsoft.com/azure/storage/blobs/storage-quickstart-blobs-portal).
419+
After you've published your custom lexicon, you can reference it from your SSML.
405420

406-
Step 3: Refer to custom lexicon file in SSML
421+
> [!NOTE]
422+
> The `lexicon` element must be inside the `voice` element.
407423
408424
```xml
409425
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
410426
xmlns:mstts="http://www.w3.org/2001/mstts"
411427
xml:lang="en-US">
412-
<lexicon uri="http://www.example.com/customlexicon.xml"/>
413-
BTW, we will be there probably 8:00 tomorrow morning.
414-
Could you help leave a message to Robert Benigni for me?
428+
<voice name="en-US-AriaRUS">
429+
<lexicon uri="http://www.example.com/customlexicon.xml"/>
430+
BTW, we will be there probably at 8:00 tomorrow morning.
431+
Could you help leave a message to Robert Benigni for me?
432+
</voice>
415433
</speak>
416434
```
417-
"BTW" will be read as "By the way". "Benigni" will be read with provided IPA "bɛˈniːnji".
418435

419-
**Limitation**
436+
When using this custom lexicon, "BTW" will be read as "By the way". "Benigni" will be read with the provided IPA "bɛˈniːnji".
437+
438+
**Limitations**
420439
- File size: The custom lexicon file is limited to 100 KB. If the file exceeds this size, the synthesis request fails.
421440
- Lexicon cache refresh: The custom lexicon is cached on the TTS service, with its URI as the key, when it's first loaded. A lexicon with the same URI isn't reloaded within 15 minutes, so a change to the custom lexicon can take up to 15 minutes to take effect.
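
The size limit, together with the UTF-8 requirement noted earlier, can be verified locally before the file is published. The following is a rough sketch under stated assumptions: `check_lexicon_bytes` is a hypothetical helper, and it assumes that "100KB" means 100 * 1024 bytes, which the text does not specify.

```python
from typing import List

# Assumption: "100KB" is read as 100 * 1024 bytes; the service may count differently.
MAX_LEXICON_BYTES = 100 * 1024


def check_lexicon_bytes(data: bytes) -> List[str]:
    """Return a list of problems found in the raw bytes of a custom lexicon file."""
    problems = []
    if len(data) > MAX_LEXICON_BYTES:
        problems.append(
            f"file is {len(data)} bytes, above the {MAX_LEXICON_BYTES}-byte limit"
        )
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("file is not valid UTF-8")
    return problems
```

For a file on disk, the bytes could be read with `pathlib.Path(path).read_bytes()` and passed to the function. An empty list means neither check failed.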
422441

423442
**Speech service phonetic sets**
424443

425-
In the sample above, we're using the International Phonetic Alphabet, also known as the IPA phone set. We suggest developers use the IPA, because it is the international standard. Considering that the IPA is not easy to remember, the Speech service defines a phonetic set for seven languages (`en-US`, `fr-FR`, `de-DE`, `es-ES`, `ja-JP`, `zh-CN`, and `zh-TW`).
444+
In the sample above, we're using the International Phonetic Alphabet, also known as the IPA phone set. We suggest developers use the IPA, because it is the international standard. Some IPA characters have both a 'precomposed' and a 'decomposed' form when represented in Unicode. Custom lexicon only supports the decomposed form.
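
One way to get decomposed characters is Unicode normalization. The sketch below assumes that "decomposed" corresponds to Unicode Normalization Form D (NFD), which the text does not state explicitly; `to_decomposed` is a hypothetical helper name.

```python
import unicodedata


def to_decomposed(ipa: str) -> str:
    """Return the IPA string with precomposed characters split into base + combining marks.

    Assumption: the decomposed form the service expects matches Unicode NFD.
    """
    return unicodedata.normalize("NFD", ipa)


# U+00E3 (a with tilde, precomposed) becomes U+0061 followed by U+0303 (combining tilde).
precomposed = "\u00e3"
decomposed = to_decomposed(precomposed)
print([f"U+{ord(ch):04X}" for ch in decomposed])
```

Strings that are already decomposed pass through unchanged, so the function can be applied to every `phoneme` value before the lexicon is saved.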
445+
446+
Considering that the IPA is not easy to remember, the Speech service defines a phonetic set for seven languages (`en-US`, `fr-FR`, `de-DE`, `es-ES`, `ja-JP`, `zh-CN`, and `zh-TW`).
426447

427448
You can use `sapi` as the value for the `alphabet` attribute with custom lexicons as demonstrated below:
428449

429450
```xml
430-
<?xml version="1.0" encoding="UTF-16"?>
451+
<?xml version="1.0" encoding="UTF-8"?>
431452
<lexicon version="1.0"
432453
xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
433454
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
