articles/cognitive-services/Speech-Service/how-to-speech-synthesis-viseme.md
+27 -2 (27 additions and 2 deletions)
@@ -43,13 +43,38 @@ The overall workflow of viseme is depicted in the following flowchart:
## Viseme ID

-Viseme ID refers to an integer number that specifies a viseme. We offer 22 different visemes, each depicting the mouth shape for a specific set of phonemes. There's no one-to-one correspondence between visemes and phonemes. Often, several phonemes correspond to a single viseme, because they look the same on the speaker's face when they're produced, such as `s` and `z`. For more specific information, see the table for [mapping phonemes to viseme IDs](#map-phonemes-to-visemes).
+Viseme ID refers to an integer number that specifies a viseme. We offer 22 different visemes, each depicting the mouth position for a specific set of phonemes. There's no one-to-one correspondence between visemes and phonemes. Often, several phonemes correspond to a single viseme, because they look the same on the speaker's face when they're produced, such as `s` and `z`. For more specific information, see the table for [mapping phonemes to viseme IDs](#map-phonemes-to-visemes).
Speech audio output can be accompanied by viseme IDs and `Audio offset`. The `Audio offset` indicates the offset timestamp that represents the start time of each viseme, in ticks (100 nanoseconds).
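
As a rough illustration of the tick unit described above, the following Python sketch converts an `Audio offset` value to milliseconds. The function and constant names are hypothetical and not part of any SDK; the only fact taken from the text is that one tick is 100 nanoseconds.

```python
# One tick is 100 nanoseconds, so 10,000 ticks make up one millisecond.
TICKS_PER_MILLISECOND = 10_000


def ticks_to_milliseconds(audio_offset_ticks: int) -> float:
    """Convert an audio offset in ticks (100 ns units) to milliseconds.

    Hypothetical helper for illustration; not an SDK function.
    """
    return audio_offset_ticks / TICKS_PER_MILLISECOND


if __name__ == "__main__":
    # An offset of 500,000 ticks corresponds to 50 ms into the audio.
    print(ticks_to_milliseconds(500_000))
```

Working in milliseconds can make it easier to line up viseme start times with an animation timeline that uses the same unit.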
### Map phonemes to visemes

-Visemes vary by language and locale. Each locale has a set of visemes that correspond to its specific phonemes. The [SSML phonetic alphabets](speech-ssml-phonetic-sets.md) documentation maps viseme IDs to the corresponding International Phonetic Alphabet (IPA) phonemes.
+Visemes vary by language and locale. Each locale has a set of visemes that correspond to its specific phonemes. The [SSML phonetic alphabets](speech-ssml-phonetic-sets.md) documentation maps viseme IDs to the corresponding International Phonetic Alphabet (IPA) phonemes. The table below shows a mapping relationship between viseme IDs and mouth positions, listing typical IPA phonemes for each viseme ID.
+
+| Viseme ID | IPA | Mouth position|
+|---------------|---------------|---------------|
+|0|Silence|<img src="media/text-to-speech/viseme-id-0.jpg" width="200" height="200" alt="The mouth position when viseme ID is 0">|
+|1|`æ`, `ə`, `ʌ`|<img src="media/text-to-speech/viseme-id-1.jpg" width="200" height="200" alt="The mouth position when viseme ID is 1">|
+|2|`ɑ`|<img src="media/text-to-speech/viseme-id-2.jpg" width="200" height="200" alt="The mouth position when viseme ID is 2">|
+|3|`ɔ`|<img src="media/text-to-speech/viseme-id-3.jpg" width="200" height="200" alt="The mouth position when viseme ID is 3">|
+|4|`ɛ`, `ʊ`|<img src="media/text-to-speech/viseme-id-4.jpg" width="200" height="200" alt="The mouth position when viseme ID is 4">|
+|5|`ɝ`|<img src="media/text-to-speech/viseme-id-5.jpg" width="200" height="200" alt="The mouth position when viseme ID is 5">|
+|6|`j`, `i`, `ɪ`|<img src="media/text-to-speech/viseme-id-6.jpg" width="200" height="200" alt="The mouth position when viseme ID is 6">|
+|7|`w`, `u`|<img src="media/text-to-speech/viseme-id-7.jpg" width="200" height="200" alt="The mouth position when viseme ID is 7">|
+|8|`o`|<img src="media/text-to-speech/viseme-id-8.jpg" width="200" height="200" alt="The mouth position when viseme ID is 8">|
+|9|Not supported|<img src="media/text-to-speech/viseme-id-9.jpg" width="200" height="200" alt="The mouth position when viseme ID is 9">|
+|10|Not supported|<img src="media/text-to-speech/viseme-id-10.jpg" width="200" height="200" alt="The mouth position when viseme ID is 10">|
+|11|Not supported|<img src="media/text-to-speech/viseme-id-11.jpg" width="200" height="200" alt="The mouth position when viseme ID is 11">|
+|12|`h`|<img src="media/text-to-speech/viseme-id-12.jpg" width="200" height="200" alt="The mouth position when viseme ID is 12">|
+|13|`ɹ`|<img src="media/text-to-speech/viseme-id-13.jpg" width="200" height="200" alt="The mouth position when viseme ID is 13">|
+|14|`l`|<img src="media/text-to-speech/viseme-id-14.jpg" width="200" height="200" alt="The mouth position when viseme ID is 14">|
+|15|`s`, `z`|<img src="media/text-to-speech/viseme-id-15.jpg" width="200" height="200" alt="The mouth position when viseme ID is 15">|
+|16|`ʃ`, `tʃ`, `dʒ`, `ʒ`|<img src="media/text-to-speech/viseme-id-16.jpg" width="200" height="200" alt="The mouth position when viseme ID is 16">|
+|17|`ð`|<img src="media/text-to-speech/viseme-id-17.jpg" width="200" height="200" alt="The mouth position when viseme ID is 17">|
+|18|`f`, `v`|<img src="media/text-to-speech/viseme-id-18.jpg" width="200" height="200" alt="The mouth position when viseme ID is 18">|
+|19|`d`, `t`, `n`, `θ`|<img src="media/text-to-speech/viseme-id-19.jpg" width="200" height="200" alt="The mouth position when viseme ID is 19">|
+|20|`k`, `g`, `ŋ`|<img src="media/text-to-speech/viseme-id-20.jpg" width="200" height="200" alt="The mouth position when viseme ID is 20">|
+|21|`p`, `b`, `m`|<img src="media/text-to-speech/viseme-id-21.jpg" width="200" height="200" alt="The mouth position when viseme ID is 21">|
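
The many-to-one relationship in the table above can be sketched as a simple lookup. The Python snippet below is only an illustration built from the typical phonemes listed in the table; it is not an SDK API, the names `VISEME_TO_PHONEMES`, `PHONEME_TO_VISEME`, and `phonemes_to_visemes` are hypothetical, and a real locale may use additional phonemes that are documented in the SSML phonetic alphabets article.

```python
from typing import Dict, List, Optional

# Typical IPA phonemes per viseme ID, copied from the table above.
# ID 0 is silence; IDs 9, 10, and 11 list no phonemes ("Not supported").
VISEME_TO_PHONEMES: Dict[int, List[str]] = {
    1: ["æ", "ə", "ʌ"],
    2: ["ɑ"],
    3: ["ɔ"],
    4: ["ɛ", "ʊ"],
    5: ["ɝ"],
    6: ["j", "i", "ɪ"],
    7: ["w", "u"],
    8: ["o"],
    12: ["h"],
    13: ["ɹ"],
    14: ["l"],
    15: ["s", "z"],
    16: ["ʃ", "tʃ", "dʒ", "ʒ"],
    17: ["ð"],
    18: ["f", "v"],
    19: ["d", "t", "n", "θ"],
    20: ["k", "g", "ŋ"],
    21: ["p", "b", "m"],
}

# Invert the table: several phonemes can share one viseme ID.
PHONEME_TO_VISEME: Dict[str, int] = {
    phoneme: viseme_id
    for viseme_id, phonemes in VISEME_TO_PHONEMES.items()
    for phoneme in phonemes
}


def phonemes_to_visemes(phonemes: List[str]) -> List[Optional[int]]:
    """Look up a viseme ID for each phoneme; None if it is not in the table."""
    return [PHONEME_TO_VISEME.get(phoneme) for phoneme in phonemes]


if __name__ == "__main__":
    # `s` and `z` look the same on the speaker's face, so they share an ID.
    print(PHONEME_TO_VISEME["s"], PHONEME_TO_VISEME["z"])
    print(phonemes_to_visemes(["h", "ɛ", "l", "o"]))
```

Returning `None` for phonemes outside the table, rather than raising an error, is a design choice for this sketch so that a caller can decide how to handle locale-specific phonemes.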