Commit c423602

feat: [google-cloud-texttospeech] Support markup input for Cloud TTS Chirp 3: HD voice synthesis (googleapis#13875)
- [ ] Regenerate this pull request now.

BEGIN_COMMIT_OVERRIDE
feat: Support markup input for Cloud TTS Chirp 3: HD voice synthesis
feat: Support pinyin/yomigana custom pronunciation encodings for cmn-cn/ja-jp
END_COMMIT_OVERRIDE

feat: Support pinyin/yomigana custom pronunciation encodings for cmn-cn/ja-jp

PiperOrigin-RevId: 754921874

Source-Link: googleapis/googleapis@8f7ef1c

Source-Link: https://github.com/googleapis/googleapis-gen/commit/0fe6000329d17eece491681acaf79478a33352af
Copy-Tag: eyJwIjoicGFja2FnZXMvZ29vZ2xlLWNsb3VkLXRleHR0b3NwZWVjaC8uT3dsQm90LnlhbWwiLCJoIjoiMGZlNjAwMDMyOWQxN2VlY2U0OTE2ODFhY2FmNzk0NzhhMzMzNTJhZiJ9

---------

Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
1 parent c63f830 commit c423602

1 file changed: +54 -0 lines changed

packages/google-cloud-texttospeech/google/cloud/texttospeech_v1/types/cloud_tts.py

Lines changed: 54 additions & 0 deletions
@@ -297,10 +297,39 @@ class PhoneticEncoding(proto.Enum):
             PHONETIC_ENCODING_X_SAMPA (2):
                 X-SAMPA, such as apple -> "{p@l".
                 https://en.wikipedia.org/wiki/X-SAMPA
+            PHONETIC_ENCODING_JAPANESE_YOMIGANA (3):
+                For reading-to-pron conversion to work well, the
+                ``pronunciation`` field should only contain Kanji, Hiragana,
+                and Katakana.
+
+                The pronunciation can also contain pitch accents. The start
+                of a pitch phrase is specified with ``^`` and the down-pitch
+                position is specified with ``!``, for example:
+
+                ::
+
+                    phrase:端 pronunciation:^はし
+                    phrase:箸 pronunciation:^は!し
+                    phrase:橋 pronunciation:^はし!
+
+                We currently only support the Tokyo dialect, which allows at
+                most one down-pitch per phrase (i.e. at most one ``!``
+                between ``^``).
+            PHONETIC_ENCODING_PINYIN (4):
+                Used to specify pronunciations for Mandarin
+                words. See https://en.wikipedia.org/wiki/Pinyin.
+
+                For example: 朝阳, the pronunciation is "chao2
+                yang2". The number represents the tone, and
+                there is a space between syllables. Neutral
+                tones are represented by 5, for example 孩子 "hai2
+                zi5".
         """
         PHONETIC_ENCODING_UNSPECIFIED = 0
         PHONETIC_ENCODING_IPA = 1
         PHONETIC_ENCODING_X_SAMPA = 2
+        PHONETIC_ENCODING_JAPANESE_YOMIGANA = 3
+        PHONETIC_ENCODING_PINYIN = 4
 
     phrase: str = proto.Field(
         proto.STRING,
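
Note on the new encodings: the docstrings above describe two small string formats (numbered pinyin, and yomigana with `^` / `!` pitch marks). The sketch below is one possible client-side sanity check for them, not something this commit or the library provides. The function names are hypothetical, and two rules are assumptions rather than documented behavior: that pinyin syllables are lowercase ASCII letters followed by a tone digit 1 to 5, and that a `!` appearing before any `^` is invalid. The service may accept or reject inputs differently.

```python
import re

# Hypothetical helpers for illustration only; they are not part of
# google-cloud-texttospeech and do not call the API.

# Assumption: one syllable is lowercase ASCII letters plus a tone digit,
# with 1-4 for the four tones and 5 for the neutral tone.
_PINYIN_SYLLABLE = re.compile(r"[a-z]+[1-5]")


def looks_like_numbered_pinyin(pronunciation: str) -> bool:
    """Return True if every space-separated syllable ends in a tone digit."""
    syllables = pronunciation.split(" ")
    return all(_PINYIN_SYLLABLE.fullmatch(s) for s in syllables)


def has_valid_pitch_marks(pronunciation: str) -> bool:
    """Check that each ``^`` pitch phrase holds at most one ``!`` down-pitch."""
    # Text before the first "^" belongs to no pitch phrase.
    head, *phrases = pronunciation.split("^")
    if "!" in head:
        # Assumption: a down-pitch mark outside any pitch phrase is invalid.
        return False
    return all(phrase.count("!") <= 1 for phrase in phrases)
```

With these definitions, the docstring examples `"chao2 yang2"`, `"hai2 zi5"`, `"^はし"`, `"^は!し"` and `"^はし!"` all pass, while `"chao yang"` (no tone digits) and `"^は!し!"` (two down-pitches in one phrase) do not.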
@@ -388,6 +417,11 @@ class SynthesisInput(proto.Message):
         text (str):
             The raw text to be synthesized.
 
+            This field is a member of `oneof`_ ``input_source``.
+        markup (str):
+            Markup for HD voices specifically. This field
+            may not be used with any other voices.
+
             This field is a member of `oneof`_ ``input_source``.
         ssml (str):
             The SSML document to be synthesized. The SSML document must
@@ -424,6 +458,11 @@ class SynthesisInput(proto.Message):
         number=1,
         oneof="input_source",
     )
+    markup: str = proto.Field(
+        proto.STRING,
+        number=5,
+        oneof="input_source",
+    )
     ssml: str = proto.Field(
         proto.STRING,
         number=2,
@@ -743,6 +782,11 @@ class StreamingSynthesizeConfig(proto.Message):
 class StreamingSynthesisInput(proto.Message):
     r"""Input to be synthesized.
 
+    This message has `oneof`_ fields (mutually exclusive fields).
+    For each oneof, at most one member field can be set at the same time.
+    Setting any member of the oneof automatically clears all other
+    members.
+
     .. _oneof: https://proto-plus-python.readthedocs.io/en/stable/fields.html#oneofs-mutually-exclusive-fields
 
     Attributes:
@@ -752,6 +796,11 @@ class StreamingSynthesisInput(proto.Message):
             terminating sentences, which results in better
             prosody in the output audio.
 
+            This field is a member of `oneof`_ ``input_source``.
+        markup (str):
+            Markup for HD voices specifically. This field
+            may not be used with any other voices.
+
             This field is a member of `oneof`_ ``input_source``.
     """
 
@@ -760,6 +809,11 @@ class StreamingSynthesisInput(proto.Message):
         number=1,
         oneof="input_source",
     )
+    markup: str = proto.Field(
+        proto.STRING,
+        number=5,
+        oneof="input_source",
+    )
 
 
 class StreamingSynthesizeRequest(proto.Message):
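
Note on the `input_source` oneof: the docstring added to `StreamingSynthesisInput` says that setting any member of the oneof clears the others. The toy class below is a rough, standard-library-only model of that rule for the two members shown in this diff (`text` and `markup`). `ToyStreamingInput` and `which_input_source` are hypothetical names; this is not the proto-plus implementation, and it simplifies by using `None` for an unset field, whereas an unset string field on a real message reads back as an empty string.

```python
class ToyStreamingInput:
    """Toy stand-in for a message with an ``input_source`` oneof (not the real class)."""

    # Members of the oneof, as listed in the diff for StreamingSynthesisInput.
    _INPUT_SOURCE = ("text", "markup")

    def __init__(self, **kwargs):
        # Start with every member unset, then apply keyword arguments in order.
        for name in self._INPUT_SOURCE:
            object.__setattr__(self, name, None)
        for name, value in kwargs.items():
            setattr(self, name, value)

    def __setattr__(self, name, value):
        if name not in self._INPUT_SOURCE:
            raise AttributeError(f"unknown field: {name}")
        if value is not None:
            # Setting one member clears all members first, mirroring the
            # "automatically clears all other members" rule.
            for other in self._INPUT_SOURCE:
                object.__setattr__(self, other, None)
        object.__setattr__(self, name, value)

    def which_input_source(self):
        """Return the name of the member currently set, or None."""
        for name in self._INPUT_SOURCE:
            if getattr(self, name) is not None:
                return name
        return None
```

For example, constructing `ToyStreamingInput(text="hello")` and then assigning `markup` leaves `text` unset and makes `which_input_source()` report `"markup"`. In practice this means a caller chooses exactly one of the input fields per message rather than filling in several.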
