#
# 2. Spectrogram generation
#
- # From the encoded text, a spectrogram is generated. We use ``Tacotron2``
+ # From the encoded text, a spectrogram is generated. We use the ``Tacotron2``
# model for this.
#
# 3. Time-domain conversion
#
# The last step is converting the spectrogram into the waveform. The
- # process to generate speech from spectrogram is also called Vocoder.
+ # model that generates speech from a spectrogram is also called a vocoder.
# In this tutorial, three different vocoders are used,
# :py:class:`~torchaudio.models.WaveRNN`,
# :py:class:`~torchaudio.transforms.GriffinLim`, and

# works.
#
# Since the pre-trained Tacotron2 model expects a specific set of symbol
- # tables, the same functionalities available in ``torchaudio``. This
- # section is more for the explanation of the basis of encoding.
+ # tables, the same functionality is available in ``torchaudio``. However,
+ # we will first manually implement the encoding to aid understanding.
#
- # Firstly, we define the set of symbols. For example, we can use
+ # First, we define the set of symbols
# ``'_-!\'(),.:;? abcdefghijklmnopqrstuvwxyz'``. Then, we will map
# each character of the input text into the index of the corresponding
- # symbol in the table.
- #
- # The following is an example of such processing. In the example, symbols
- # that are not in the table are ignored.
- #
+ # symbol in the table. Symbols that are not in the table are ignored.

symbols = "_-!'(),.:;? abcdefghijklmnopqrstuvwxyz"
look_up = {s: i for i, s in enumerate(symbols)}
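# A minimal sketch of a ``text_to_sequence`` encoding function could look like
# the following (assuming the simple table above; characters that are not in
# the table are simply dropped):


def text_to_sequence(text):
    # Lower-case the input and keep only characters present in the table.
    text = text.lower()
    return [look_up[s] for s in text if s in symbols]


print(text_to_sequence("Hello world!"))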
@@ -118,8 +114,8 @@ def text_to_sequence(text):
######################################################################
# As mentioned above, the symbol table and indices must match
- # what the pretrained Tacotron2 model expects. ``torchaudio`` provides the
- # transform along with the pretrained model. For example, you can
+ # what the pretrained Tacotron2 model expects. ``torchaudio`` provides the same
+ # transform along with the pretrained model. You can
# instantiate and use such a transform as follows.
#
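# A minimal sketch (assuming the character-based LJSpeech pipeline
# ``TACOTRON2_WAVERNN_CHAR_LJSPEECH``; any ``Tacotron2TTSBundle`` is used the
# same way):

import torchaudio

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
processor = bundle.get_text_processor()

text = "Hello world! Text to speech!"
processed, lengths = processor(text)

print(processed)
print(lengths)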
@@ -133,12 +129,12 @@ def text_to_sequence(text):
######################################################################
- # The ``processor`` object takes either a text or list of texts as inputs.
+ # Note: The output of our manual encoding matches the output of the
+ # ``torchaudio`` ``text_processor``, confirming that we re-implemented what
+ # the library does internally. The processor takes either a text or a list of
+ # texts as input.
# When a list of texts is provided, the returned ``lengths`` variable
# represents the valid length of each processed token sequence in the output
# batch.
#
- # The intermediate representation can be retrieved as follow.
+ # The intermediate representation can be retrieved as follows:
#

print([processor.tokens[i] for i in processed[0, : lengths[0]]])
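# A small illustration of the batched behaviour described above, reusing the
# ``processor`` from the previous cell; the result is padded to the longest
# entry and ``lengths`` reports the valid length of each row:

processed, lengths = processor(["Hello world!", "Text to speech!"])
print(processed.shape)
print(lengths)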
@@ -152,7 +148,7 @@ def text_to_sequence(text):
# uses a symbol table based on phonemes and a G2P (Grapheme-to-Phoneme)
# model.
#
- # The detail of the G2P model is out of scope of this tutorial, we will
+ # The details of the G2P model are out of the scope of this tutorial; we will
# just look at what the conversion looks like.
#
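# As a rough sketch of what this looks like in practice (assuming the
# phoneme-based pipeline ``TACOTRON2_WAVERNN_PHONE_LJSPEECH``, whose text
# processor relies on the ``DeepPhonemizer`` package being installed):

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_PHONE_LJSPEECH
processor = bundle.get_text_processor()

processed, lengths = processor("Hello world! Text to speech!")
print([processor.tokens[i] for i in processed[0, : lengths[0]]])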
158
154
# Similar to the case of character-based encoding, the encoding process is
@@ -195,7 +191,7 @@ def text_to_sequence(text):
# encoded text. For the details of the model, please refer to `the
# paper <https://arxiv.org/abs/1712.05884>`__.
#
- # It is easy to instantiate a Tacotron2 model with pretrained weight,
+ # It is easy to instantiate a Tacotron2 model with pretrained weights;
# however, note that the input to Tacotron2 models needs to be processed
# by the matching text processor.
#
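# A minimal sketch of this step (assuming the character-based bundle from
# above; ``Tacotron2.infer`` returns the mel spectrogram, its lengths, and the
# attention alignments):

import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

bundle = torchaudio.pipelines.TACOTRON2_WAVERNN_CHAR_LJSPEECH
processor = bundle.get_text_processor()
tacotron2 = bundle.get_tacotron2().to(device)

with torch.inference_mode():
    processed, lengths = processor("Hello world! Text to speech!")
    processed, lengths = processed.to(device), lengths.to(device)
    spec, spec_lengths, alignments = tacotron2.infer(processed, lengths)

print(spec.shape)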
@@ -224,7 +220,7 @@ def text_to_sequence(text):
######################################################################
# Note that the ``Tacotron2.infer`` method performs multinomial sampling,
- # therefor, the process of generating the spectrogram incurs randomness.
+ # therefore, the process of generating the spectrogram incurs randomness.
#
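# If a reproducible spectrogram is needed, one simple workaround (our own
# suggestion, not something the pipeline requires) is to seed the random
# number generator before calling ``infer``:

torch.manual_seed(0)
with torch.inference_mode():
    spec, spec_lengths, _ = tacotron2.infer(processed, lengths)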
@@ -245,16 +241,16 @@ def plot():
# -------------------
#
# Once the spectrogram is generated, the last process is to recover the
- # waveform from the spectrogram.
+ # waveform from the spectrogram using a vocoder.
#
# ``torchaudio`` provides vocoders based on ``GriffinLim`` and
# ``WaveRNN``.
#

######################################################################
- # WaveRNN
- # ~~~~~~~
+ # WaveRNN Vocoder
+ # ~~~~~~~~~~~~~~~
#
# Continuing from the previous section, we can instantiate the matching
# WaveRNN model from the same bundle.
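# A minimal sketch, assuming the spectrogram ``spec`` and ``spec_lengths``
# produced by the matching Tacotron2 above:

vocoder = bundle.get_vocoder().to(device)

with torch.inference_mode():
    waveforms, waveform_lengths = vocoder(spec, spec_lengths)

# Save the first waveform in the batch at the vocoder's native sample rate.
torchaudio.save("output_wavernn.wav", waveforms[0:1].cpu(), sample_rate=vocoder.sample_rate)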
@@ -294,11 +290,11 @@ def plot(waveforms, spec, sample_rate):
######################################################################
- # Griffin-Lim
- # ~~~~~~~~~~~
+ # Griffin-Lim Vocoder
+ # ~~~~~~~~~~~~~~~~~~~
#
# Using the Griffin-Lim vocoder is same as WaveRNN. You can instantiate
301
- # the vocode object with
297
+ # the vocoder object with
# the :py:func:`~torchaudio.pipelines.Tacotron2TTSBundle.get_vocoder`
# method and pass the spectrogram.
#
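# A minimal sketch, assuming the Griffin-Lim character-based bundle
# ``TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH``; the spectrogram must come from the
# Tacotron2 of the same bundle:

bundle = torchaudio.pipelines.TACOTRON2_GRIFFINLIM_CHAR_LJSPEECH
processor = bundle.get_text_processor()
tacotron2 = bundle.get_tacotron2().to(device)
vocoder = bundle.get_vocoder().to(device)

with torch.inference_mode():
    processed, lengths = processor("Hello world! Text to speech!")
    processed, lengths = processed.to(device), lengths.to(device)
    spec, spec_lengths, _ = tacotron2.infer(processed, lengths)
    waveforms, waveform_lengths = vocoder(spec, spec_lengths)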
@@ -323,8 +319,8 @@ def plot(waveforms, spec, sample_rate):
######################################################################
- # Waveglow
- # ~~~~~~~~
+ # Waveglow Vocoder
+ # ~~~~~~~~~~~~~~~~
#
# Waveglow is a vocoder published by Nvidia. The pretrained weights are
# published on Torch Hub. One can instantiate the model using ``torch.hub``
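# As a rough sketch (the exact Torch Hub repository and entry-point names
# below are assumptions, based on NVIDIA's published ``DeepLearningExamples``):

waveglow = torch.hub.load(
    "NVIDIA/DeepLearningExamples:torchhub",
    "nvidia_waveglow",
    model_math="fp32",
)
waveglow = waveglow.remove_weightnorm(waveglow)
waveglow = waveglow.to(device).eval()

with torch.no_grad():
    # Waveglow consumes the mel spectrogram produced by Tacotron2.
    waveforms = waveglow.infer(spec)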