* remove HeteronymClassificationModel
Signed-off-by: Jason <jasoli@nvidia.com>
* pylint
Signed-off-by: Jason <jasoli@nvidia.com>
---------
Signed-off-by: Jason <jasoli@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
docs/source/tts/g2p.rst (1 addition, 109 deletions)
@@ -24,8 +24,6 @@ The models can be trained using words or sentences as input.
 If trained with sentence-level input, the models can handle out-of-vocabulary (OOV) words and heteronyms along with unambiguous words in a single pass.
 See :ref:`Sentence-level Dataset Preparation Pipeline <sentence_level_dataset_pipeline>` on how to label data for G2P model training.
 
-Additionally, we support a purpose-built BERT-based classification model for heteronym disambiguation; see :ref:`this <bert_heteronym_cl>` for details.
-
 Model Training, Evaluation and Inference
 ----------------------------------------
@@ -125,116 +123,10 @@ Finally, we mask-out OOV words with a special masking token, “<unk>” in the
 Using this unknown token forces a G2P model to produce the same masking token as a phonetic representation during training. During inference, the model generates phoneme predictions for OOV words without emitting the masking token, as long as this token is not included in the grapheme input.
 
-
-.. _bert_heteronym_cl:
-
-Purpose-built BERT-based classification model for heteronym disambiguation
---------------------------------------------------------------------------
-
-HeteronymClassificationModel is a BERT-based :cite:`g2p--devlin2018bert` token classification model that can handle multiple heteronyms at once. The model takes a sentence as input and, for every heteronym, selects one of the available forms.
-We mask irrelevant forms to disregard the model’s predictions for non-ambiguous words. E.g., given the input “The poems are simple to read and easy to comprehend.”, the model scores the possible {READ_PRESENT, READ_PAST} options for the word “read”.
-Possible heteronym forms are extracted from the WikipediaHomographData :cite:`g2p--gorman2018improving`.
-
-The model expects input in `.json` manifest format, where each line contains at least the following fields:
-
-.. code::
-
-  {"text_graphemes": "Oxygen is less able to diffuse into the blood, leading to hypoxia.", "start_end": [23, 30], "homograph_span": "diffuse", "word_id": "diffuse_vrb"}
-
-Manifest fields:
-
-* `text_graphemes` - input sentence
-* `start_end` - beginning and end of the heteronym span in the input sentence
-* `homograph_span` - heteronym word in the sentence
-* `word_id` - heteronym label, e.g., the word `diffuse` has the possible labels `diffuse_vrb` and `diffuse_adj`. See `https://github.com/google-research-datasets/WikipediaHomographData/blob/master/data/wordids.tsv <https://github.com/google-research-datasets/WikipediaHomographData/blob/master/data/wordids.tsv>`__ for more details.
-
-To convert the WikipediaHomographData to the `.json` format suitable for HeteronymClassificationModel training, run:
-
-.. code-block::
-
-  # WikipediaHomographData could be downloaded from `https://github.com/google-research-datasets/WikipediaHomographData <https://github.com/google-research-datasets/WikipediaHomographData>`__.

(lines 159-206 of the original file are collapsed in the diff view)

-  pretrained_model="<Path to .nemo file or pretrained model name from list_available_models()>" \
-  output_file="<Path to .json manifest to save prediction>"
-
-Note that if the input manifest contains the target `word_id`, evaluation will also be performed. During inference, the model predicts the heteronym `word_id` and saves predictions in the `"pred_text"` field of the `output_file`:
-
-.. code::
-
-  {"text_graphemes": "Oxygen is less able to diffuse into the blood, leading to hypoxia.", "pred_text": "diffuse_vrb", "start_end": [23, 30], "homograph_span": "diffuse", "word_id": "diffuse_vrb"}
-
-To train a model with the `Chinese Polyphones with Pinyin (CPP) <https://github.com/kakaobrain/g2pM/tree/master/data>`__ dataset, run:
-G2P requires NeMo NLP and ASR collections installed. See `Installation instructions <https://docs.nvidia.com/nemo-framework/user-guide/latest/installation.html>`__ for more details.
+G2P requires the NeMo ASR collection to be installed. See `Installation instructions <https://docs.nvidia.com/nemo-framework/user-guide/latest/installation.html>`__ for more details.
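The removed documentation describes a four-field `.json` manifest (`text_graphemes`, `start_end`, `homograph_span`, `word_id`). As a minimal sketch of that format, the hypothetical helper below (not part of NeMo) builds one manifest line and sanity-checks that the span offsets point at the heteronym:

```python
import json


def make_manifest_line(sentence, homograph, word_id):
    """Build one manifest line in the format the removed docs describe.

    Hypothetical helper for illustration only; it locates the first
    occurrence of the heteronym to derive the [start, end) span.
    """
    start = sentence.index(homograph)   # offset of the heteronym
    end = start + len(homograph)        # exclusive end of the span
    entry = {
        "text_graphemes": sentence,
        "start_end": [start, end],
        "homograph_span": homograph,
        "word_id": word_id,
    }
    # Sanity check: the span must slice out exactly the heteronym word.
    assert sentence[start:end] == homograph
    return json.dumps(entry)


line = make_manifest_line(
    "Oxygen is less able to diffuse into the blood, leading to hypoxia.",
    "diffuse",
    "diffuse_vrb",
)
```

For the example sentence this reproduces the `"start_end": [23, 30]` span shown in the removed docs.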
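The removed docs note that when the input manifest carries the target `word_id`, inference also performs evaluation, writing predictions to the `"pred_text"` field of the output manifest. A minimal sketch of such a comparison (the `heteronym_accuracy` helper and the label strings are illustrative, not NeMo's implementation):

```python
import json


def heteronym_accuracy(output_lines):
    """Fraction of manifest lines where "pred_text" matches "word_id".

    Hypothetical scoring helper over an output manifest, one JSON
    object per line, as described in the removed docs.
    """
    entries = [json.loads(line) for line in output_lines]
    correct = sum(e["pred_text"] == e["word_id"] for e in entries)
    return correct / len(entries)


# Two illustrative output lines: one correct, one wrong prediction.
lines = [
    '{"text_graphemes": "...", "pred_text": "diffuse_vrb", "word_id": "diffuse_vrb"}',
    '{"text_graphemes": "...", "pred_text": "read_past", "word_id": "read_present"}',
]
acc = heteronym_accuracy(lines)  # 0.5
```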