egs/tts/FastSpeech2/README.md (29 additions, 3 deletions)
@@ -83,6 +83,11 @@ sh egs/tts/FastSpeech2/run.sh --stage 2 --name [YourExptName]
## 4. Inference
### Pre-trained FastSpeech 2 and HiFi-GAN Download
We have released a pre-trained Amphion [FastSpeech 2](https://huggingface.co/amphion/fastspeech2_ljspeech) model and a [HiFi-GAN](https://huggingface.co/amphion/hifigan_ljspeech) vocoder, both trained on LJSpeech. You can download them and generate speech by following the inference instructions below.
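As a minimal sketch of the download step, assuming `git` and `git-lfs` are installed, the two Hugging Face repositories can be cloned with Git. The `ckpts/tts/fastspeech2_ljspeech` target path and the `run` helper are illustrative assumptions and are not part of this recipe; only `ckpts/vocoder/hifigan_ljspeech` appears in the inference example of this README. The script defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: fetch the two released checkpoints from Hugging Face.
# ACOUSTIC_DIR is a hypothetical location; VOCODER_DIR matches the
# --vocoder_dir value used in the inference example.
ACOUSTIC_DIR="ckpts/tts/fastspeech2_ljspeech"
VOCODER_DIR="ckpts/vocoder/hifigan_ljspeech"

# Print commands by default; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run git lfs install
run git clone https://huggingface.co/amphion/fastspeech2_ljspeech "$ACOUSTIC_DIR"
run git clone https://huggingface.co/amphion/hifigan_ljspeech "$VOCODER_DIR"
```

Run it once as is to review the commands, then again with `DRY_RUN=0` to perform the download. Whether the cloned directory layout matches what the recipe expects is an assumption you should verify against the model cards.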
### Configuration
For inference, you need to specify the following configurations when running `run.sh`:
@@ -96,6 +101,8 @@ For inference, you need to specify the following configurations when running `ru
|`--infer_dataset`| The dataset used for inference. | For the LJSpeech dataset, the inference dataset is `LJSpeech`. |
|`--infer_testing_set`| The subset of the inference dataset used for inference, e.g., `train`, `test`, `golden_test`. | For the LJSpeech dataset, the testing set is either "`test`", split from LJSpeech during feature extraction, or "`golden_test`", cherry-picked from the test set as a template testing set. |
|`--infer_text`| The text to be synthesized. | "`This is a clip of generated speech with the given text from a TTS model.`" |
|`--vocoder_dir`| The directory for the vocoder. | "`ckpts/vocoder/hifigan_ljspeech`" |

### Run
For example, to generate speech for the entire test set split from LJSpeech, run:
@@ -106,7 +113,8 @@ sh egs/tts/FastSpeech2/run.sh --stage 3 \
    --infer_text "This is a clip of generated speech with the given text from a TTS model." \
    --vocoder_dir ckpts/vocoder/hifigan_ljspeech
```
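
Because the command above depends on `--vocoder_dir` pointing at a downloaded vocoder, a quick pre-flight check can catch a missing or empty directory before inference starts. The following is a sketch under that assumption; `check_vocoder_dir` is a hypothetical helper and is not part of `run.sh`.

```shell
#!/bin/sh
# Sketch only: verify that a vocoder directory exists and is not empty
# before starting inference. check_vocoder_dir is a hypothetical helper.
check_vocoder_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    # An empty directory usually means the download did not finish.
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty: $dir" >&2
        return 1
    fi
    return 0
}

if check_vocoder_dir "ckpts/vocoder/hifigan_ljspeech"; then
    echo "vocoder directory looks usable"
fi
```

The check only confirms that the directory has some content; it does not validate the checkpoint files themselves.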

### Issues and Solutions

```
NotImplementedError: Using RTX 3090 or 4000 series doesn't support faster communication broadband via P2P or IB. Please set `NCCL_P2P_DISABLE="1"` and `NCCL_IB_DISABLE="1" or use `accelerate launch` which will do this automatically.
2024-02-24 10:57:49 | INFO | torch.distributed.distributed_c10d | Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
```
This error occurs because NVIDIA RTX 3090 and RTX 4000 series GPUs do not support the faster peer-to-peer (P2P) or InfiniBand (IB) communication paths that NCCL tries to use. It is raised by the Hugging Face Accelerate library, which handles distributed training and inference.

To fix this, set the following environment variables in your terminal before running the script:
```
export NCCL_P2P_DISABLE=1
export NCCL_IB_DISABLE=1
```
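
If you would rather not change the whole shell session, the same variables can be scoped to a single command with `env`. The sketch below uses a stand-in `sh -c` command to show the scoping; `with_nccl_workaround` is a hypothetical helper name, and you would pass your own `run.sh` invocation to it instead.

```shell
#!/bin/sh
# Sketch only: apply the two NCCL variables to one command instead of
# exporting them for the whole session. with_nccl_workaround is a
# hypothetical helper name.
with_nccl_workaround() {
    env NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 "$@"
}

# Stand-in command that shows what the child process sees.
with_nccl_workaround sh -c \
    'echo "P2P_DISABLE=$NCCL_P2P_DISABLE IB_DISABLE=$NCCL_IB_DISABLE"'
# prints: P2P_DISABLE=1 IB_DISABLE=1
```

Using `env` keeps the assignments local to the launched process, so later commands in the same terminal are unaffected.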

### Note
You may see extensive log messages from `torch._subclasses.fake_tensor` and `torch._dynamo.output_graph` during inference. We have not yet found an effective way to suppress them, but they do not affect the inference process.
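
As a display-only workaround that is not part of the original recipe, the noisy lines can be filtered out at the shell. This sketch assumes the log lines contain the logger name, as in the `| INFO | torch.distributed.distributed_c10d |` line shown earlier; `filter_torch_logs` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: drop log lines emitted by the two noisy loggers.
# filter_torch_logs is a hypothetical helper; it hides the lines from
# the terminal and does not stop PyTorch from producing them.
filter_torch_logs() {
    grep -v -E 'torch\._subclasses\.fake_tensor|torch\._dynamo\.output_graph'
}

# Demo on three sample lines; only the last one is kept.
printf '%s\n' \
    'INFO | torch._subclasses.fake_tensor | sample message' \
    'INFO | torch._dynamo.output_graph | sample message' \
    'INFO | torch.distributed.distributed_c10d | sample message' \
    | filter_torch_logs
```

To apply it to a real run, pipe the inference command through the helper with `2>&1 | filter_torch_logs`. Note that this merges standard error into standard output.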

-We will release a pre-trained FastSpeech2 model trained on LJSpeech. So you can download the pre-trained model and generate speech following the above inference instruction.
egs/tts/VITS/README.md (5 additions, 0 deletions)
@@ -143,6 +143,11 @@ Here are some example scenarios to better understand how to use these arguments:
## 4. Inference
### Pre-trained Model Download
We have released a pre-trained Amphion VITS model trained on LJSpeech. You can download it [here](https://huggingface.co/amphion/vits-ljspeech) and generate speech by following the inference instructions below.
### Configuration
For inference, you need to specify the following configurations when running `run.sh`: