
Commit b5eca77

Merge branch 'main' into valle_resume

2 parents: 4ad5e5f + 5b71bcf

6 files changed (+46, −8 lines)

config/base.json

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@
         "align_mel_duration": false
     },
     "train": {
-        "ddp": true,
+        "ddp": false,
         "random_seed": 970227,
         "batch_size": 16,
         "max_steps": 1000000,

config/fs2.json

Lines changed: 1 addition & 0 deletions
@@ -93,6 +93,7 @@
     },
     "train":{
         "batch_size": 16,
+        "max_epoch": 100,
         "sort_sample": true,
         "drop_last": true,
         "group_size": 4,

egs/tts/FastSpeech2/README.md

Lines changed: 29 additions & 3 deletions
@@ -83,6 +83,11 @@ sh egs/tts/FastSpeech2/run.sh --stage 2 --name [YourExptName]
 
 ## 4. Inference
 
+### Pre-trained FastSpeech 2 and HiFi-GAN Download
+
+We have released a pre-trained Amphion [FastSpeech 2](https://huggingface.co/amphion/fastspeech2_ljspeech) model and a [HiFi-GAN](https://huggingface.co/amphion/hifigan_ljspeech) vocoder trained on LJSpeech. You can download them and generate speech following the inference instructions below.
+
+
 ### Configuration
 
 For inference, you need to specify the following configurations when running `run.sh`:
@@ -96,6 +101,8 @@ For inference, you need to specify the following configurations when running `run.sh`:
 | `--infer_dataset` | The dataset used for inference. | For LJSpeech dataset, the inference dataset would be `LJSpeech`. |
 | `--infer_testing_set` | The subset of the inference dataset used for inference, e.g., train, test, golden_test | For LJSpeech dataset, the testing set would be "`test`" split from LJSpeech at the feature extraction, or "`golden_test`" cherry-picked from test set as template testing set. |
 | `--infer_text` | The text to be synthesized. | "`This is a clip of generated speech with the given text from a TTS model.`" |
+| `--vocoder_dir` | The directory for the vocoder. | "`ckpts/vocoder/hifigan_ljspeech`" |
+
 
 ### Run
 For example, if you want to generate speech of all testing set split from LJSpeech, just run:
@@ -106,7 +113,8 @@ sh egs/tts/FastSpeech2/run.sh --stage 3 \
     --infer_output_dir ckpts/tts/[YourExptName]/result \
     --infer_mode "batch" \
     --infer_dataset "LJSpeech" \
-    --infer_testing_set "test"
+    --infer_testing_set "test" \
+    --vocoder_dir ckpts/vocoder/hifigan_ljspeech/checkpoints
 ```
 
 Or, if you want to generate a single clip of speech from a given text, just run:
@@ -116,10 +124,28 @@ sh egs/tts/FastSpeech2/run.sh --stage 3 \
     --infer_expt_dir ckpts/tts/[YourExptName] \
     --infer_output_dir ckpts/tts/[YourExptName]/result \
     --infer_mode "single" \
-    --infer_text "This is a clip of generated speech with the given text from a TTS model."
+    --infer_text "This is a clip of generated speech with the given text from a TTS model." \
+    --vocoder_dir ckpts/vocoder/hifigan_ljspeech
+```
+
+### Issues and Solutions
+
 ```
+NotImplementedError: Using RTX 3090 or 4000 series doesn't support faster communication broadband via P2P or IB. Please set `NCCL_P2P_DISABLE="1"` and `NCCL_IB_DISABLE="1" or use `accelerate launch` which will do this automatically.
+2024-02-24 10:57:49 | INFO | torch.distributed.distributed_c10d | Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 2 nodes.
+```
+This error is caused by an incompatibility of NVIDIA RTX 3090 and 4000 series GPUs with peer-to-peer (P2P) communication and InfiniBand (IB), which are used for faster inter-GPU communication. It is raised by the Hugging Face `accelerate` library, which facilitates distributed training and inference.
+
+To fix this issue, set the following environment variables in your terminal before running the script:
+```
+export NCCL_P2P_DISABLE=1
+export NCCL_IB_DISABLE=1
+```
+
+### Note
+Extensive logging messages related to `torch._subclasses.fake_tensor` and `torch._dynamo.output_graph` may be observed during inference. We have not found an effective way to suppress these logs, but they do not affect the inference process.
+
 
-We will release a pre-trained FastSpeech2 model trained on LJSpeech. So you can download the pre-trained model and generate speech following the above inference instruction.
 
 
 ```bibtex
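The README additions above reference released checkpoints on Hugging Face. As an illustrative aside, one way to fetch them into the directories used by the README's examples is with git and Git LFS; the target paths below are assumptions (the acoustic-model directory in particular is hypothetical), so adjust them to your setup:

```
# Git LFS is required to pull the actual model weights from Hugging Face.
git lfs install

# Checkpoints referenced in the README above; target directories are illustrative.
git clone https://huggingface.co/amphion/fastspeech2_ljspeech ckpts/tts/fastspeech2_ljspeech
git clone https://huggingface.co/amphion/hifigan_ljspeech ckpts/vocoder/hifigan_ljspeech
```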
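The NCCL workaround documented in the new "Issues and Solutions" section can also be scoped to a single run by prefixing the command instead of exporting the variables, e.g. for the batch-inference example:

```
NCCL_P2P_DISABLE=1 NCCL_IB_DISABLE=1 sh egs/tts/FastSpeech2/run.sh --stage 3 \
    --infer_expt_dir ckpts/tts/[YourExptName] \
    --infer_output_dir ckpts/tts/[YourExptName]/result \
    --infer_mode "batch" \
    --infer_dataset "LJSpeech" \
    --infer_testing_set "test" \
    --vocoder_dir ckpts/vocoder/hifigan_ljspeech/checkpoints
```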

egs/tts/FastSpeech2/exp_config.json

Lines changed: 1 addition & 0 deletions
@@ -17,5 +17,6 @@
     },
     "train": {
         "batch_size": 16,
+        "max_epoch": 100,
     }
 }

egs/tts/FastSpeech2/run.sh

Lines changed: 9 additions & 4 deletions
@@ -21,7 +21,7 @@ echo $mfa_dir
 
 ######## Parse the Given Parameters from the Commond ###########
 # options=$(getopt -o c:n:s --long gpu:,config:,infer_expt_dir:,infer_output_dir:,infer_source_file:,infer_source_audio_dir:,infer_target_speaker:,infer_key_shift:,infer_vocoder_dir:,name:,stage: -- "$@")
-options=$(getopt -o c:n:s --long gpu:,config:,infer_expt_dir:,infer_output_dir:,infer_mode:,infer_dataset:,infer_testing_set:,infer_text:,name:,stage: -- "$@")
+options=$(getopt -o c:n:s --long gpu:,config:,infer_expt_dir:,infer_output_dir:,infer_mode:,infer_dataset:,infer_testing_set:,infer_text:,name:,stage:,vocoder_dir: -- "$@")
 eval set -- "$options"
 
 while true; do
@@ -47,6 +47,8 @@ while true; do
     --infer_testing_set) shift; infer_testing_set=$1 ; shift ;;
     # [Only for Inference] The text to be synthesized from. It is only used when the inference model is "single".
     --infer_text) shift; infer_text=$1 ; shift ;;
+    # [Only for Inference] The directory of the vocoder.
+    --vocoder_dir) shift; vocoder_dir=$1 ; shift ;;
 
     --) shift ; break ;;
     *) echo "Invalid option: $1" exit 1 ;;
@@ -104,6 +106,11 @@ if [ $running_stage -eq 3 ]; then
     if [ -z "$infer_output_dir" ]; then
         infer_output_dir="$expt_dir/result"
     fi
+
+    if [ -z "$vocoder_dir" ]; then
+        echo "[Error] Please specify the vocoder directory to reconstruct the waveform from the mel spectrogram."
+        exit 1
+    fi
 
     if [ -z "$infer_mode" ]; then
         echo "[Error] Please specify the inference mode, e.g., "batch", "single""
@@ -143,8 +150,6 @@ if [ $running_stage -eq 3 ]; then
         --testing_set $infer_testing_set \
         --text "$infer_text" \
         --log_level debug \
-        --vocoder_dir /mntnfs/lee_data1/chenxi/processed_data/ljspeech/model_ckpt/hifigan/checkpoints
-
-
+        --vocoder_dir $vocoder_dir
 
 fi
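For readers following the `run.sh` changes, here is a minimal, self-contained sketch of the `getopt` long-option pattern this commit extends, reduced to the two relevant options plus the new required-argument check; the option handling mirrors the script, everything else is illustrative:

```
#!/bin/bash
# Sketch of the long-option parsing pattern used in run.sh (GNU getopt).
options=$(getopt -o "" --long vocoder_dir:,stage: -- "$@")
eval set -- "$options"

while true; do
    case $1 in
        # Each long option consumes its flag, then its value.
        --vocoder_dir) shift; vocoder_dir=$1; shift ;;
        --stage) shift; running_stage=$1; shift ;;
        --) shift; break ;;
        *) echo "Invalid option: $1"; exit 1 ;;
    esac
done

# Fail fast when the now-required vocoder directory is missing, as the commit does.
if [ -z "$vocoder_dir" ]; then
    echo "[Error] Please specify the vocoder directory."
    exit 1
fi

echo "stage=$running_stage, vocoder_dir=$vocoder_dir"
```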

egs/tts/VITS/README.md

Lines changed: 5 additions & 0 deletions
@@ -143,6 +143,11 @@ Here are some example scenarios to better understand how to use these arguments:
 
 ## 4. Inference
 
+### Pre-trained Model Download
+
+We have released a pre-trained Amphion VITS model trained on LJSpeech. You can download it [here](https://huggingface.co/amphion/vits-ljspeech) and generate speech following the inference instructions below.
+
+
 ### Configuration
 
 For inference, you need to specify the following configurations when running `run.sh`:
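As with the FastSpeech 2 recipe above, the released VITS checkpoint can presumably be fetched with git and Git LFS; the target directory here is again an assumption:

```
git lfs install
git clone https://huggingface.co/amphion/vits-ljspeech ckpts/tts/vits-ljspeech
```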
