Commit e9e5b97

Fix links in README and update model references in SHARED.md; clean up imports in utils_infer.py
1 parent 0acad7e commit e9e5b97

3 files changed, +9 -8 lines changed

src/f5_tts/infer/README.md

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ Currently supported features:
 - Basic TTS with Chunk Inference
 - Multi-Style / Multi-Speaker Generation
 - Voice Chat powered by Qwen2.5-3B-Instruct
-- [Custom inference with more language support](src/f5_tts/infer/SHARED.md)
+- [Custom inference with more language support](SHARED.md)

 The cli command `f5-tts_infer-gradio` equals to `python src/f5_tts/infer/infer_gradio.py`, which launches a Gradio APP (web interface) for inference.

src/f5_tts/infer/SHARED.md

Lines changed: 3 additions & 3 deletions
@@ -137,11 +137,11 @@ Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "
 #### F5-TTS Base @ ja @ Jmica
 |Model|🤗Hugging Face|Data (Hours)|Model License|
 |:---:|:------------:|:-----------:|:-------------:|
-|F5-TTS Base|[ckpt & vocab](https://huggingface.co/Jmica/F5TTS/tree/main/JA_25498980)|[Emilia 1.7k JA](https://huggingface.co/datasets/amphion/Emilia-Dataset/tree/fc71e07) & [Galgame Dataset 5.4k](https://huggingface.co/datasets/OOPPEENN/Galgame_Dataset)|cc-by-nc-4.0|
+|F5-TTS Base|[ckpt & vocab](https://huggingface.co/Jmica/F5TTS/tree/main/JA_21999120)|[Emilia 1.7k JA](https://huggingface.co/datasets/amphion/Emilia-Dataset/tree/fc71e07) & [Galgame Dataset 5.4k](https://huggingface.co/datasets/OOPPEENN/Galgame_Dataset)|cc-by-nc-4.0|

 ```bash
-Model: hf://Jmica/F5TTS/JA_25498980/model_25498980.pt
-Vocab: hf://Jmica/F5TTS/JA_25498980/vocab_updated.txt
+Model: hf://Jmica/F5TTS/JA_21999120/model_21999120.pt
+Vocab: hf://Jmica/F5TTS/JA_21999120/vocab_japanese.txt
 Config: {"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, "text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}
 ```
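
The updated `Model:` / `Vocab:` / `Config:` entries can be sanity-checked with a short sketch like the one below. It is illustrative only: `split_hf_uri` and `parse_config` are hypothetical helpers, not part of `f5_tts`, and the sketch assumes an `hf://` path has the form `hf://<owner>/<repo>/<path inside repo>`. Note that the `Config:` line uses the Python literal `False`, so it is parsed here with `ast.literal_eval` instead of a JSON parser.

```python
import ast
from typing import Tuple


def split_hf_uri(uri: str) -> Tuple[str, str]:
    """Hypothetical helper: split 'hf://<owner>/<repo>/<path>' into (repo_id, filename).

    Assumes the first two path segments name the Hub repository and the
    remainder is the file path inside that repository.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// path: {uri!r}")
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://<owner>/<repo>/<path>, got {uri!r}")
    owner, repo, filename = parts
    return f"{owner}/{repo}", filename


def parse_config(text: str) -> dict:
    """Hypothetical helper: parse a Config string written as a Python dict literal."""
    config = ast.literal_eval(text)  # accepts False/True/None, unlike json.loads
    if not isinstance(config, dict):
        raise ValueError("Config must be a dict literal")
    return config


repo_id, filename = split_hf_uri("hf://Jmica/F5TTS/JA_21999120/model_21999120.pt")
config = parse_config(
    '{"dim": 1024, "depth": 22, "heads": 16, "ff_mult": 2, "text_dim": 512, '
    '"text_mask_padding": False, "conv_layers": 4, "pe_attn_head": 1}'
)

# The pair could then be handed to huggingface_hub, for example:
#   hf_hub_download(repo_id=repo_id, filename=filename)
# (not called here, so the sketch stays offline).
print(repo_id, filename, config["text_mask_padding"])
```

Splitting on the first two segments keeps the checkpoint subfolder (`JA_21999120/`) as part of the filename, which is the piece this commit changes.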

src/f5_tts/infer/utils_infer.py

Lines changed: 5 additions & 4 deletions
@@ -21,7 +21,7 @@
 import torch
 import torchaudio
 import tqdm
-from huggingface_hub import snapshot_download, hf_hub_download
+from huggingface_hub import hf_hub_download
 from pydub import AudioSegment, silence
 from transformers import pipeline
 from vocos import Vocos
@@ -128,11 +128,12 @@ def load_vocoder(vocoder_name="vocos", is_local=False, local_path="", device=dev
         except ImportError:
             print("You need to follow the README to init submodule and change the BigVGAN source code.")
         if is_local:
-            """download from https://huggingface.co/nvidia/bigvgan_v2_24khz_100band_256x/tree/main"""
+            # download generator from https://huggingface.co/nvidia/bigvgan_v2_24khz_100band_256x/tree/main
             vocoder = bigvgan.BigVGAN.from_pretrained(local_path, use_cuda_kernel=False)
         else:
-            local_path = snapshot_download(repo_id="nvidia/bigvgan_v2_24khz_100band_256x", cache_dir=hf_cache_dir)
-            vocoder = bigvgan.BigVGAN.from_pretrained(local_path, use_cuda_kernel=False)
+            vocoder = bigvgan.BigVGAN.from_pretrained(
+                "nvidia/bigvgan_v2_24khz_100band_256x", use_cuda_kernel=False, cache_dir=hf_cache_dir
+            )

         vocoder.remove_weight_norm()
         vocoder = vocoder.eval().to(device)
