This fork has been modified to ensure smooth and stable inference on CPU-only machines. It addresses critical bugs that occur when running the original repository without a dedicated GPU.
Tested on:
- CPU: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz (2.11 GHz)
- RAM: 16 GB
This version includes specific fixes to enable stable CPU inference:
- Code Update: The inference script (`src/f5_tts/infer/utils_infer.py`) was modified to prevent a crash on startup (`AttributeError: module 'torch' has no attribute 'xpu'`) when using a CPU-only PyTorch installation.
- Dependency Update: The project's dependency file (`pyproject.toml`) was updated to resolve a low-level bug in newer PyTorch versions that caused an `IndexError` during audio transcription on CPU:
  - PyTorch is now pinned to a known stable version: `torch==2.1.2`
  - The GPU-only package `bitsandbytes` is no longer installed, preventing installation failures
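In `pyproject.toml`, the dependency change amounts to pinning torch and dropping the GPU-only package; a rough sketch, not the fork's exact dependency list (the `torchaudio` pin is an assumption to keep the pair in sync):

```toml
[project]
dependencies = [
    "torch==2.1.2",       # pinned: newer builds hit an IndexError during CPU transcription
    "torchaudio==2.1.2",  # assumed companion pin to match the torch version
    # "bitsandbytes" removed: GPU-only, fails to install on CPU-only machines
]
```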
These changes ensure that anyone can clone this repository and run it on a standard CPU without encountering the original errors.
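The startup crash comes from unconditionally touching `torch.xpu`, which CPU-only PyTorch builds may not expose. A minimal sketch of the guard pattern (the helper name and structure are illustrative, not the actual code in `utils_infer.py`):

```python
import types

def pick_device(torch_like):
    """Pick a device string without assuming `torch.xpu` exists."""
    cuda = getattr(torch_like, "cuda", None)
    if cuda is not None and cuda.is_available():
        return "cuda"
    # CPU-only PyTorch builds may lack the `xpu` submodule entirely,
    # so check with getattr instead of accessing torch.xpu directly.
    xpu = getattr(torch_like, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"

# Simulate a CPU-only torch module that has no `xpu` attribute at all
cpu_only_torch = types.SimpleNamespace(
    cuda=types.SimpleNamespace(is_available=lambda: False)
)
print(pick_device(cpu_only_torch))  # prints "cpu"
```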
- F5-TTS: Diffusion Transformer with ConvNeXt V2, faster training and inference
- E2 TTS: Flat-UNet Transformer, closest reproduction from paper
- Sway Sampling: Inference-time flow step sampling strategy, greatly improves performance
If you have Anaconda or Miniconda installed and `conda` is available in your terminal:
- Download and extract this repository to `C:\F5-TTS-CPU_ONLY`
- Open Anaconda Prompt or a Conda-enabled terminal
- Run `install.bat`

This script will:
- Create a new Conda environment `F5-TTS-CPU_ONLY`
- Activate the environment
- Install the project in editable mode
Once installed, you can start the Gradio app by running `launch.bat`.
This script will:
- Activate the correct environment
- Launch the Gradio-based TTS interface
```shell
# Create conda environment
conda create -n F5-TTS-CPU_ONLY python=3.10
conda activate F5-TTS-CPU_ONLY

# Navigate to the folder
cd C:\F5-TTS-CPU_ONLY

# Install the project
pip install -e .
```
```shell
f5-tts_infer-gradio

# Optional flags:
f5-tts_infer-gradio --port 7860 --host 0.0.0.0
f5-tts_infer-gradio --share
```
```shell
# Run with custom input
f5-tts_infer-cli --model F5TTS_v1_Base \
  --ref_audio "prompt.wav" \
  --ref_text "transcription of reference audio" \
  --gen_text "Text you want the TTS model to generate."

# Use default config
f5-tts_infer-cli

# With custom TOML
f5-tts_infer-cli -c custom.toml

# Multi-voice/story config
f5-tts_infer-cli -c src/f5_tts/infer/examples/multi/story.toml
```
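A custom TOML mirrors the CLI flags; a sketch of what a `custom.toml` might contain (field names follow the bundled examples under `src/f5_tts/infer/examples/`, values are placeholders):

```toml
# Sketch of a custom inference config (values are placeholders)
model = "F5TTS_v1_Base"
ref_audio = "prompt.wav"
ref_text = "transcription of reference audio"
gen_text = "Text you want the TTS model to generate."
remove_silence = false
output_dir = "tests"
```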
```shell
# Web UI-based fine-tuning
f5-tts_finetune-gradio
```
Or refer to the training guide for Accelerate-based workflows.
Use `pre-commit` to automatically format and lint code:

```shell
pip install pre-commit
pre-commit install
pre-commit run --all-files
```
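For reference, a minimal `.pre-commit-config.yaml` of the kind these commands expect (illustrative only; the repository ships its own configuration, which is what `pre-commit install` actually uses, and the `rev` shown is an assumption):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.7.0  # assumed version; pin to whatever the repo's config specifies
    hooks:
      - id: ruff        # lint
        args: [--fix]
      - id: ruff-format # format
```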
- E2-TTS brilliant work, simple and effective
- Emilia, WenetSpeech4TTS, LibriTTS, LJSpeech valuable datasets
- lucidrains initial CFM structure with also bfs18 for discussion
- SD3 & Hugging Face diffusers DiT and MMDiT code structure
- torchdiffeq as ODE solver, Vocos and BigVGAN as vocoder
- FunASR, faster-whisper, UniSpeech, SpeechMOS for evaluation tools
- ctc-forced-aligner for speech edit test
- mrfakename HuggingFace Space demo
- f5-tts-mlx implementation with MLX framework by Lucas Newman
- F5-TTS-ONNX ONNX Runtime version by DakeQQ
- Yuekai Zhang Triton and TensorRT-LLM support
If our work and codebase are useful for you, please cite:

```bibtex
@article{chen-etal-2024-f5tts,
  title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
  author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
  journal={arXiv preprint arXiv:2410.06885},
  year={2024},
}
```
Our code is released under the MIT License.
The pre-trained models are licensed under the CC-BY-NC license due to the training data (Emilia), which is an in-the-wild dataset.
Sorry for any inconvenience this may cause.