This fork has been modified to ensure smooth and stable inference on CPU-only machines. It addresses critical bugs that occur when running the original repository without a dedicated GPU.
Tested on:
- CPU: Intel(R) Core(TM) i5-10210U CPU @ 1.60GHz (2.11 GHz)
- RAM: 16 GB
This version includes specific fixes to enable stable CPU inference:
- Code Update: The inference script (`src/f5_tts/infer/utils_infer.py`) was modified to prevent a crash on startup (`AttributeError: module 'torch' has no attribute 'xpu'`) when using a CPU-only PyTorch installation.
- Dependency Update: The project's dependency file (`pyproject.toml`) was updated to resolve a low-level bug in newer PyTorch versions that caused an `IndexError` during audio transcription on CPU:
  - PyTorch is now pinned to a known stable version: `torch==2.1.2`
  - The GPU-only package `bitsandbytes` is no longer installed, preventing installation failures
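In `pyproject.toml`, the dependency change amounts to pinning torch and dropping the GPU-only package; a rough sketch, not the fork's exact dependency list (the `torchaudio` pin is an assumption to keep the pair in sync):

```toml
[project]
dependencies = [
    "torch==2.1.2",       # pinned: newer builds hit an IndexError during CPU transcription
    "torchaudio==2.1.2",  # assumed companion pin to match the torch version
    # "bitsandbytes" removed: GPU-only, fails to install on CPU-only machines
]
```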
These changes ensure that anyone can clone this repository and run it on a standard CPU without encountering the original errors.
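The startup crash comes from unconditionally touching `torch.xpu`, which CPU-only PyTorch builds may not expose. A minimal sketch of the guard pattern (the helper name and structure are illustrative, not the actual code in `utils_infer.py`):

```python
import types

def pick_device(torch_like):
    """Pick a device string without assuming `torch.xpu` exists."""
    cuda = getattr(torch_like, "cuda", None)
    if cuda is not None and cuda.is_available():
        return "cuda"
    # CPU-only PyTorch builds may lack the `xpu` submodule entirely,
    # so check with getattr instead of accessing torch.xpu directly.
    xpu = getattr(torch_like, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    return "cpu"

# Simulate a CPU-only torch module that has no `xpu` attribute at all
cpu_only_torch = types.SimpleNamespace(
    cuda=types.SimpleNamespace(is_available=lambda: False)
)
print(pick_device(cpu_only_torch))  # prints "cpu"
```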
- F5-TTS: Diffusion Transformer with ConvNeXt V2, faster training and inference
- E2 TTS: Flat-UNet Transformer, closest reproduction from paper
- Sway Sampling: Inference-time flow step sampling strategy, greatly improves performance
If you have Anaconda or Miniconda installed and `conda` is available in your terminal:
- Download and extract this repository to `C:\F5-TTS-CPU_ONLY`
- Open Anaconda Prompt or a Conda-enabled terminal
- Run `install.bat`

This script will:
- Create a new Conda environment `F5-TTS-CPU_ONLY`
- Activate the environment
- Install the project in editable mode
Once installed, you can start the Gradio app by running `launch.bat`.
This script will:
- Activate the correct environment
- Launch the Gradio-based TTS interface
```shell
# Create conda environment
conda create -n F5-TTS-CPU_ONLY python=3.10
conda activate F5-TTS-CPU_ONLY

# Navigate to the folder
cd C:\F5-TTS-CPU_ONLY

# Install the project
pip install -e .
```
```shell
f5-tts_infer-gradio

# Optional flags:
f5-tts_infer-gradio --port 7860 --host 0.0.0.0
f5-tts_infer-gradio --share
```
```shell
# Run with custom input
f5-tts_infer-cli --model F5TTS_v1_Base \
  --ref_audio "prompt.wav" \
  --ref_text "transcription of reference audio" \
  --gen_text "Text you want the TTS model to generate."

# Use default config
f5-tts_infer-cli

# With custom TOML
f5-tts_infer-cli -c custom.toml

# Multi-voice/story config
f5-tts_infer-cli -c src/f5_tts/infer/examples/multi/story.toml
```
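A custom TOML mirrors the CLI flags; a sketch of what a `custom.toml` might contain (field names follow the bundled examples under `src/f5_tts/infer/examples/`, values are placeholders):

```toml
# Sketch of a custom inference config (values are placeholders)
model = "F5TTS_v1_Base"
ref_audio = "prompt.wav"
ref_text = "transcription of reference audio"
gen_text = "Text you want the TTS model to generate."
remove_silence = false
output_dir = "tests"
```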
```shell
# Web UI-based fine-tuning
f5-tts_finetune-gradio
```
Or refer to the training guide for Accelerate-based workflows.
Use `pre-commit` to automatically format and lint code:

```shell
pip install pre-commit
pre-commit install
pre-commit run --all-files
```
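For reference, a minimal `.pre-commit-config.yaml` of the kind these commands expect (illustrative only; the repository ships its own configuration, which is what `pre-commit install` actually uses, and the `rev` shown is an assumption):

```yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.7.0  # assumed version; pin to whatever the repo's config specifies
    hooks:
      - id: ruff        # lint
        args: [--fix]
      - id: ruff-format # format
```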
- E2-TTS brilliant work, simple and effective
- Emilia, WenetSpeech4TTS, LibriTTS, LJSpeech valuable datasets
- lucidrains initial CFM structure with also bfs18 for discussion
- SD3 & Hugging Face diffusers DiT and MMDiT code structure
- torchdiffeq as ODE solver, Vocos and BigVGAN as vocoder
- FunASR, faster-whisper, UniSpeech, SpeechMOS for evaluation tools
- ctc-forced-aligner for speech edit test
- mrfakename HuggingFace Space demo
- f5-tts-mlx implementation with MLX framework by Lucas Newman
- F5-TTS-ONNX ONNX Runtime version by DakeQQ
- Yuekai Zhang Triton and TensorRT-LLM support
If our work and codebase are useful for you, please cite:

```bibtex
@article{chen-etal-2024-f5tts,
  title={F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching},
  author={Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen},
  journal={arXiv preprint arXiv:2410.06885},
  year={2024},
}
```
Our code is released under the MIT License.
The pre-trained models are licensed under the CC-BY-NC license due to the training data (Emilia), which is an in-the-wild dataset.
Sorry for any inconvenience this may cause.