Here are 63 public repositories matching this topic...
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Updated
May 27, 2025
Python
Generate audiobooks from EPUBs, PDFs and text with synchronized captions.
Updated
Aug 25, 2025
Python
[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Updated
Aug 18, 2025
Python
A web UI for various audio-related neural networks
Updated
May 19, 2025
Python
A family of diffusion models for text-to-audio generation.
Updated
Jul 29, 2025
Python
StreamSpeech is an “All in One” seamless model for offline and simultaneous speech recognition, speech translation and speech synthesis.
Updated
Jun 29, 2025
Python
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching
Updated
Jul 29, 2025
Jupyter Notebook
PyTorch Implementation of Make-An-Audio (ICML'23) with a Text-to-Audio Generative Model
Updated
May 22, 2024
Python
OpenMusic: SOTA Text-to-music (TTM) Generation
Updated
Jun 26, 2025
Python
Implementation of NÜWA, a state-of-the-art attention network for text-to-video synthesis, in PyTorch
Updated
Jan 17, 2023
Python
🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.
Updated
Sep 8, 2025
Python
Mustango: Toward Controllable Text-to-Music Generation
Updated
Jun 2, 2025
Python
High-quality Text-to-Audio Generation with Efficient Diffusion Transformer
Updated
Jul 3, 2025
Python
AudioStory: Generating Long-Form Narrative Audio with Large Language Models
Updated
Sep 2, 2025
Jupyter Notebook
Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
Updated
Mar 25, 2024
Jupyter Notebook
Word2Wave: a framework for generating short audio samples from a text prompt using WaveGAN and COALA.
Updated
Dec 13, 2021
Python
Subtitle to audio: generate audio from any subtitle file using Coqui-ai TTS and synchronize the audio timing with the subtitle timestamps.
Updated
Dec 14, 2023
Python
PyTorch implementation of SoundCTM
Updated
Mar 31, 2025
Python
PyTorch Implementation of Google Brain's WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Updated
Aug 3, 2021
Python