Audio Studio AI 🎤

Audio Studio AI is an interactive application built with Python and Streamlit that generates audio content locally using text-to-speech synthesis. It offers both a web interface and a REST API for integration, and is well suited to creating voiceovers for videos, podcasts, and other audio content.

🚀 Features

- Local AI-powered text-to-speech with high-quality voice synthesis
- Multiple language support with various voices for each language
- Sentence-based audio generation with customizable pauses
- Individual sentence preview and editing
- Export/Import functionality for sentence configurations
- Multiple output formats (WAV, MP3) with dynamic format support
- Customizable speech speed for each sentence
- Streamlit web interface for easy interaction
- REST API for integration with other applications
- Local processing - no cloud dependencies required

📞 Installation

1. Clone the Repository

```
git clone https://github.com/paulocoutinhox/audio-studio-ai.git
cd audio-studio-ai
```

2. Create a Virtual Environment

```
python3 -m venv .venv
source .venv/bin/activate  # macOS/Linux
.venv\Scripts\activate     # Windows
```

3. Install Dependencies

```
pip install -r requirements.txt
```

4. Download Required Model Files

Download the required model files from the model repository and place them in the models/ directory of the project. See the Model Support section for details.

⚙️ Configuration

1. Model Files

The application expects the model files to be in the models/ directory. See the Model Support section for specific file requirements.

2. Output Configuration

You can configure the following settings in the sidebar:

- Output format (WAV or MP3)
- Minimum and maximum pause duration between sentences
- Model and voices file paths
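
To illustrate what the minimum and maximum pause settings could mean in practice, here is a minimal sketch that joins per-sentence clips with a silent gap between them. It is not taken from the application's code: the function name `join_with_pauses`, the uniform random choice of gap length, and the 24000 Hz sample rate are all assumptions made for illustration.

```python
import random

SAMPLE_RATE = 24000  # assumed sample rate in Hz; verify against the model's real output


def join_with_pauses(clips, min_pause, max_pause, sample_rate=SAMPLE_RATE, rng=None):
    """Concatenate clips (lists of float samples) with silence between them.

    Hypothetical helper: each gap length is drawn uniformly from
    [min_pause, max_pause] seconds. The application may choose gaps differently.
    """
    if min_pause < 0 or max_pause < min_pause:
        raise ValueError("expected 0 <= min_pause <= max_pause")
    rng = rng or random.Random()
    output = []
    for index, clip in enumerate(clips):
        if index > 0:
            # Insert silence only between clips, never before the first one.
            pause_seconds = rng.uniform(min_pause, max_pause)
            output.extend([0.0] * round(pause_seconds * sample_rate))
        output.extend(clip)
    return output
```

Setting the minimum and maximum to the same value gives a fixed pause, which is convenient when the output length needs to be predictable.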

🛠️ Usage

Web Interface (Streamlit)

Run the Streamlit Application
```
streamlit run app.py
```
Steps in the Web UI
- Add sentences using the "Add Sentence" button
- For each sentence:
  - Enter the text
  - Select the language
  - Choose a voice
  - Adjust the speech speed
- Use the up/down arrows to reorder sentences
- Delete sentences using the trash icon
- Click "Generate Audio" to create the final audio
- Preview individual sentences or download the complete audio

REST API

Start the API Server
```
python api.py
```
API Documentation
- Server runs on: http://localhost:8000
- Interactive docs: http://localhost:8000/docs
- 📚 Complete API documentation: see API.md for examples, endpoints, and integration guides

Quick API Example

```
curl -X POST "http://localhost:8000/generate-audio" \
  -H "Content-Type: application/json" \
  -d '{
    "sentences": [{
      "text": "Hello world!",
      "lang": "en-us",
      "voice": "af_sarah",
      "speed": 1.0
    }],
    "output_format": "mp3"
  }'
```
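
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the JSON body of the curl example. The helper names `build_payload` and `generate_audio` are hypothetical, and the response layout is not described in this README, so the body is returned as raw bytes without interpretation. See API.md for the actual response format.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/generate-audio"  # endpoint from the curl example


def build_payload(text, lang="en-us", voice="af_sarah", speed=1.0, output_format="mp3"):
    """Build the same JSON body as the curl example, for a single sentence."""
    return {
        "sentences": [
            {"text": text, "lang": lang, "voice": voice, "speed": speed}
        ],
        "output_format": output_format,
    }


def generate_audio(payload, url=API_URL, timeout=120):
    """POST the payload and return the raw response body as bytes.

    Assumption: the response may be audio bytes or JSON describing a file;
    this sketch does not decode it either way.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Usage, with the API server running locally:
#   body = generate_audio(build_payload("Hello world!"))
#   print(f"Received {len(body)} bytes")
```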

🌍 Supported Languages and Voices

The application supports multiple languages with various voices for each:

- American English (en-us)
  - Multiple voices including af_sarah, af_nova, af_river, and more
- British English (en-gb)
  - Voices like bf_alice, bf_emma, bm_daniel, and more
- Japanese (ja)
  - Voices including jf_alpha, jf_gongitsune, jm_kumo, and more
- Mandarin Chinese (zh)
  - Multiple voices like zf_xiaobei, zf_xiaoxiao, zm_yunjian, and more
- Spanish (es)
  - Voices including ef_dora, em_alex
- French (fr)
  - Voice ff_siwis
- Hindi (hi)
  - Voices including hf_alpha, hf_beta, hm_omega
- Italian (it)
  - Voices if_sara, im_nicola
- Brazilian Portuguese (pt-br)
  - Voices pf_dora, pm_alex, pm_santa
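
In the voice names listed above, the first letter appears to track the language (for example, a for en-us and p for pt-br). A small lookup based on that pattern can help API callers pick a matching lang value for a voice. This is a sketch inferred only from the names in this list: `LANG_BY_PREFIX` and `lang_for_voice` are hypothetical and not part of the application.

```python
# Language codes keyed by the first letter of the voice names listed above.
# Inferred from this README's list only; other voices may not follow the pattern.
LANG_BY_PREFIX = {
    "a": "en-us",
    "b": "en-gb",
    "j": "ja",
    "z": "zh",
    "e": "es",
    "f": "fr",
    "h": "hi",
    "i": "it",
    "p": "pt-br",
}


def lang_for_voice(voice):
    """Return the language code implied by a voice name such as "af_sarah"."""
    prefix, separator, name = voice.partition("_")
    # Expect a two-letter prefix, an underscore, and a non-empty name.
    if len(prefix) != 2 or not separator or not name:
        raise ValueError(f"unexpected voice name: {voice!r}")
    try:
        return LANG_BY_PREFIX[prefix[0]]
    except KeyError:
        raise ValueError(f"unknown language prefix in {voice!r}") from None
```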

🤖 Model Support

This application currently supports the Kokoro TTS model for high-quality text-to-speech synthesis. To use the application:

- Download the following files from Hugging Face (Kokoro-82M):
  - kokoro-v1.0.onnx
  - voices-v1.0.bin
- Place these files in the models/ directory of the project.

The Kokoro model provides:

- High-quality voice synthesis
- Support for multiple languages
- Various voice options for each language
- Fast local processing
- No cloud dependencies

📁 Project Structure

```
audio-studio-ai/
│
├── 📝 README.md               # Project documentation and guide
├── 📚 API.md                  # Complete REST API documentation
├── 🎯 app.py                  # Main Streamlit application interface
├── 🚀 api.py                  # FastAPI REST API server
├── ⚙️ config.py               # Configuration and settings management
├── 🛠️ utils.py                # Utility functions and helpers
├── 📦 requirements.txt        # Project dependencies list
│
├── 🤖 models/                 # AI model files directory
│   ├── 🧠 kokoro-v1.0.onnx    # TTS neural network model
│   └── 🗣️ voices-v1.0.bin     # Voice data and configurations
│
├── 🎵 temp/                   # Temporary audio files storage
│
└── 🎨 extras/                 # Additional resources
    └── 🖼️ images/             # Images, icons and assets
```

🤝 Contributing

1. Fork the repository
2. Create a new branch (git checkout -b feature-xyz)
3. Commit your changes (git commit -m "Added new feature")
4. Push to the branch (git push origin feature-xyz)
5. Open a pull request

📞 Contact

For issues or contributions, open a GitHub issue or email paulocoutinhox@gmail.com.

📜 License

MIT
