Audio Studio AI 🎤

Audio Studio AI is an interactive application built with Python and Streamlit that generates high-quality audio content locally using text-to-speech technology. It offers both a Streamlit web interface and a REST API for integration with other applications, and it is well suited to creating voiceovers for videos, podcasts, and other audio content.

🚀 Features

  • Local AI-powered text-to-speech with high-quality voice synthesis
  • Multiple language support with various voices for each language
  • Sentence-based audio generation with customizable pauses
  • Individual sentence preview and editing
  • Export/Import functionality for sentence configurations
  • Multiple output formats (WAV, MP3) with dynamic format support
  • Customizable speech speed for each sentence
  • Beautiful Streamlit web interface for easy interaction
  • REST API for integration with other applications
  • Local processing - no cloud dependencies required

📞 Installation

1. Clone the Repository

git clone https://github.com/paulocoutinhox/audio-studio-ai.git
cd audio-studio-ai

2. Create a Virtual Environment

python3 -m venv .venv
source .venv/bin/activate  # macOS/Linux
.venv\Scripts\activate     # Windows

3. Install Dependencies

pip install -r requirements.txt

4. Download Required Model Files

Download the required model files from the model repository and place them in the models/ directory of the project. See the Model Support section for details.

⚙️ Configuration

1. Model Files

The application expects the model files to be in the models/ directory. See the Model Support section for specific file requirements.

2. Output Configuration

You can configure the following settings in the sidebar:

  • Output format (WAV or MP3)
  • Minimum and maximum pause duration between sentences
  • Model and voices file paths
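
As a rough illustration of the pause settings, the sketch below picks a gap length between a minimum and a maximum and builds that much silence. It rests on assumptions the README does not confirm: that each pause is drawn uniformly from the range, and that audio is sampled at 24000 Hz. `pick_pause` and `silence` are hypothetical helper names, not functions from this project.

```python
import random


def pick_pause(min_s: float, max_s: float, rng: random.Random) -> float:
    """Return a pause length in seconds between min_s and max_s (assumed uniform)."""
    if min_s < 0 or max_s < min_s:
        raise ValueError("expected 0 <= min_s <= max_s")
    return rng.uniform(min_s, max_s)


def silence(duration_s: float, sample_rate: int = 24000) -> list:
    """Return duration_s seconds of silence as zero-valued samples.

    The 24000 Hz default is an assumption made for this sketch.
    """
    return [0.0] * int(round(duration_s * sample_rate))


rng = random.Random(0)  # seeded so the sketch is repeatable
pause = pick_pause(0.3, 0.8, rng)
gap = silence(pause)
print(f"pause={pause:.2f}s, samples={len(gap)}")
```

A narrow range gives evenly paced speech, while a wider one makes the gaps between sentences less uniform.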

🛠️ Usage

Web Interface (Streamlit)

  1. Run the Streamlit Application

    streamlit run app.py
  2. Steps in the Web UI

    • Add sentences using the "Add Sentence" button
    • For each sentence:
      • Enter the text
      • Select the language
      • Choose a voice
      • Adjust the speech speed
    • Use the up/down arrows to reorder sentences
    • Delete sentences using the trash icon
    • Click "Generate Audio" to create the final audio
    • Preview individual sentences or download the complete audio

REST API

  1. Start the API Server

    python api.py
  2. API Documentation

    • Server runs on: http://localhost:8000
    • Interactive docs: http://localhost:8000/docs
    • 📚 Complete API documentation (API.md): examples, endpoints, and integration guides
  3. Quick API Example

    curl -X POST "http://localhost:8000/generate-audio" \
      -H "Content-Type: application/json" \
      -d '{
        "sentences": [{
          "text": "Hello world!",
          "lang": "en-us",
          "voice": "af_sarah",
          "speed": 1.0
        }],
        "output_format": "mp3"
      }'
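
The same request can be sent from Python using only the standard library. This is a minimal sketch that mirrors the curl example: the endpoint and JSON fields are taken from it, while `build_request` and `generate_audio` are hypothetical helper names. The README does not state how the server returns the audio, so the response body is returned unparsed.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/generate-audio"  # endpoint from the curl example


def build_request(text: str, lang: str = "en-us", voice: str = "af_sarah",
                  speed: float = 1.0, output_format: str = "mp3") -> urllib.request.Request:
    """Build the same JSON POST request as the curl example."""
    payload = {
        "sentences": [{"text": text, "lang": lang, "voice": voice, "speed": speed}],
        "output_format": output_format,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_audio(text: str, **options) -> bytes:
    """Send the request and return the raw response body.

    Assumption: the response format (raw audio, JSON, or a file link) is not
    documented here, so no parsing is attempted.
    """
    with urllib.request.urlopen(build_request(text, **options)) as response:
        return response.read()


# Building the request needs no running server, so it can be inspected offline.
request = build_request("Hello world!")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With the API server running, calling `generate_audio("Hello world!")` sends the request. Check the interactive docs at http://localhost:8000/docs for the actual response format before saving the result to a file.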

🌍 Supported Languages and Voices

The application supports multiple languages with various voices for each:

  • American English (en-us)
    • Multiple voices including af_sarah, af_nova, af_river, and more
  • British English (en-gb)
    • Voices like bf_alice, bf_emma, bm_daniel, and more
  • Japanese (ja)
    • Voices including jf_alpha, jf_gongitsune, jm_kumo, and more
  • Mandarin Chinese (zh)
    • Multiple voices like zf_xiaobei, zf_xiaoxiao, zm_yunjian, and more
  • Spanish (es)
    • Voices including ef_dora, em_alex
  • French (fr)
    • Voice ff_siwis
  • Hindi (hi)
    • Voices including hf_alpha, hf_beta, hm_omega
  • Italian (it)
    • Voices if_sara, im_nicola
  • Brazilian Portuguese (pt-br)
    • Voices pf_dora, pm_alex, pm_santa
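
When building requests programmatically, it can help to check a language and voice pair before sending it. The sketch below contains only the voices named in the list above, so it is deliberately partial wherever the list says "and more". The prefix mapping is a pattern inferred from those names, not a documented rule, and `is_listed` and `guess_language` are hypothetical helpers.

```python
from typing import Optional

# Voices named in the list above; several languages have more voices than shown.
VOICES = {
    "en-us": ["af_sarah", "af_nova", "af_river"],
    "en-gb": ["bf_alice", "bf_emma", "bm_daniel"],
    "ja": ["jf_alpha", "jf_gongitsune", "jm_kumo"],
    "zh": ["zf_xiaobei", "zf_xiaoxiao", "zm_yunjian"],
    "es": ["ef_dora", "em_alex"],
    "fr": ["ff_siwis"],
    "hi": ["hf_alpha", "hf_beta", "hm_omega"],
    "it": ["if_sara", "im_nicola"],
    "pt-br": ["pf_dora", "pm_alex", "pm_santa"],
}

# Assumption inferred from the names: the first letter tracks the language.
PREFIX_TO_LANG = {
    "a": "en-us", "b": "en-gb", "j": "ja", "z": "zh", "e": "es",
    "f": "fr", "h": "hi", "i": "it", "p": "pt-br",
}


def is_listed(lang: str, voice: str) -> bool:
    """Return True if the voice is one of those named for the language."""
    return voice in VOICES.get(lang, [])


def guess_language(voice: str) -> Optional[str]:
    """Guess the language code from the first letter of the voice name."""
    return PREFIX_TO_LANG.get(voice[:1])


print(is_listed("en-us", "af_sarah"))
print(guess_language("pm_santa"))
```

A voice that `is_listed` rejects may still exist in the model, since the table only covers the names given here.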

🤖 Model Support

This application currently supports the Kokoro TTS model for high-quality text-to-speech synthesis. To use the application:

  1. Download the following files from Hugging Face - Kokoro-82M:

    • kokoro-v1.0.onnx
    • voices-v1.0.bin
  2. Download the model files using these direct links:

  3. Place these files in the models/ directory of the project.
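
Before launching the app, a quick check can confirm that both files are in place. This is a minimal sketch that uses the file names and the models/ directory given above; `missing_model_files` is a hypothetical helper, not part of the project.

```python
from pathlib import Path

REQUIRED_MODEL_FILES = ("kokoro-v1.0.onnx", "voices-v1.0.bin")


def missing_model_files(models_dir: str = "models") -> list:
    """Return the required file names that are absent from models_dir."""
    base = Path(models_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (base / name).is_file()]


missing = missing_model_files()
if missing:
    print("Missing model files:", ", ".join(missing))
else:
    print("All model files found.")
```

Run it from the project root so that the relative models/ path resolves correctly.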

The Kokoro model provides:

  • High-quality voice synthesis
  • Support for multiple languages
  • Various voice options for each language
  • Fast local processing
  • No cloud dependencies

📁 Project Structure

audio-studio-ai/
│
├── 📝 README.md               # Project documentation and guide
├── 📚 API.md                  # Complete REST API documentation
├── 🎯 app.py                  # Main Streamlit application interface
├── 🚀 api.py                  # FastAPI REST API server
├── ⚙️ config.py               # Configuration and settings management
├── 🛠️ utils.py                # Utility functions and helpers
├── 📦 requirements.txt        # Project dependencies list
│
├── 🤖 models/                 # AI model files directory
│   ├── 🧠 kokoro-v1.0.onnx    # TTS neural network model
│   └── 🗣️ voices-v1.0.bin     # Voice data and configurations
│
├── 🎵 temp/                   # Temporary audio files storage
│
└── 🎨 extras/                 # Additional resources
    └── 🖼️ images/             # Images, icons and assets

🤝 Contributing

  1. Fork the repository
  2. Create a new branch (git checkout -b feature-xyz)
  3. Commit changes (git commit -m "Added new feature")
  4. Push to the branch (git push origin feature-xyz)
  5. Open a pull request

📞 Contact

For issues or contributions, open a GitHub issue or contact:

  • 💎 paulocoutinhox@gmail.com
  • 🔗 GitHub

📜 License

MIT

Copyright (c) 2025, Paulo Coutinho
