
bharathgaddam1712/SyndicateSmashers


Syndicate Smashers

Prodigal.AI <> KodeKurrent
AI Voice Cloning Model Development Challenge

Python 3.12+ (https://www.python.org/downloads/release/python-3120/), PyTorch, Torchaudio, Transformers, Gradio

Syndicate Smashers 🔥

Overview

Our model is a text-to-speech (TTS) system built on a large language model (LLM) to produce accurate, natural-sounding speech. It is designed to be efficient and flexible for both research and production use.

Key Features

  • Simplicity and Efficiency: Built entirely on Qwen2.5, the Syndicate Smashers model needs no additional generation model such as a flow-matching network. Instead of using a separate model to generate acoustic features, it reconstructs audio directly from the codes predicted by the LLM, which improves efficiency and reduces complexity.
  • High-Quality Voice Cloning: Supports zero-shot voice cloning, replicating a speaker's voice from a reference recording without speaker-specific training data. This makes it well suited to cross-lingual and code-switching scenarios, with seamless transitions between languages and voices.
  • Controllable Speech Generation: Allows customization of gender, pitch, and speaking rate, making it easier to create virtual speakers.
  • User Authentication & Security:
    • Sign-up/Login System for secure access.
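The Controllable Speech Generation feature above can be pictured as turning each user choice into a discrete label that conditions the LLM alongside the text. The sketch below is illustrative only: the five-level scale, the label strings such as <|pitch_3|>, and the function name build_control_prefix are hypothetical assumptions, not this model's actual tokenizer vocabulary or API.

```python
# Hypothetical sketch: mapping gender/pitch/speed choices to discrete control labels.
# The label strings are illustrative and are NOT the model's real special tokens.
LEVELS = ("very_low", "low", "moderate", "high", "very_high")  # assumed 5-point scale
GENDERS = ("female", "male")


def level_index(name: str) -> int:
    """Return the position of a level name on the assumed five-point scale."""
    try:
        return LEVELS.index(name)
    except ValueError:
        raise ValueError(f"unknown level {name!r}; expected one of {LEVELS}") from None


def build_control_prefix(gender: str, pitch: str, speed: str) -> str:
    """Build a hypothetical attribute prefix that would precede the text prompt."""
    if gender not in GENDERS:
        raise ValueError(f"unknown gender {gender!r}; expected one of {GENDERS}")
    return (
        f"<|gender_{GENDERS.index(gender)}|>"
        f"<|pitch_{level_index(pitch)}|>"
        f"<|speed_{level_index(speed)}|>"
    )


if __name__ == "__main__":
    print(build_control_prefix("female", "high", "moderate"))
    # -> <|gender_0|><|pitch_3|><|speed_2|>
```

Encoding attributes as discrete labels keeps control inside the same LLM that predicts the audio codes, which fits the single-model design described above.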

[Figure: Inference overview of voice cloning]
[Figure: Inference overview of controlled generation]

Install

Clone and Install

  • Clone the repo
git clone https://github.com/bharathgaddam1712/SyndicateSmashers.git
cd SyndicateSmashers
conda create -n venv -y python=3.12
conda activate venv
pip install -r requirements.txt
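After installation, a quick sanity check can confirm the interpreter version and that the main libraries are importable. This is a hedged sketch: the import names torch, torchaudio, transformers, and gradio are assumed from the badges at the top of this README, and requirements.txt may list additional packages.

```python
"""Minimal post-install check (a sketch; package list assumed from the README badges)."""
import importlib.util
import sys

REQUIRED_PYTHON = (3, 12)
# Import names assumed from the README badges; requirements.txt is authoritative.
PACKAGES = ("torch", "torchaudio", "transformers", "gradio")


def missing_packages(names=PACKAGES):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(required=REQUIRED_PYTHON):
    """True when the running interpreter is at least the required version."""
    return sys.version_info[:2] >= required


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {'OK' if python_ok() else 'too old'}")
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")
```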

Model Download

Download via Python:

from huggingface_hub import snapshot_download

snapshot_download("SparkAudio/Spark-TTS-0.5B", local_dir="pretrained_models/Spark-TTS-0.5B")

Download via git clone:

mkdir -p pretrained_models

# Make sure you have git-lfs installed (https://git-lfs.com)
git lfs install

git clone https://huggingface.co/SparkAudio/Spark-TTS-0.5B pretrained_models/Spark-TTS-0.5B
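To avoid re-downloading the checkpoint on every setup, the Python route can be wrapped in a helper that skips the download when the target directory is already populated. This is a sketch under stated assumptions: ensure_model is a hypothetical helper, a non-empty directory is treated as a complete download (individual files are not verified), and huggingface_hub is imported only when a download is actually needed.

```python
from pathlib import Path

REPO_ID = "SparkAudio/Spark-TTS-0.5B"
LOCAL_DIR = Path("pretrained_models/Spark-TTS-0.5B")


def ensure_model(local_dir=LOCAL_DIR, repo_id=REPO_ID, downloader=None):
    """Download repo_id into local_dir unless it already contains files.

    Hypothetical helper: a non-empty directory is assumed to be a finished download.
    Returns True if a download was performed, False if it was skipped.
    """
    local_dir = Path(local_dir)
    if local_dir.is_dir() and any(local_dir.iterdir()):
        return False
    if downloader is None:
        # Imported lazily so the presence check itself does not need huggingface_hub.
        from huggingface_hub import snapshot_download
        downloader = snapshot_download
    local_dir.mkdir(parents=True, exist_ok=True)
    downloader(repo_id, local_dir=str(local_dir))
    return True


if __name__ == "__main__":
    print("downloaded" if ensure_model() else "already present, skipped")
```

The downloader parameter only exists so the helper can be exercised without network access; in normal use it is left as None.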

Basic Usage

Run the demo with the following commands:

cd example
bash infer.sh
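Once the demo finishes, the generated audio can be sanity-checked with Python's standard wave module. The sketch assumes the script writes uncompressed PCM .wav files; the output location is not stated here, so pass whatever path the script reports. summarize_wav is a hypothetical helper, not part of this repository.

```python
import wave


def summarize_wav(path):
    """Return basic properties of a PCM WAV file (hypothetical helper)."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


if __name__ == "__main__":
    import sys

    # Usage: python summarize_wav.py path/to/output.wav [more.wav ...]
    for p in sys.argv[1:]:
        print(p, summarize_wav(p))
```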

Web UI Usage

Start the web UI by running python webui2.py --device 0. The UI supports Voice Cloning and Voice Creation; for Voice Cloning, you can either upload a reference audio file or record one directly.
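The two web UI modes need different inputs: Voice Cloning needs the text plus a reference recording (uploaded or recorded), while Voice Creation needs the text plus the gender, pitch, and speaking-rate settings listed under Key Features. The sketch below checks those requirements before generation. The TTSRequest class and its field names are hypothetical and do not mirror webui2.py.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TTSRequest:
    """Hypothetical request object for the two web UI modes."""

    mode: str  # "clone" or "create"
    text: str
    reference_audio: Optional[str] = None  # path to an uploaded or recorded clip
    gender: Optional[str] = None
    pitch: Optional[str] = None
    speed: Optional[str] = None

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the request is usable."""
        issues = []
        if not self.text.strip():
            issues.append("text is empty")
        if self.mode == "clone":
            if not self.reference_audio:
                issues.append("voice cloning needs reference audio")
        elif self.mode == "create":
            for name in ("gender", "pitch", "speed"):
                if getattr(self, name) is None:
                    issues.append(f"voice creation needs {name}")
        else:
            issues.append(f"unknown mode {self.mode!r}")
        return issues
```

Collecting every problem at once, rather than raising on the first, lets a UI show all missing fields to the user in a single message.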

[Screenshots: Voice Cloning (Image 1) and Voice Creation (Image 2)]

Audio Samples

Speaker          Sample
Utkarsh Raj      utkarsh.mp3
Bharath Gaddam   Bharath_Gaddam.mp3
Sunny Kumar      Sunny_Kumar.mp3
Shivam Jogdand   Shivam_Jogdand.mp3

🎥 Demo Video

Demo_Video

👥 Team Details

Name             Role
Utkarsh Raj      Deep Learning
Bharath Gaddam   AI Engineer
Sunny Kumar      Machine Learning
Shivam Jogdand   Machine Learning
